Infrastructure

🔍 Observability

Distributed Tracing

Every A2A call creates an OpenTelemetry span. W3C Trace Context headers are propagated via A2A message metadata, allowing Jaeger to stitch traces across all ten agents into a single view.
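To make the propagation step concrete, here is a minimal standard-library sketch of how a W3C `traceparent` value could ride along in message metadata. The metadata key name `traceparent` and the helper names are assumptions for illustration; the project itself presumably relies on OpenTelemetry's propagators rather than hand-rolled code like this.

```python
import re
import secrets

# W3C Trace Context format: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace id and 8-byte span id."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def inject(metadata: dict, traceparent: str) -> dict:
    """Return a copy of the message metadata carrying the trace context.

    The "traceparent" key is an assumed location, not confirmed by the source.
    """
    return {**metadata, "traceparent": traceparent}


def extract(metadata: dict):
    """Return (trace_id, parent_span_id, flags), or None if absent or malformed."""
    match = _TRACEPARENT_RE.match(str(metadata.get("traceparent", "")))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid per the W3C spec.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def child_traceparent(metadata: dict) -> str:
    """Continue the caller's trace with a fresh span id, or start a new trace."""
    ctx = extract(metadata)
    if ctx is None:
        return new_traceparent()
    trace_id, _parent_id, flags = ctx
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every downstream call reuses the same trace id and only mints a new span id, a backend such as Jaeger can group the spans from different agents under one trace.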

Span naming convention:

Span                                Source
hephaestus.route                    Pipeline determination
hephaestus.pipeline.step.{agent}    Each specialist call
hephaestus.pipeline.fix_loop        Kallos-Techne iterations
a2a.connect.{agent}                 Agent card fetch
a2a.send.{agent}                    Message send
{agent}.execute                     Agent-specific execution
{agent}.generate                    LLM call
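One way to keep these names consistent is to centralize the templates in a single helper. The sketch below is illustrative only: the `SPAN_TEMPLATES` keys and the `span_name` function are hypothetical names, while the template strings are taken from the convention above.

```python
# Span name templates copied from the naming convention; the dict keys
# ("route", "send", ...) are hypothetical labels chosen for this sketch.
SPAN_TEMPLATES = {
    "route": "hephaestus.route",
    "pipeline_step": "hephaestus.pipeline.step.{agent}",
    "fix_loop": "hephaestus.pipeline.fix_loop",
    "connect": "a2a.connect.{agent}",
    "send": "a2a.send.{agent}",
    "execute": "{agent}.execute",
    "generate": "{agent}.generate",
}


def span_name(kind: str, agent: str = "") -> str:
    """Build a span name, requiring an agent only where the template needs one."""
    template = SPAN_TEMPLATES[kind]
    if "{agent}" not in template:
        return template
    if not agent:
        raise ValueError(f"span kind {kind!r} requires an agent name")
    return template.format(agent=agent)


print(span_name("send", "mneme"))
print(span_name("route"))
```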

What You See in Jaeger

Open localhost:16686 and select any service:

  • Full request flow — One trace spanning all agents in the pipeline
  • Per-agent latency — How long each specialist took (LLM call time dominates)
  • Error locations — Which agent failed and at which operation
  • Fix loop iterations — How many Kallos-Techne rounds were needed

📊 Service Performance Monitoring (SPM)

Jaeger derives RED metrics (Rate, Errors, Duration) from incoming traces and stores them in Prometheus. This powers the "Monitor" tab in the Jaeger UI, which gives a high-level view of service health.

  • RED Metrics — Instant visibility into request volume, error percentages, and latency percentiles (P50, P95, P99).
  • Metric Exploration — Use the Prometheus UI at localhost:9090 for raw PromQL queries and custom dashboarding.
  • Span-to-Metrics — Jaeger's internal collector generates these metrics in real-time as traces arrive via OTLP.
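To illustrate what the RED numbers mean, the following sketch aggregates a batch of finished spans into rate, error percentage, and latency percentiles. It is not how Jaeger computes them internally; the `Span` record, the `red_metrics` function, and the nearest-rank percentile method are assumptions chosen for clarity.

```python
import math
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span record used only for this sketch."""
    service: str
    duration_ms: float
    is_error: bool


def percentile(sorted_values, pct: int) -> float:
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_metrics(spans, window_seconds: float) -> dict:
    """Aggregate spans observed during one time window into RED metrics."""
    if not spans:
        raise ValueError("need at least one span")
    durations = sorted(s.duration_ms for s in spans)
    errors = sum(1 for s in spans if s.is_error)
    return {
        "rate_per_s": len(spans) / window_seconds,
        "error_pct": 100 * errors / len(spans),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```

Feeding this 100 spans with durations of 1 to 100 ms, five of them errors, over a 10-second window yields a rate of 10 requests per second, a 5% error rate, and P50/P95/P99 of 50, 95, and 99 ms.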

🐳 Infrastructure

Docker

A single generic Dockerfile at docker/agent.Dockerfile builds any agent via the AGENT_NAME build arg:

Build a single agent
docker build --build-arg AGENT_NAME=mneme -f docker/agent.Dockerfile -t kourai-mneme .

The build is multi-stage: the builder stage installs dependencies with uv, and the runtime stage copies only the resulting virtual environment. Each container has a health check against /.well-known/agent-card.json.
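The health check can be approximated from the outside with a small probe. This is a sketch under assumptions: the source does not show the actual health check command, so the function name `agent_is_healthy`, the timeout, and the rule "HTTP 200 plus parseable JSON means healthy" are illustrative choices.

```python
import json
import urllib.request

AGENT_CARD_PATH = "/.well-known/agent-card.json"


def agent_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the agent card endpoint answers 200 with valid JSON."""
    url = base_url.rstrip("/") + AGENT_CARD_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            json.load(response)  # raises ValueError on malformed JSON
            return True
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, refused connections and timeouts.
        return False
```

Inside the Compose network this could be called as `agent_is_healthy("http://hephaestus:10000")`, using the service name as the host.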

Docker Compose

docker-compose.yml defines all ten agents plus the supporting infrastructure. Running docker compose up starts everything; agents resolve each other via Docker service names (e.g., http://hephaestus:10000).


🔑 Key Design Decisions

Why a2a-sdk directly, not AgentStack?

AgentStack requires Kubernetes via a Lima VM, needs WSL2 on Windows, and ships frequent breaking changes. Decision: a2a-sdk + Starlette + uvicorn gives full A2A compliance without K8s overhead.

Why A2A 0.3.x, not 1.0?

The v1.0 release candidate has breaking changes: Part type unification, enum case changes, method renames, and a well-known URL rename. The dependency is pinned at a2a-sdk>=0.3.0,<1.0 until v1.0 stabilizes. Current stable: 0.3.24 (Feb 2026).

Why LiteLLM?

Model-agnostic interface. Claude for production, Ollama for free local dev. Swap with one env var: KOURAI_PROVIDER=local.
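The one-variable swap might look roughly like the sketch below. Only `KOURAI_PROVIDER=local` comes from the text above; the `claude` default value, the `MODELS` table, and both model identifiers are placeholder assumptions written in LiteLLM's `provider/model` naming style.

```python
import os

# Placeholder model identifiers; the project's real model names are not
# given in the source, so treat both values as assumptions.
MODELS = {
    "claude": "anthropic/claude-model-placeholder",
    "local": "ollama/llama3",
}


def resolve_model(env=None) -> str:
    """Pick the model string from KOURAI_PROVIDER (assumed default: claude)."""
    env = os.environ if env is None else env
    provider = env.get("KOURAI_PROVIDER", "claude").strip().lower()
    if provider not in MODELS:
        raise ValueError(f"unknown KOURAI_PROVIDER: {provider!r}")
    return MODELS[provider]


# The result would then be passed to LiteLLM, for example:
#   litellm.completion(model=resolve_model(), messages=[...])
print(resolve_model({"KOURAI_PROVIDER": "local"}))
```

Keeping the provider decision in one function means agent code never branches on the backend; it only ever sees a model string.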

Why sequential pipelines, not parallel?

Agents build on each other's output — Techne needs Metis's spec, Dokimasia needs Techne's code, Kallos needs the files written. Parallelism doesn't help when there's a data dependency chain. The Kallos-Techne loop is the one place where iteration (not parallelism) adds value.
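The dependency chain and the bounded fix loop can be sketched as follows. The agent callables, the `max_rounds` cap, and the assumption that Kallos returns a list of issues for Techne to address are all illustrative; the source does not specify these interfaces.

```python
from typing import Callable, List, Optional

MAX_FIX_ROUNDS = 3  # assumed cap; the real limit is not stated in the source


def run_pipeline(
    request: str,
    metis: Callable[[str], str],
    techne: Callable[[str, Optional[List[str]]], str],
    dokimasia: Callable[[str], str],
    kallos: Callable[[str], List[str]],
    max_rounds: int = MAX_FIX_ROUNDS,
) -> dict:
    """Run each specialist in order; every step consumes the previous output."""
    spec = metis(request)        # Techne needs Metis's spec
    code = techne(spec, None)    # first implementation pass, no issues yet
    tests = dokimasia(code)      # Dokimasia needs Techne's code
    issues = kallos(code)        # Kallos needs the files written

    # Kallos-Techne fix loop: iterate until clean or the cap is reached.
    rounds = 0
    while issues and rounds < max_rounds:
        code = techne(spec, issues)
        issues = kallos(code)
        rounds += 1

    return {
        "spec": spec,
        "code": code,
        "tests": tests,
        "fix_rounds": rounds,
        "clean": not issues,
    }
```

Each call takes the previous step's return value as input, so no two specialists could usefully run at the same time; only the loop at the end repeats work.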

