0010, Observability via OpenTelemetry (GenAI semantic conventions)¶
- Status: Accepted (2026-06-03)
- Deciders: Ariadne maintainers
- Relates to: ADR-0001 (Ariadne runs on the Claude Agent SDK, which has its own OTEL integration)
Context¶
As the harness matures we need runtime visibility into the MCP server and the
workup loop: task duration (time-to-report), query volume (how many
evidence calls per run), analytic accuracy (citation coverage, entailment
precision, eval scores), and governance compliance (ICD-203 tradecraft,
uncited/dangling claims). Without traces and metrics these properties are only
observable through the file artifacts (citations.json, tradecraft.json,
eval output), there is no cross-run dashboard, no latency baseline, and no
hook into a shared observability stack when Ariadne is deployed inside a larger
environment.
Decision drivers¶
- Vendor-neutral / cross-backend: the same instrumentation should export to Datadog, Google Cloud Trace, AWS X-Ray, Azure Monitor, Jaeger, Grafana, and the Claude Agent SDK's own telemetry without a code change.
- Reuse signals Ariadne already computes: most metrics of interest are
already computed: query count (provenance-ledger
gN), citation coverage/precision, tradecraft compliance, eval scores. We are surfacing them as telemetry, not recomputing them. - Zero cost at the base install: operators who do not configure an OTLP endpoint must not pay any runtime overhead.
- Standard so it composes: the Claude Agent SDK already emits OTEL via
CLAUDE_CODE_ENABLE_TELEMETRY=1; Ariadne's spans should nest under those naturally, not create a parallel, incompatible trace tree.
Considered options¶
A. OpenTelemetry, GenAI semantic conventions (chosen)¶
- Pros:
- CNCF-backed standard; Datadog, Google, AWS, Azure, Grafana, and Jaeger all consume OTLP natively.
- The emerging
gen_ai.*attribute family +invoke_agent/execute_tool/chatspan kinds are purpose-built for agentic LLM systems. opentelemetry-apiships a no-op implementation: without the SDK configured, every span/counter is a no-op and the base install is unaffected.- Composes with the Agent SDK's own telemetry (
CLAUDE_CODE_ENABLE_TELEMETRY=1): the SDK's per-LLM-call spans nest under Ariadne'sinvoke_agentspan automatically when both use the same OTLP pipeline. - Isolated in the optional
otelextra, operators that want telemetry adduv sync --extra otel; everyone else ignores it. - Cons:
- The
gen_ai.agent.*attribute names are still experimental; they may shift before the GenAI semconv reaches stability 1.0.
B. Vendor SDK (Datadog, Langfuse, Arize, etc.)¶
- Pros: turnkey dashboards, purpose-built LLM observability views, minimal config.
- Cons: couples the harness to one backend; any operator not on that vendor must re-instrument. Incompatible with the cross-backend and composability drivers.
C. Custom logging / JSON metrics¶
- Pros: zero new dependencies; the file artifacts (
provenance.jsonl,citations.json,tradecraft.json) are already written. - Cons: no distributed tracing (no trace/span correlation across calls), no standard exporters, no shared-stack integration, and reinvents everything that the OTel SDK gives for free. The artifacts are useful locally but do not compose with an operator's observability infrastructure.
Decision¶
Adopt A. Emit one invoke_agent span per workup (duration = task time) +
per-run metrics via OpenTelemetry. The key design principle is that most
metrics surface already-computed artifacts: the only genuinely new
measurement is timing:
| Signal | Source | New? |
|---|---|---|
| Task duration / time-to-report | run_workup wall time |
Yes, new |
Evidence calls (ariadne.evidence.calls) |
provenance-ledger gN count |
No, existing |
Citation failures (ariadne.citation.failures) |
citations.json uncited/unsupported |
No, existing |
| Tradecraft compliance (span attrs) | tradecraft.json estimative terms / has-confidence |
No, existing |
Governance violations (ariadne.governance.violations + span attrs) |
governance.json read-only audit |
No, existing |
Workup count (ariadne.workups) |
counter incremented per run | Minimal |
Workup duration histogram (ariadne.workup.duration) |
same wall time as the span | Minimal |
opentelemetry-api is a core dependency (no-op until configured).
opentelemetry-sdk and opentelemetry-exporter-otlp live in the otel extra
and in dev dependencies. setup_telemetry() reads
OTEL_EXPORTER_OTLP_ENDPOINT and wires up the SDK; if the env var is absent or
the otel extra is not installed, every call is a silent no-op.
Consequences¶
- One OTel layer → latency / query-volume / accuracy / governance-compliance dashboards per run, exportable to any OTLP-speaking backend.
- Base install emits nothing (api-only no-op); the
otelextra is an explicit opt-in. - The Agent SDK's per-LLM-call spans (
chat,execute_tool) nest under Ariadne'sinvoke_agentspan whenCLAUDE_CODE_ENABLE_TELEMETRY=1is set, one trace covers the full workup. - The
gen_ai.agent.nameattribute (and peers) are marked experimental; flagged for review when the GenAI semconv reaches stability 1.0. - Operators running Ariadne inside a shared OTEL pipeline (e.g. Jaeger + Grafana in an enterprise deployment) get Ariadne traces alongside other services with no extra plumbing.