Architecture¶
Design notes for Ariadne's harness. The shape below is implemented end to end, from heterogeneous retrieval and analytic rigor through distribution and an adaptive, self-improving harness; the Decision log records each contestable choice, and the Roadmap tracks what is built versus planned.
Building blocks¶
The orchestration layer is the Claude Agent SDK. Its primitives are the vocabulary every design choice is expressed in:
- Tools: callable retrieval/processing functions (in-process or via MCP).
- Skills: packaged multi-step analytic procedures (e.g. entity-workup).
- Hooks: lifecycle interceptors for provenance, authorization, and audit.
- Subagents: context-isolated workers for parallel per-source retrieval.
- MCP: the connector standard surfacing graph / SQL / vector stores as tools.
See the Claude Agent SDK Reference for the full doc-cited mechanics.
Decisions¶
Significant, contestable choices (which store, which connector, what was deferred and why) live in the Decision log as ADRs: the single place to point when asked "why this instead of that?" Notable examples: the Postgres-over-Redis comparison (ADR-0004) and the dataset-abstraction approach (ADR-0006).
Emerging shape (from the research)¶
The best-practice research points to an orchestrator-worker design: a lead agent runs the gather → act → verify → repeat loop and dispatches context-isolated subagents (one per source) that retrieve via MCP and hand back only their findings. GraphRAG traverses organizational hierarchies; multimodal evidence is fused agentically by converting imagery/video to structured text before reasoning over it.
Today the loop runs as a single lead agent querying the stores directly. Subagent fan-out is deferred as YAGNI (trigger: store count ≥4 or a measured latency bottleneck; the provenance blocker is dissolved now that the SDK hook fires inside subagents) per ADR-0005 and ADR-0015. The diagram below shows the target shape.
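The deferral trigger can be read as a simple predicate. The sketch below is illustrative only: the threshold of four stores comes from the text, but `LoopMeasurement`, `should_fan_out`, and the choice to express a "measured latency bottleneck" as p95 latency exceeding a budget are assumptions, not Ariadne's actual code.

```python
from dataclasses import dataclass

FAN_OUT_STORE_THRESHOLD = 4  # from the text: store count >= 4


@dataclass(frozen=True)
class LoopMeasurement:
    """Hypothetical snapshot of the single-agent loop."""
    store_count: int
    p95_latency_s: float  # assumed metric for "measured latency bottleneck"


def should_fan_out(m: LoopMeasurement, latency_budget_s: float) -> bool:
    """Return True once either deferral trigger fires."""
    too_many_stores = m.store_count >= FAN_OUT_STORE_THRESHOLD
    latency_bottleneck = m.p95_latency_s > latency_budget_s
    return too_many_stores or latency_bottleneck
```

With two stores and latency inside the budget, the predicate stays false and the single lead agent keeps querying the stores directly.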
Datasets¶
A developer adds a corpus by writing one adapter that maps raw data to the
canonical schema below; a dataset-agnostic indexer fans the records into the
stores. The agent, entity-workup skill, connectors, and eval harness never
change. Governance (access-control, PII gating) attaches at the canonical layer
once, not per dataset.
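The adapter seam described above might look roughly like the following. This is a minimal sketch under stated assumptions: `CanonicalRecord` and its fields, `DatasetAdapter`, `PeopleRowsAdapter`, and `index` are hypothetical names, and plain lists stand in for the real stores.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, Protocol


@dataclass
class CanonicalRecord:
    # Hypothetical fields; the project's canonical schema is the authority.
    entity_key: str
    kind: str
    text: str
    source: str
    attrs: dict[str, Any] = field(default_factory=dict)


class DatasetAdapter(Protocol):
    """The one thing a developer writes per corpus."""

    def records(self, raw: Iterable[dict[str, Any]]) -> Iterator[CanonicalRecord]: ...


class PeopleRowsAdapter:
    """Example adapter for a made-up corpus of person rows."""

    source = "people_rows"

    def records(self, raw: Iterable[dict[str, Any]]) -> Iterator[CanonicalRecord]:
        for row in raw:
            yield CanonicalRecord(
                entity_key=f"person:{row['id']}",
                kind="person",
                text=row.get("bio", ""),
                source=self.source,
                attrs={"name": row["name"]},
            )


def index(adapter: DatasetAdapter, raw: Iterable[dict[str, Any]], stores: list[list]) -> int:
    """Dataset-agnostic indexer: fan each canonical record into every store."""
    count = 0
    for record in adapter.records(raw):
        for store in stores:
            store.append(record)  # lists stand in for the graph / SQL / vector stores
        count += 1
    return count
```

Because `index` depends only on the protocol, adding a corpus means adding one adapter class; nothing downstream of the canonical record needs to know where the data came from.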
Free-text retrieval is hybrid: Postgres full-text (tsvector GIN,
websearch_to_tsquery) fused with pgvector semantic search via Reciprocal Rank
Fusion. The agent reaches it through the in-process mcp__ariadne__hybrid_search
tool (opt-in --semantic), per
ADR-0007.
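Reciprocal Rank Fusion combines ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows the fusion step only, in plain Python; the function name and the default k = 60 (a commonly used constant) are assumptions here, not values taken from Ariadne's implementation.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> list[tuple[Hashable, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Illustrative inputs: one list from full-text search, one from vector search.
full_text_hits = ["d1", "d2", "d3"]
semantic_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

Only ranks enter the formula, so the full-text and vector scores never need to be normalized onto a common scale before fusing.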
The same seam takes any source. Four adapters spanning four modalities map into one canonical schema with no change to the agent, connectors, or eval harness (ADR-0006; pipeline design spec).
[Diagram: adapters map into the canonical schema; the indexer fans records into Neo4j + Postgres.]
Distribution¶
Ariadne is consumable as an MCP server (the workup tool runs the full harness
and returns a cited analytic note) from any MCP client, with a Claude Code
plugin wrapper for one-click install and slash-command UX, per
ADR-0009. It is published to PyPI as
ariadne-sensemaking (uvx ariadne-sensemaking).
Adaptive & self-improvement¶
Beyond the built-in datasets, Ariadne adapts to a user's own store and improves from experience (ADR-0020). Every change rides one cycle, turning around a core the loop can never touch:
The loop edits only ratified artifacts; the eval harness it is scored against stays off-limits, so an agent can never quietly grade its own work.
- Adapt (Axis A): introspect a real Postgres, propose a mapping into the canonical schema (deterministic or LLM), ratify it, and the existing indexer / workup / eval run unchanged on the user's data (ariadne map); plus a declarative user ontology and a dynamic MCP surface that activates a ratified store at runtime.
- Learn (Axis B): distil a high-scoring workup into a reusable analytic skill (ariadne distil), reflect on a low-scoring one and propose grounded, gold-free refinements (ariadne reflect), deepen an existing skill from a new run (distil --into), and measure whether a learned change actually helps before adopting it (ariadne compare: repairs net of regressions on the same eval instance). The eval harness is the external verifiable reward the loop can never edit.
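The "repairs net of regressions" measure can be illustrated with paired pass/fail outcomes. This is a sketch under assumptions: it reads "the same eval instance" as pairing baseline and candidate results per eval case, and `compare_runs` and its return shape are hypothetical, not the output format of ariadne compare.

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Pair outcomes by case ID and count repairs net of regressions.

    A repair is a case that failed before and passes now; a regression is
    the reverse. Cases present in only one run are ignored (an assumption).
    """
    shared = baseline.keys() & candidate.keys()
    repairs = sorted(c for c in shared if not baseline[c] and candidate[c])
    regressions = sorted(c for c in shared if baseline[c] and not candidate[c])
    return {
        "repairs": repairs,
        "regressions": regressions,
        "net": len(repairs) - len(regressions),
    }


# Illustrative outcomes only: True means the eval case passed.
before = {"case-a": False, "case-b": True, "case-c": True}
after = {"case-a": True, "case-b": False, "case-c": True, "case-d": True}
result = compare_runs(before, after)
print(result["net"])  # 0: one repair cancelled by one regression
```

Pairing by case is what keeps the comparison honest: a change that fixes one case while breaking another nets to zero instead of showing up as an improvement.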
Knowing it works¶
The brief's central challenge is "how do you know what works?" Ariadne answers it with a tiered eval pyramid: cheap, exhaustive checks on every output at the base; expensive, sampled judgment at the apex.
[Diagram: eval pyramid; volume falls and cost rises toward the apex.]
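A two-tier version of the pyramid can be sketched as follows. The names (`run_pyramid`, `sample_rate`), the seeded sampling, and the decision to send only base-passing outputs to the expensive judge are all assumptions made for illustration; the actual tiers and checks are defined by the eval harness.

```python
import math
import random
from typing import Callable, Sequence


def run_pyramid(
    outputs: Sequence[str],
    cheap_checks: Sequence[Callable[[str], bool]],
    judge: Callable[[str], float],
    sample_rate: float,
    seed: int = 0,
) -> dict:
    """Base: every cheap check on every output. Apex: judge a seeded sample."""
    failed = {
        i for i, out in enumerate(outputs)
        if not all(check(out) for check in cheap_checks)
    }
    passed = [i for i in range(len(outputs)) if i not in failed]
    # Judge only a fraction of the outputs that cleared the base tier.
    k = min(len(passed), math.ceil(sample_rate * len(passed)))
    sampled = sorted(random.Random(seed).sample(passed, k))
    return {
        "base_failures": sorted(failed),
        "judged": {i: judge(outputs[i]) for i in sampled},
    }
```

The base tier costs almost nothing per output, so it can be exhaustive; the apex tier is where cost concentrates, so sampling keeps it bounded as volume grows.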
Still open¶
The pieces not yet built or still settling:
- Multimodal fusion (Phase 3): image/video/OCR extraction and cross-modal evidence fusion.
- Subagent fan-out: parallel per-source workers, deferred as YAGNI until store count ≥4 or a measured latency bottleneck (design specified in ADR-0015).
- Entity resolution across stores: currently a stable shared key per entity; a richer resolver remains an open research question.
- Open-weight validation: which self-hostable model clears the eval bar through the air-gap seam (open-weight validation).