Architecture¶

Design notes for Ariadne's harness. The shape below is implemented end to end, from heterogeneous retrieval and analytic rigor through distribution and an adaptive, self-improving harness; the Decision log records each contestable choice, and the Roadmap tracks what is built versus planned.

Building blocks¶

The orchestration layer is the Claude Agent SDK. Its primitives are the vocabulary every design choice is expressed in:

Tools: callable retrieval/processing functions (in-process or via MCP).
Skills: packaged multi-step analytic procedures (e.g. entity-workup).
Hooks: lifecycle interceptors for provenance, authorization, and audit.
Subagents: context-isolated workers for parallel per-source retrieval.
MCP: the connector standard surfacing graph / SQL / vector stores as tools.

See the Claude Agent SDK Reference for the full doc-cited mechanics.

Decisions¶

Significant, contestable choices (which store, which connector, what was deferred and why) live in the Decision log as ADRs, the single place to point when asked "why this instead of that?" Notable examples: the Postgres-over-Redis comparison (ADR-0004) and the dataset-abstraction approach (ADR-0006).

Emerging shape (from the research)¶

The best-practice research points to an orchestrator-worker design: a lead agent runs the gather → act → verify → repeat loop and dispatches context-isolated subagents (one per source) that retrieve via MCP and hand back only their findings. GraphRAG traverses organizational hierarchies; multimodal evidence is fused agentically by converting imagery/video to structured text before reasoning over it.

Today the loop runs as a single lead agent querying the stores directly. Subagent fan-out is deferred as YAGNI (trigger: store count ≥4 or a measured latency bottleneck; the provenance blocker is dissolved now that the SDK hook fires inside subagents) per ADR-0005 and ADR-0015. The diagram below shows the target shape.

Diagram (target shape), top to bottom:

Target entity: a person, unit, or org node
Lead agent: gather → act → verify → repeat; dispatches parallel, context-isolated subagents
Graph subagent: Neo4j, relationships via MCP
Relational subagent: Postgres, records via MCP
Unstructured subagent: hybrid search, text + vector via MCP
Return path: findings return to the lead
Synthesis: provenance + citation gate
Output: cited analytic note
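The difference between today's single-lead loop and the target fan-out can be sketched in a few lines. This is a minimal illustration, not Ariadne's implementation: `Finding`, `retrieve`, `citation_gate`, `workup`, and `should_fan_out` are hypothetical names, and the worker is a stub where a real one would call MCP tools. Only the trigger (store count ≥4 or a measured latency bottleneck) comes from the text.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str
    source: str        # which store produced the claim
    evidence_id: str   # hypothetical pointer back to the supporting record


def should_fan_out(store_count: int, latency_bottleneck: bool) -> bool:
    """Deferral trigger from the text: store count >= 4 or a measured bottleneck."""
    return store_count >= 4 or latency_bottleneck


async def retrieve(source: str, entity: str) -> list:
    """Stub for one retrieval pass; a real worker would call MCP tools here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [Finding(f"{entity} appears in {source}", source, f"{source}:1")]


def citation_gate(findings: list) -> list:
    """Keep only findings that carry an evidence pointer."""
    return [f for f in findings if f.evidence_id]


async def workup(entity: str, sources: list, latency_bottleneck: bool = False) -> list:
    if should_fan_out(len(sources), latency_bottleneck):
        # Target shape: one worker per source, run concurrently.
        batches = await asyncio.gather(*(retrieve(s, entity) for s in sources))
    else:
        # Current shape: the lead queries each store itself, in turn.
        batches = [await retrieve(s, entity) for s in sources]
    return citation_gate([f for batch in batches for f in batch])


notes = asyncio.run(workup("Unit 7", ["graph", "relational", "unstructured"]))
```

With three stores and no measured bottleneck the sketch takes the sequential branch, which mirrors the deferral described above; both branches return findings in source order, so the synthesis step does not need to know which one ran.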

Datasets¶

A developer adds a corpus by writing one adapter that maps raw data to the canonical schema below; a dataset-agnostic indexer fans the records into the stores. The agent, entity-workup skill, connectors, and eval harness never change. Governance (access-control, PII gating) attaches at the canonical layer once, not per dataset.

Canonical schema (diagram):

Attribute: key, value
Document: text, source
Entity: name, type, aliases[]
Relationship: type, source → Entity, target → Entity

Edge labels in the diagram: 1 : N, mentions, source · target.
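One way to picture the adapter seam is as a set of plain record types plus a single method each adapter implements. The field names below follow the schema diagram; everything else is an assumption made for illustration: the `Adapter` protocol, `CanonicalBatch`, `RosterAdapter`, the `member_of` relationship type, and the reading that attributes hang off entities (1 : N) and that documents carry the `mentions` edge.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Attribute:
    key: str
    value: str


@dataclass
class Entity:
    name: str
    type: str
    aliases: list = field(default_factory=list)
    attributes: list = field(default_factory=list)  # assumed 1:N Entity -> Attribute


@dataclass
class Relationship:
    type: str
    source: Entity
    target: Entity


@dataclass
class Document:
    text: str
    source: str
    mentions: list = field(default_factory=list)  # assumed Document -> Entity edge


@dataclass
class CanonicalBatch:  # hypothetical container handed to the indexer
    entities: list
    relationships: list
    documents: list


class Adapter(Protocol):
    def to_canonical(self, raw: Iterable[dict]) -> CanonicalBatch: ...


class RosterAdapter:
    """Hypothetical adapter for rows shaped like {"person": ..., "unit": ...}."""

    def to_canonical(self, raw: Iterable[dict]) -> CanonicalBatch:
        entities = {}  # (type, name) -> Entity, so repeated names map to one node
        relationships = []
        for row in raw:
            person = entities.setdefault(("person", row["person"]), Entity(row["person"], "person"))
            unit = entities.setdefault(("unit", row["unit"]), Entity(row["unit"], "unit"))
            relationships.append(Relationship("member_of", person, unit))
        return CanonicalBatch(list(entities.values()), relationships, [])


batch = RosterAdapter().to_canonical([
    {"person": "A. Jones", "unit": "Unit 7"},
    {"person": "B. Smith", "unit": "Unit 7"},
])
```

Because the adapter is the only dataset-specific code, anything downstream can be written against `CanonicalBatch` alone, which is the property the paragraph above describes.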

Free-text retrieval is hybrid: Postgres full-text (tsvector GIN, websearch_to_tsquery) fused with pgvector semantic search via Reciprocal Rank Fusion. The agent reaches it through the in-process mcp__ariadne__hybrid_search tool (opt-in --semantic), per ADR-0007.
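Reciprocal Rank Fusion combines ranked lists without needing their scores to be comparable: each document earns 1 / (k + rank) from every list it appears in, and the sums are sorted. The sketch below shows the technique in isolation, assuming the two rankings have already come back from full-text and vector search. The function name, the document ids, and the constant k = 60 (a commonly used default) are assumptions, not details taken from ADR-0007.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists; returns (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


full_text = ["d3", "d1", "d7"]   # hypothetical full-text ranking
semantic = ["d1", "d9", "d3"]    # hypothetical vector-search ranking
fused = reciprocal_rank_fusion([full_text, semantic])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd9', 'd7']
```

Here `d1` edges out `d3` because its ranks (2 and 1) beat `d3`'s (1 and 3) once summed, while documents seen by only one retriever fall to the bottom.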

The same seam takes any source. Four adapters spanning four modalities map into one canonical schema with no change to the agent, connectors, or eval harness (ADR-0006; pipeline design spec).

Adapters (diagram):

synthetic: graph
enron: email · text
lahman: relational
worldspeech: audio

Each passes through the adapter (one seam) into the canonical schema, stored in Neo4j + Postgres.

Distribution¶

Ariadne is consumable as an MCP server (the workup tool runs the full harness and returns a cited analytic note) from any MCP client, with a Claude Code plugin wrapper for one-click install and slash-command UX, per ADR-0009. It is published to PyPI as ariadne-sensemaking (uvx ariadne-sensemaking).

Adaptive & self-improvement¶

Beyond the built-in datasets, Ariadne adapts to a user's own store and improves from experience (ADR-0020). Every change rides one cycle, turning around a core the loop can never touch:

Propose: the agent drafts a declarative artifact
Ratify: a human approves or rejects
Freeze: it becomes config the gates check

Core (eval gate · governance): the loop can never edit this

The loop edits only ratified artifacts; the eval harness it is scored against stays off-limits, so an agent can never quietly grade its own work.
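The cycle reads naturally as a small state machine with one hard rule: proposals aimed at the protected core are refused outright. The following is a sketch under assumptions, not the project's code. `Stage`, `Artifact`, `PROTECTED`, and the target names `eval_harness` and `governance` are hypothetical stand-ins for whatever ADR-0020 actually specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    RATIFIED = "ratified"
    REJECTED = "rejected"
    FROZEN = "frozen"


# Hypothetical names for the core the loop may never edit.
PROTECTED = frozenset({"eval_harness", "governance"})


@dataclass
class Artifact:
    target: str          # what the artifact configures, e.g. a mapping or a skill
    body: dict           # declarative content drafted by the agent
    stage: Stage = Stage.PROPOSED


def propose(target: str, body: dict) -> Artifact:
    if target in PROTECTED:
        raise PermissionError(f"{target} is outside the loop's reach")
    return Artifact(target, body)


def ratify(artifact: Artifact, approved: bool) -> Artifact:
    """Human decision point: only a fresh proposal can be ratified or rejected."""
    if artifact.stage is not Stage.PROPOSED:
        raise ValueError("only a proposed artifact can be ratified")
    artifact.stage = Stage.RATIFIED if approved else Stage.REJECTED
    return artifact


def freeze(artifact: Artifact) -> Artifact:
    """Ratified artifacts become config; nothing else may be frozen."""
    if artifact.stage is not Stage.RATIFIED:
        raise ValueError("only a ratified artifact can be frozen")
    artifact.stage = Stage.FROZEN
    return artifact


frozen = freeze(ratify(propose("mapping", {"table": "people"}), approved=True))
```

Putting the check in `propose` rather than `freeze` means a protected target never even reaches the human reviewer, which is one way to keep the boundary from depending on reviewer attention.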

Adapt (Axis A): introspect a real Postgres, propose a mapping into the canonical schema (deterministic or LLM), ratify it, and the existing indexer / workup / eval run unchanged on the user's data (ariadne map); plus a declarative user ontology and a dynamic MCP surface that activates a ratified store at runtime.
Learn (Axis B): distil a high-scoring workup into a reusable analytic skill (ariadne distil), reflect on a low-scoring one and propose grounded, gold-free refinements (ariadne reflect), deepen an existing skill from a new run (distil --into), and measure whether a learned change actually helps before adopting it (ariadne compare: repairs net of regressions on the same eval instance). The eval harness is the external verifiable reward the loop can never edit.
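"Repairs net of regressions" can be made concrete with a few lines of arithmetic. The sketch below is one plausible reading of that phrase, not the actual logic of ariadne compare: it assumes each run yields a pass/fail result per eval case, counts cases that flipped from fail to pass (repairs) and from pass to fail (regressions), and reports the difference. The function name and case ids are hypothetical.

```python
def net_repairs(baseline: dict, candidate: dict) -> dict:
    """Compare pass/fail results for the same eval cases across two runs."""
    shared = baseline.keys() & candidate.keys()  # only score cases present in both
    repairs = sum(1 for case in shared if not baseline[case] and candidate[case])
    regressions = sum(1 for case in shared if baseline[case] and not candidate[case])
    return {"repairs": repairs, "regressions": regressions, "net": repairs - regressions}


before = {"q1": True, "q2": False, "q3": False, "q4": True}
after = {"q1": True, "q2": True, "q3": True, "q4": False}
result = net_repairs(before, after)
print(result)  # {'repairs': 2, 'regressions': 1, 'net': 1}
```

Reporting the two counts alongside the net matters: a change that fixes five cases and breaks four has the same net as one that fixes a single case cleanly, and a reviewer may reasonably treat them differently.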

Knowing it works¶

The brief's central challenge is "how do you know what works?" Ariadne answers it with a tiered eval pyramid: cheap, exhaustive checks on every output at the base; expensive, sampled judgment at the apex.

From apex to base:

LLM-as-judge: ICD-203 analytic-standards rubric, on a sample of survivors (slow · costly · sampled)
Entailment (NLI): HHEM-2.1 checks each cited claim against its evidence (fast · local · per claim)
Deterministic floor: citation gate + ICD-203 tradecraft lint on every output (microseconds · $0 · every claim)

Volume falls and cost rises toward the apex.
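The tiering can be sketched as a funnel in which each stage only sees what survived the one below it. This is an illustration under assumptions: `run_pyramid`, the claim dictionaries, the 0.5 threshold, and the 20% sample rate are all hypothetical, and the entailment and judge functions are injected stubs. A word-overlap score stands in for HHEM-2.1 and a constant stands in for the ICD-203 rubric judge; neither model is called here.

```python
import random
from typing import Callable


def deterministic_floor(claim: dict) -> bool:
    """Cheapest check: the claim must cite some evidence at all."""
    return bool(claim.get("evidence"))


def run_pyramid(claims: list, entails: Callable, judge: Callable,
                threshold: float = 0.5, sample_rate: float = 0.2, seed: int = 0) -> dict:
    # Base: every claim goes through the deterministic gate.
    cited = [c for c in claims if deterministic_floor(c)]
    # Middle: every cited claim is checked against its own evidence.
    supported = [c for c in cited if entails(c["evidence"], c["text"]) >= threshold]
    # Apex: only a seeded sample of survivors reaches the expensive judge.
    n = max(1, round(len(supported) * sample_rate)) if supported else 0
    sampled = random.Random(seed).sample(supported, n)
    return {"checked": len(claims), "cited": len(cited),
            "supported": len(supported), "judged": [judge(c) for c in sampled]}


def word_overlap(premise: str, hypothesis: str) -> float:
    """Stub entailment score: share of hypothesis words found in the premise."""
    words = hypothesis.lower().split()
    premise_words = set(premise.lower().split())
    return sum(w in premise_words for w in words) / len(words) if words else 0.0


claims = [
    {"text": "jones commands unit 7", "evidence": "records show jones commands unit 7"},
    {"text": "smith joined in 2019", "evidence": "smith joined the unit in 2019"},
    {"text": "unit 7 is based in ohio", "evidence": "weather report for tuesday"},
    {"text": "jones is retiring", "evidence": None},
]
report = run_pyramid(claims, entails=word_overlap, judge=lambda claim: 1.0)
```

On this toy input four claims are checked, three pass the citation gate, two pass the entailment stub, and one is sampled for the judge, so the counts shrink at each tier in the way the pyramid describes.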

Still open¶

The pieces not yet built or still settling:

Multimodal fusion (Phase 3): image/video/OCR extraction and cross-modal evidence fusion.
Subagent fan-out: parallel per-source workers, deferred as YAGNI until store count ≥4 or a measured latency bottleneck (design specified in ADR-0015).
Entity resolution across stores: currently a stable shared key per entity; a richer resolver remains an open research question.
Open-weight validation: which self-hostable model clears the eval bar through the air-gap seam (open-weight validation).