Adaptive & self-improving Ariadne — design¶
- Date: 2026-06-05
- Status: Approved design (implementation phased; first slice specced below)
- Decision record: ADR-0020
Problem¶
Ariadne generalizes today at code level: to analyze a new corpus, a developer
writes a DatasetAdapter, maps the source into the canonical schema, and registers
it in DATASETS (ADR-0006).
That is powerful but it means the maintainer extends Ariadne. Two capabilities are
missing for the tool to be a general sensemaking harness:
- Runtime adaptivity — a user points Ariadne at their store (schema, ontology, vocabulary) and it adapts without a code change.
- Experience improvement — Ariadne gets better at a user's data over repeated workups: it remembers what worked and proposes refinements, rather than starting cold each time.
The hard constraint: neither capability may erode Ariadne's spine — auditable, read-only, no-silent-merge, provenance-by-hook governance. A system that silently rewrites its own mappings, tools, or gates is exactly what an intelligence-analysis stakeholder cannot trust.
The organizing principle: propose → ratify → freeze¶
Every adaptive or learned change follows one lifecycle:
- Propose — the agent introspects/learns and emits a declarative artifact (a schema mapping, an ontology, a named skill) as a proposal, with provenance.
- Ratify — a human reviews and accepts (or edits) the proposal. Nothing un-ratified is used in a scored/governed workup.
- Freeze — the ratified artifact is persisted as config and becomes a fixed input the deterministic gates check against on every subsequent run.
This is the local, app-level form of the human/AI division Anthropic names for recursive self-improvement (humans keep judgment, verification, direction; the AI executes), and it answers the 2026 self-improvement literature's core warning (deployed self-improvement invites reward-hacking and untraceable drift unless "design rules and tests govern what changes are allowed").
Two axes¶
Axis A — Adaptivity (adapt to a user's data)¶
- A1 · Schema introspection. A generic connector introspects an arbitrary store
(
information_schemafor Postgres; laterdb.schemafor Neo4j; headers for CSV). The agent does iterative schema linking — retrieve only the relevant tables/columns rather than dumping the whole schema — with a self-correcting query loop. (AutoLink-style; scales past whole-schema prompting.) - A2 · Ontology / semantic layer. A declarative, human-readable config (TOML) naming the user's entity types and relationship vocabulary, plus an LLM-assisted mapping from their raw schema into it. Uses OntoKG's intrinsic-vs-relational routing — each property is either a node attribute or a traversable edge (the same distinction the node-drawer already renders). Designed so a SHACL validator can wrap the same model later (Anchor-style) without redesign.
- A3 · Dynamic MCP surface. The MCP server registers per-source tool families
(
mcp__relational__*,mcp__graph__*) at runtime as datasets connect, emittingnotifications/tools/list_changed; the lead agent routes by the discovered surface. (Standard 2026 MCP: dynamic-fastmcp / Spring AI / Docker Dynamic MCP.)
Axis B — Self-improvement (improve from experience), bounded and audited¶
- B1 · Learned mappings (procedural memory). A ratified schema mapping is persisted; the next workup on that store reuses it instead of re-introspecting. This is the first concrete bite of self-improvement and is seeded directly by A1+A2.
- B2 · Learned analytic skills. Distil high-scoring workup trajectories into named, reusable, composable skills (e.g. "cross-store employer-tie check"), discovered and composed in later workups (Voyager / ProcMEM skill-library pattern).
- B3 · Reflexion over eval. The agent reflects on its own low-scoring eval dimensions (the harness already produces a trustworthy gradient: recall / grounded / supporting-fact F1 / reconciliation / context-utilization / ICD-203 rubric — now surfaced in the report) and proposes a refined skill/mapping/query.
Why Ariadne is unusually well-positioned: audited self-improvement needs a verifiable reward and an audit trail. Ariadne already has both — the eval harness is the reward signal; provenance-by-hook + the read-only governance audit is the trail. Most agents attempting self-improvement lack either.
The hard boundary (safety architecture)¶
The self-improvement loop edits only declarative, ratified artifacts (mappings, skills, ontology). It never edits its own gates, governance rules, eval scorers, or code. An agent that can edit its own grader will learn to game it; keeping the gates and human ratification as fixed points is what makes the loop defensible. This is the non-negotiable line, recorded in ADR-0020.
First slice (the increment to build next)¶
Goal: point Ariadne at a real Postgres the maintainer did not hand-write an adapter for, and run a grounded workup over it — via an introspected, human-ratified, frozen mapping into the existing canonical schema.
Scope (deliberately minimal — A1 + A2-into-existing-schema + B1 seed; no custom user ontology yet, no dynamic MCP yet):
- Introspection — a read-only
information_schemareader producing a structured schema summary (tables, columns, types, foreign keys). Reuses the postgres-mcp restricted-mode posture (ADR-0003). - Proposed mapping — an injected mapper (Protocol; hermetic fake + real Claude
behind an extra, mirroring the rubric judge) emits a candidate mapping: which
tables→entity types, which columns→attributes (intrinsic) vs which FKs→edges
(relational), into the canonical
person/org/site/document+ edge schema. - Ratify + freeze — the proposal is written as
mapping.toml; a human edits/ accepts; a deterministic validator checks structural integrity before it is usable. The canonical schema'stypeis an open string (it deliberately avoids a "god model"), so the validator checks structure, not a closed type list: every mapped column exists in the introspected summary, each entity declares anid+namecolumn, and every relationship's endpoint tables are themselves mapped to entities (the loadability check the 2026 schema-mapping work flags — entities that "look right" but can't be loaded because an edge endpoint is unmapped). - Apply — a frozen
mapping.tomldrives aDatasetAdapterover the live Postgres, so the existing indexer + workup + eval run unchanged on the user's data.
Out of scope for the first slice (later phases): user-defined ontologies (A2 full), dynamic MCP registration (A3), learned skills (B2), reflexion (B3), SHACL validation, Neo4j/CSV introspection.
Components & isolation¶
| Unit | Responsibility | Depends on |
|---|---|---|
introspect/postgres.py |
read-only schema summary from information_schema |
psycopg (read-only) |
mapping/propose.py |
SchemaMapper Protocol → candidate mapping (fake + Claude) |
injected judge/model |
mapping/schema.py |
the mapping.toml model + deterministic validator |
canonical schema |
mapping/adapter.py |
a DatasetAdapter driven by a frozen mapping.toml |
ADR-0006 seam |
Each is independently testable; the mapper is injectable so the engine stays pure and hermetic.
Testing¶
- Pure introspection over a fixture
information_schemadump (no live DB) + one live integration test (testcontainers Postgres) behind theintegrationmarker. - Mapper engine tested with a hermetic fake
SchemaMapper; the real Claude mapper behind an extra + anetwork/integrationtest. - Validator: golden + adversarial mappings (column not in the introspected schema, relationship endpoint table unmapped, entity missing its id/name column).
- A frozen-
mapping.toml→ adapter → indexer → workup →evalround-trip on a small seeded Postgres, asserting a grounded note (the propose→ratify→freeze loop end to end).
Risks & mitigations¶
- Mapping is wrong/misleading → human ratify gate + deterministic validator; never auto-applied.
- Scope creep into full ontology engine → first slice maps into the existing schema only; ontology is a later, separately-specced phase.
- Self-improvement drift / reward-hacking → the hard boundary (loop never edits gates/scorers/code); every learned artifact is ratified and gate-checked.
- Read/write safety on a user's DB → introspection and queries are read-only (restricted mode), audited by the existing governance pass.