0027, Declarative user ontology: a lightweight TOML vocabulary the mapper maps into
- Status: Accepted (2026-06-07)
- Deciders: Ariadne maintainers
- Refines: ADR-0020 (axis A2, the semantic layer) · builds on ADR-0026 (the agentic mapper) + ADR-0025 (apply)
Context
A1 maps a user's introspected schema onto Ariadne's canonical schema where the
entity/relationship type is an open string — the mapper "picks the most
natural one" (person / org / MEMBER_OF / …) with nothing constraining it.
That is right for a generic first draft, but a real analyst works in a domain:
their world is Vessel / Berth / MOORED_AT, or Indicator / Campaign /
ATTRIBUTED_TO — a closed vocabulary they want the graph to speak, not whatever
synonym the model reached for this run.
A2 is that semantic layer: let a user declare their own entity-type +
relationship vocabulary, and have the mapper map into that — every entity
typed as one of their declared types, every relationship one of their declared
edges routed domain → range. The contestable questions are (a) what the user
declares it in, and (b) how the closed vocabulary is enforced. Hence this ADR.
Decision drivers
- A closed vocabulary is only worth declaring if it's enforced. The mapper must be constrained to the declared types and verified against them, not merely nudged. June-2026 ontology-aligned extraction makes schema-compliant type assignment a validated property, not a hope (Anchor, OntoLogX).
- Intrinsic-vs-relational routing is the declarative core. Every property is either an intrinsic attribute (a node column) or a relational edge (OntoKG). Ariadne's `Mapping` already separates `attribute_columns` from `relationships`; the ontology only has to declare which edge types are legal and what they connect, so the routing decision becomes a typed, checkable claim.
- Stay in Ariadne's idiom, stay lean. Every config surface here is TOML (`mapping.toml`, the `[dataset]` header, profiles). A user's vocabulary is a handful of names; it should not drag in a modeling framework or a second serialization language.
- SHACL-validatable later, by construction. The note in ADR-0020 promises SHACL validation eventually; the declared shape must transpile mechanically to SHACL node/property shapes when that lands, so the format has to line up with SHACL's concepts (a node shape per entity type, a property shape per relationship type with `sh:class` domain/range).
- Reuse the shipped spine. Forced tool-use + a bounded, validator-terminated retry loop already exist (ADR-0026). The ontology should ride them (constrain the tool schema, feed ontology violations back as repair errors) rather than add a parallel mechanism.
Considered options
- Keep open-string types; no ontology (status quo). Rejected. It gives the user no control over their domain vocabulary, lets the model invent a different type each run, and produces nothing portable or contestable at the user's level. It is the thing A2 exists to fix.
- Adopt LinkML (YAML) as the declaration format. Considered, deferred. LinkML is the 2026 reference data-modeling framework: one YAML model is the single source of truth that generates OWL, SHACL, and JSON Schema, and OntoGPT / SPIRES already turn LinkML schemas into extraction prompts. But its strengths are redundant or premature here — its LLM-extraction value duplicates the forced-tool-use loop we already ship; its SHACL generation is the "later" we can reach with a ~30-line transpiler; and it costs a heavy transitive dependency tree plus a YAML idiom break from our all-TOML config, to model classes / slots / mixins / inheritance that a name-and-domain-range vocabulary does not need (speculative generality, YAGNI). Recorded as the upgrade path if the ontology ever needs imports, inheritance, or multi-target codegen.
- Full OWL/RDF ontology + a reasoner. Rejected. Open-world description-logic reasoning and a reasoner dependency are far past "a user names a dozen types." Closed-world SHACL-style structural validation is the right minimum (OntoLogX's stage 2: syntactic → SHACL compliance → semantic), and we get there without importing a reasoner.
- Large-ontology search-and-navigate (Anchor's hybrid ontology discovery over UCO/STIX-scale schemas). Deferred. That solves scale — ontologies too large to fit in a prompt, where prompt-based schema inclusion is exactly what breaks. A user's lightweight declared vocabulary fits in the prompt and the tool schema, so prompt/enum inclusion is the correct tool now; navigate-a-huge-ontology is the deferred large path, mirroring ADR-0026's AutoLink deferral for large schemas.
- A lightweight `ontology.toml`: declared entity types + relationship types (with `domain`/`range`), enum-injected into the forced-tool schema, enforced by a deterministic `validate_against_ontology` inside the existing repair loop, and designed to transpile mechanically to SHACL. Chosen.
Decision
Adopt option 5, the lightweight `ontology.toml`.
- The artifact. A user writes an `ontology.toml`: an array of `[[entity_types]]` (`name`, optional `description`) and an array of `[[relationship_types]]` (`name`, `domain`, `range`, the latter two being the from/to entity-type names, and optional `description`). `mapping/ontology.py` loads it into frozen `EntityType` / `RelationshipType` / `Ontology` dataclasses. `domain`/`range` are single entity-type names in v1 (matching the 1:1 foreign-key→edge reality and the baseline's output); multi-domain/range is the deferred `sh:or` extension.
- Enforcement: `validate_against_ontology(mapping, ontology) -> list[str]`. Deterministic, pure, returns structural-style errors: every entity `type` must be a declared entity type; every relationship `type` must be a declared relationship type; and each relationship's endpoints must obey the declared routing: the `from_table`'s entity type must equal the edge's `domain`, the `to_table`'s the `range`. (Endpoints that aren't mapped at all are left to the existing structural `validate_mapping`, so the two validators compose without double-reporting.) This is the relational half of intrinsic-vs-relational routing made checkable.
- Guidance: enum-constrained forced tool-use. When an ontology is present, `build_map_tool(ontology)` injects the declared names as JSON-Schema `enum`s on the entity/relationship `type` fields, and `build_mapping_prompt` describes the vocabulary (types + the legal `domain → range` edges). The enum hard-constrains the model's structured output to the vocabulary (constrained generation); the validator catches the routing the enum can't express (which `domain`/`range` a given edge connects). This is prompt-based schema inclusion, which is correct for a small user ontology.
- The loop. `propose_with_repair` takes the optional `ontology` and runs `validate_mapping` and `validate_against_ontology`, re-prompting with the union of complaints until the proposal is both loadable and ontology-conformant or the bound is hit. This is the same deterministic-gate-terminates-the-loop stance as ADR-0026, now extended to ontology conformance.
- CLI + the baseline/LLM split. `ariadne map --ontology PATH` loads the ontology, configures the mapper, and validates against it. The LLM mapper is guided by the vocabulary (enum + prompt). The deterministic baseline cannot invent a user's vocabulary from table names, so for it the ontology is a validation-only layer: it still proposes its heuristic draft, and the ontology violations are reported for the human to resolve. Ontology guidance is honestly an LLM capability; ontology enforcement is available to both.
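The enforcement and loop bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: the mapping is modelled as a plain dict (`entities` maps table name to entity type, `relationships` is a list of dicts), the `Ontology` container is simplified, the proposer is a stub standing in for the LLM call, and the structural `validate_mapping` pass is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RelationshipType:
    name: str
    domain: str  # entity type the edge must start from
    range: str   # entity type the edge must end at


@dataclass(frozen=True)
class Ontology:  # simplified stand-in for the real container
    entity_types: frozenset[str]
    relationship_types: dict[str, RelationshipType]


def validate_against_ontology(mapping: dict, ontology: Ontology) -> list[str]:
    """Pure check: returns error strings; an empty list means conformant."""
    errors: list[str] = []
    table_types: dict[str, str] = mapping["entities"]
    for table, etype in table_types.items():
        if etype not in ontology.entity_types:
            errors.append(f"entity {table!r}: type {etype!r} is not declared")
    for rel in mapping["relationships"]:
        declared = ontology.relationship_types.get(rel["type"])
        if declared is None:
            errors.append(f"relationship type {rel['type']!r} is not declared")
            continue
        for end, expected in (("from_table", declared.domain),
                              ("to_table", declared.range)):
            actual = table_types.get(rel[end])
            # Unmapped endpoints are skipped here; the source leaves those
            # to the structural validate_mapping.
            if actual is not None and actual != expected:
                errors.append(
                    f"{rel['type']}: {end} {rel[end]!r} is {actual!r}, "
                    f"expected {expected!r}"
                )
    return errors


def propose_with_repair(
    propose: Callable[[list[str]], dict],
    ontology: Ontology,
    max_attempts: int = 3,
) -> tuple[dict, list[str]]:
    """Re-propose with the previous errors until conformant or bound is hit."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    errors: list[str] = []
    for _ in range(max_attempts):
        mapping = propose(errors)
        errors = validate_against_ontology(mapping, ontology)
        if not errors:
            break
    return mapping, errors


ONTOLOGY = Ontology(
    entity_types=frozenset({"Vessel", "Berth"}),
    relationship_types={"MOORED_AT": RelationshipType("MOORED_AT", "Vessel", "Berth")},
)
GOOD = {
    "entities": {"ships": "Vessel", "docks": "Berth"},
    "relationships": [{"type": "MOORED_AT", "from_table": "ships", "to_table": "docks"}],
}
# Same types, but the edge is routed the wrong way round.
BAD = {
    "entities": {"ships": "Vessel", "docks": "Berth"},
    "relationships": [{"type": "MOORED_AT", "from_table": "docks", "to_table": "ships"}],
}


def stub_proposer(feedback: list[str]) -> dict:
    """Hypothetical proposer: wrong on the first try, fixed once given errors."""
    return GOOD if feedback else BAD


final_mapping, final_errors = propose_with_repair(stub_proposer, ONTOLOGY)
```

The reversed edge in `BAD` uses only declared names, so an enum constraint alone would accept it; only the routing check catches it, which is the division of labour the guidance bullet describes.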
Consequences
- A2's first slice closes. A user maps their store into their own domain vocabulary, type-checked and routing-checked before a human ever sees the draft, through the unchanged ADR-0025 freeze→apply path. The open-string default still works when no `--ontology` is given.
- SHACL is now a mechanical transpile, deferred not blocked. Entity types → `sh:NodeShape`, relationship types → `sh:PropertyShape` with `sh:class` domain/range; the TOML was shaped to line up. When SHACL validation lands it replaces or augments `validate_against_ontology`; it doesn't restructure anything.
- No new dependency; the idiom holds. The ontology is TOML parsed by the stdlib `tomllib`, like every other config artifact. LinkML stays the documented upgrade path, not a cost we pay now.
- Large-ontology navigation is deferred, the same way large-schema exploration is (ADR-0026): both slot in behind the same seams when a user brings something that doesn't fit a prompt.
- One honest asymmetry: the deterministic baseline gains validation but not guidance, so `--ontology` without `--llm` will usually surface violations rather than a conformant draft. That is the truthful capability line, and it is documented in the command's help.
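To make the "mechanical transpile" claim concrete, here is one possible sketch that emits SHACL as Turtle text from the declared names. It is an assumption about how the deferred transpiler might look, not a committed design: the function name, the `ex:` prefix, and the base IRI are hypothetical, and the domain is encoded by attaching each property shape to the node shape of its domain type, with `sh:class` carrying the range.

```python
def to_shacl_turtle(
    entity_types: list[str],
    relationship_types: list[tuple[str, str, str]],  # (name, domain, range)
    prefix: str = "ex",
    base: str = "http://example.org/ontology#",  # placeholder IRI
) -> str:
    """Emit one sh:NodeShape per entity type, one property shape per edge."""
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        f"@prefix {prefix}: <{base}> .",
    ]
    for name in entity_types:
        # Edges whose declared domain is this entity type.
        props = [(rel, rng) for rel, dom, rng in relationship_types if dom == name]
        lines.append("")
        lines.append(f"{prefix}:{name}Shape a sh:NodeShape ;")
        lines.append(f"    sh:targetClass {prefix}:{name}" + (" ;" if props else " ."))
        for i, (rel, rng) in enumerate(props):
            end = " ." if i == len(props) - 1 else " ;"
            lines.append(
                f"    sh:property [ sh:path {prefix}:{rel} ; "
                f"sh:class {prefix}:{rng} ]{end}"
            )
    return "\n".join(lines) + "\n"


turtle = to_shacl_turtle(
    entity_types=["Vessel", "Berth"],
    relationship_types=[("MOORED_AT", "Vessel", "Berth")],
)
```

This only covers the range side of the routing check for declared edges. Rejecting undeclared edges would need something further, such as `sh:closed`, and is left out of the sketch.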
Sources: intrinsic-vs-relational routing + portable declarative schema — OntoKG, arXiv:2604.02618; SHACL-enforced schema-compliant typing + prompt-inclusion fails only at large-ontology scale — Anchor, arXiv:2606.01208; validator-driven correction (syntactic → SHACL compliance → semantic) — OntoLogX, arXiv:2510.01409; deferred-correction token tradeoff (considered for the loop) — Better Later Than Sooner, arXiv:2605.29168; the heavier single-source-of-truth alternative — LinkML, arXiv:2511.16935.