Requirements & Priorities
This is Ariadne's prioritization made explicit: every requirement traced to a
stakeholder need (a brief success criterion or design constraint), the
metric that proves it, its priority tier with rationale, and its
feasibility/status. It answers the Sense & Orient worksheet's step 7
("Prioritization of Requirements") and feeds steps 2/6 (metric framing and
goal→metric mapping) and step 8 (sensing back-brief).
The stakeholder needs are the brief's own, recorded in the
charter:
- SC1: Traverse organizational relationships.
- SC2: Reconcile information across modalities.
- SC3: Reduce the analyst's manual pivot burden.
- SC4: Surface non-obvious connections impractical to find by hand.
- C-spec: Specification & validation ("how do you know what works?"), the
brief's stated core challenge.
- C-gov: Governance uniform across quality, security, and data integrity.
How we prioritize (the rule)
- P0: Demonstrable value. The four success criteria define what "works"
means. Each must be provable by a metric on a planted-needle fixture. Nothing
ranks above proving SC1-SC4.
- P1: The validation spine. The brief's core challenge is C-spec, not the
features themselves: an unvalidated P0 is worthless because no one can trust it.
So the gates and eval harness that certify P0, and the governance that makes
them safe (C-gov), sit immediately below P0, above any expansion.
- P2: Generalization & deployment. Proving the harness transfers beyond one
synthetic graph (the secondary deliverable) and runs in the brief's required
deployments. Valuable, but only once P0/P1 hold.
- P3: Stretch / deferred. Real but YAGNI-gated until a use case demands them.
Feasibility is rated against the current toolset; most P0/P1 are already shipped,
which is why they are P0/P1 and not aspirations.
P0: Demonstrable value (the four success criteria)
| Requirement |
Serves |
Metric (proof) |
Feasibility |
Status |
| Multi-hop graph traversal across the org hierarchy |
SC1 |
trajectory + grounded (ledger shows the path was walked, not guessed) |
Shipped (Neo4j connector) |
✅ |
| Cross-modality reconciliation (corroborate / flag conflict) |
SC2 |
reconciliation (corroboration + conflict cues, both stores queried), sf_f1 |
Shipped (graph+relational+text) |
✅ |
| Lower manual pivot burden via one analytic interface |
SC3 |
pivot_burden (queries/hops in a single agent loop) |
Shipped |
✅ |
| Surface planted non-obvious links |
SC4 |
needle recall + grounded on Halberd↔Wren and kaminski-aol fixtures |
Shipped |
✅ |
P1: The validation spine (C-spec + C-gov)
| Requirement |
Serves |
Metric (proof) |
Feasibility |
Status |
| No-fabrication + no-uncited-claim citation gate |
C-spec |
citation report dangling/uncited/unsupported = 0 |
Shipped |
✅ |
| Factual-precision check (entailment) |
C-spec |
HHEM unsupported on non-estimative claims |
Shipped (eval extra) |
✅ |
| Analytic calibration (ICD-203/206 tradecraft) |
C-spec, C-gov (quality) |
tradecraft lint: estimative-band + confidence presence |
Shipped |
✅ |
| Analytic-standards rubric (the axes gates can't see) |
C-spec |
ICD-203 rubric overall + per-criterion (judged independently) |
Shipped (rubric extra) |
✅ |
| Planted-needle eval harness with discriminating power |
C-spec |
grounded/recall/trajectory; proven to fail a weak model cleanly |
Shipped |
✅ |
| Read-only governance audit over the ledger |
C-gov (security/integrity) |
governance.json write-verb violations = 0 |
Shipped |
✅ |
| Per-product model + egress audit |
C-gov (security) |
profile name + egress class in governance.json + OTel |
Shipped (ADR-0013) |
✅ |
| Uniform observability (GenAI OTel) |
C-spec, C-gov |
spans/metrics on every workup |
Shipped (otel extra) |
✅ |
P2: Generalization & deployment (secondary deliverable + deployment constraint)
| Requirement |
Serves |
Metric / proof |
Feasibility |
Status |
| Dataset-agnostic pipeline (write one adapter) |
Secondary |
Enron adapter runs the same workup+eval unchanged |
Shipped (ADR-0006) |
✅ |
| Hybrid retrieval (full-text + vector, RRF) |
SC4, secondary |
semantic leg surfaces text-only needles |
Shipped (ADR-0007) |
✅ |
| Cloud ↔ air-gapped on one codebase |
Deployment constraint |
open-weight model clears the eval bar via the single seam |
Validating (ADR-0012) |
🟡 |
| User-selectable model profiles (governed) |
Usability + C-gov |
unknown profile rejected; air-gap omits cloud profiles |
Shipped (ADR-0013) |
✅ |
| Distribution as MCP server + plugin |
Adoption |
installs and runs from any MCP client |
Shipped (ADR-0009) |
✅ |
P3: Stretch / deferred (YAGNI-gated)
| Requirement |
Serves |
Why deferred |
Status |
| Subagent fan-out (parallel per-source workers) |
scale |
shared-context reconciliation + provenance redesign; ~15× tokens for a 2-store slice already scoring grounded |
Deferred (ADR-0005) |
| Multi-player shared sessions |
stretch goal |
no validated single-player value gap yet |
Deferred |
| Native multimodal embeddings |
SC2 (media) |
API-only breaks the air-gap and is an uncitable black box |
Rejected (ADR-0008) |
| Network egress enforcement |
C-gov |
curation suffices pre-deploy; enforcement is a deployment-time control |
Deferred |
What this exposes
- P0 and P1 are essentially complete. The demonstrable-value criteria all have
passing metrics, and the validation spine that certifies them is shipped. That is
the deliberate over-investment in C-spec the brief asks for.
- The live edge is P2 deployment validation (ADR-0012: which open-weight model
clears the eval bar), exactly where current work sits.
- Nothing in P0/P1 is blocked. The one remaining blocked item (licensed Avocado
data) is P2 and gated on data access, not on engineering. The PyPI publish has shipped
(
ariadne-sensemaking).