0012, Cloud vs. air-gapped deployment fork¶
- Status: Accepted (2026-06-04)
- Deciders: Ariadne maintainers
- Relates to: ADR-0001 (the Claude Agent SDK is the harness that forks at the model boundary), ADR-0007 + ADR-0008 (local-first retrieval/embedding choices that pre-empt the classic air-gap leak), ADR-0010 (OTLP exports to an on-prem backend)
Context¶
The brief constrains deployment to hybrid: Ariadne must run both cloud-first (frontier Claude) and on-prem / air-gapped (self-hosted, open-weight, no egress). The open question has been: where does the architecture actually fork, and how much of Ariadne has to change to cross the gap?
# research(2026-06): Air-gapped is an architecture, not a config flag: every
runtime dependency must be pre-staged inside the enclave (local model registry
with signed import, local inference workers, local vector DB + local embedding,
container/package mirrors, on-prem observability, internal PKI), and the egress
surface is a first-class concern: ideally the empty set, enforced by a network
policy and a CI check that fails on any new outbound dependency. The single most
common real-world leak is the embedding step: an SDK that defaults to a cloud
embedding API, caught only when network monitoring shows DNS to api.openai.com
from a "air-gapped" pipeline.
Decision drivers¶
- Minimise the fork surface: the fewer components that differ between cloud and air-gapped, the less there is to validate and keep in sync.
- No silent egress: every component must have an in-enclave substitution with no hidden call-home (the embedding-leak failure mode).
- No fork in the analytic logic: the harness, gates, provenance, and eval must be identical across both deployments; only infrastructure swaps.
Considered options¶
A. Single-seam fork at the orchestrator model (chosen)¶
Keep one codebase. Everything except the orchestrator LLM is already in-enclave
or trivially self-hostable; the model forks at the ANTHROPIC_BASE_URL boundary.
- Pros:
- The Claude Agent SDK is not a cloud lock-in: point
ANTHROPIC_BASE_URLat a LiteLLM proxy that translates the Anthropic Messages API to OpenAI-format and forwards to a local vLLM / TGI / Ollama worker serving an open-weight tool-use model. The agent loop, tools, skills, hooks, and provenance are byte-identical, only an env var changes. - Ariadne's retrieval/eval layers are already air-gap-clean by prior
decision: the embedder is local open-weight
bge-small(ADR-0007), multimodal is agentic-to-text with no cloud embedding API (ADR-0008), entailment (HHEM) runs locally, and the stores (Neo4j, Postgres+pgvector) are self-hosted containers. The classic embedding-egress leak cannot happen. - Observability already exports OTLP to any backend (ADR-0010), point it at an on-prem Jaeger/Grafana, no code change.
- The dataset abstraction (ADR-0006) isolates ingestion: only the adapter touches the source, so an air-gap deployment swaps in a local-corpus adapter instead of HF streaming.
- Cons:
- Open-weight agentic quality is the real risk, multi-hop tool-use + citation discipline + ICD-203 calibration are harder for smaller local models than for Claude. This is a validation item, not an architecture one (the rubric + needle + reconciliation evals are exactly how to measure the gap per model).
- Supply chain shifts onto the operator: open-weight repos are an attack surface (HF Safetensors conversion-hijack demonstrated; OWASP LLM03), so the model bundle needs signed import + chain-of-custody via a data diode.
B. Two codebases / two harnesses (cloud + air-gapped)¶
- Pros: each tuned to its environment.
- Cons: doubles maintenance, drifts the analytic logic, and forks exactly the parts (gates, provenance, eval) that must stay identical for governance. Rejected.
C. Cloud-only, defer air-gap¶
- Pros: simplest now.
- Cons: violates the brief's hybrid constraint and the target deployment reality; leaving it undocumented lets cloud assumptions harden into the code.
Decision¶
Adopt A, single codebase, single seam. The fork is one env boundary plus pre-staged infrastructure. Per-component swap points:
| Component | Cloud | Air-gapped substitution | Ariadne code change |
|---|---|---|---|
| Orchestrator model | Claude API | open-weight on vLLM/TGI behind a LiteLLM proxy via ANTHROPIC_BASE_URL |
None (env only) |
| Agent harness | Claude Agent SDK | same SDK, redirected | None |
| Graph store | Neo4j container | same, in-enclave | None |
| Relational + vector | Postgres + pgvector | same, in-enclave | None |
| Embedder | local bge-small (embed extra) |
same | None |
| Entailment | local HHEM (eval extra) |
same | None |
| LLM-rubric judge | Claude (rubric extra) |
open-weight via the same proxy, or skip (advisory score) | None (env) |
| Dataset ingestion | HF datasets streaming |
pre-staged local corpus via a local-file DatasetAdapter |
adapter only |
| Observability | OTLP → any backend | OTLP → on-prem Jaeger/Grafana | None |
The analytic spine, gather→act→verify→synthesize loop, provenance ledger, citation/tradecraft/governance gates, needle/reconciliation/rubric eval, is identical across both. Only the model endpoint, the corpus source, and the observability sink differ, none of which touch the analytic logic.
Consequences¶
- The hybrid constraint is satisfied with one codebase; the air-gap "port" is an ops exercise (stage weights, run vLLM+LiteLLM, point env vars, mirror containers), not a rewrite.
- Prior local-first decisions (ADR-0007, ADR-0008) are validated: they pre-empted the embedding-egress leak that breaks most air-gapped RAG pipelines.
- The open question narrows from "how do we air-gap Ariadne?" to "which open-weight model clears our eval bar?": answerable by running the existing rubric/needle/reconciliation harness against candidate local models.
- Follow-ups (tracked, not blocking): an explicit no-egress CI guard (fail the build on a new outbound dependency / a cloud-defaulting SDK), and a signed-model-bundle import process for the enclave.
Sources¶
- The Air-Gapped LLM Blueprint, egress-free deployments (2026-05)
- Claude Agent SDK with LiteLLM (LiteLLM docs) · Claude Code LLM gateway configuration
- Running LLMs in air-gapped environments (2026-03)
- Connect to your own LLM using vLLM, air-gapped (Elastic docs)
- OWASP LLM Top 10, LLM03 supply-chain (open-weight repos as attack surface)