API Pricing¶
Which providers Kourai Khryseai can use, which models map to each tier, and what to watch out for cost-wise.
For exact per-token rates, see the provider pricing pages linked below. Rates change frequently — this doc focuses on the structure that stays stable.
Providers¶
KOURAI_PROVIDER |
You pay | Pricing page |
|---|---|---|
anthropic (default) |
Anthropic API | claude.com/pricing#api |
google |
Google AI (Gemini API) | ai.google.dev/gemini-api/docs/pricing |
local |
Free — runs on your GPU via Ollama | ollama.com |
Model Tiers¶
| Tier | Model | Used by |
|---|---|---|
cheap (default) |
Claude Haiku 4.5 | All agents |
standard |
Claude Sonnet 4.6 | Hephaestus, Metis, Techne |
smart |
Claude Sonnet 4.6 + Opus 4.7 (Metis) | All agents (Opus for Metis only) |
| Tier | Model | Used by |
|---|---|---|
cheap |
Gemini 2.5 Flash-Lite | All agents |
standard |
Gemini 2.5 Pro + Flash-Lite | Hephaestus / Metis / Techne on Pro, others on Flash-Lite |
smart |
Gemini 2.5 Pro (Puck / Aidos on Flash-Lite) | All agents |
Google free tier
Free tier prompts are used to improve Google's products. Switch to Paid tier in AI Studio to opt out.
| Model | Used by | VRAM |
|---|---|---|
llama3.3:70b |
Hephaestus, Metis, Techne, Aletheia | ~40 GB |
qwen2.5-coder:32b |
Dokimasia | ~20 GB |
llama3.3:8b |
Kallos, Mneme, Puck, Cupid, Aidos | ~5 GB |
No per-token charges. You pay electricity and hardware only.
Rough Cost Per Pipeline¶
A typical development pipeline runs hephaestus → metis → techne → dokimasia → kallos → mneme (6 core LLM calls, ~12K input, ~8K output tokens). Companion spirits (Puck, Cupid) and quality validators (Aidos, Aletheia) fire on-demand — not every request triggers all 10 agents.
| Tier | Anthropic | Google (paid) |
|---|---|---|
cheap |
~$0.05 | ~$0.005 |
standard |
~\(0.25–\)0.40 | ~\(0.10–\)0.20 |
smart |
~\(0.40–\)0.70 | ~\(0.10–\)0.20 |
Companion/validator calls add minimal cost — Puck and Aidos use Haiku across all tiers.
What You Are NOT Charged For¶
Web search, code execution sandbox, image generation, audio input, batch API discounts, and context caching storage — Kourai uses none of these.
Cost Tips¶
Keep costs low
smarttier is expensive. Opus 4.7 costs ~5× more than Haiku 4.5. Only use when you need maximum planning quality.- Gemini 2.5 Pro thinking tokens are unpredictable. Reasoning tokens count as output at full rate. Monitor usage in Google AI Studio.
- Pipeline is sequential — core specialists make 6 billed API calls, no fan-out. Companion spirits and validators add 1–4 more calls when triggered.
- Streaming has the same cost as non-streaming. Only affects delivery, not billing.
Prompt Caching¶
Every agent system prompt is marked cache_control: ephemeral so repeated
calls within a Techne / Kallos / Dokimasia tool-use loop pay the cached-read
rate (≈10% of input cost) instead of the full input rate. Iterations 2–N of
each fix loop hit the cache for the [system + tools + initial-user] prefix
— typically 2K–10K tokens of file contents and git context.
Cross-call caching of just the agent system prompt does not pay today (the prompts are below Anthropic's 1024-token Sonnet 4.6 and 4096-token Haiku 4.5 / Opus 4.7 thresholds, verified May 2026); within-loop is the actual win.
In-session usage tracking¶
/usage (alias /cost) in the CLI prints a per-(agent, model) breakdown of
calls, input / output / cache-read / cache-write tokens, and dollar cost for
the current REPL session. /reset_usage zeros the counter.