API Pricing¶
Which providers Kourai Khryseai can use, which models map to each tier, and what to watch out for cost-wise.
For exact per-token rates, see the provider pricing pages linked below. Rates change frequently — this doc focuses on the structure that stays stable.
Providers¶
KOURAI_PROVIDER |
You pay | Pricing page |
|---|---|---|
anthropic (default) |
Anthropic API | claude.com/pricing#api |
google |
Google AI (Gemini API) | ai.google.dev/gemini-api/docs/pricing |
local |
Free — runs on your GPU via Ollama | ollama.com |
Model Tiers¶
| Tier | Model | Used by |
|---|---|---|
cheap (default) |
Claude Haiku 4.5 | All agents |
standard |
Claude Sonnet 4.6 | Hephaestus, Metis, Techne |
smart |
Claude Sonnet 4.6 + Opus 4.6 (Metis) | All agents (Opus for Metis only) |
| Tier | Model | Used by |
|---|---|---|
cheap |
Gemini 2.0 Flash | All agents |
standard / smart |
Gemini 2.5 Pro | Heavy agents / all agents |
Google free tier
Free tier prompts are used to improve Google's products. Switch to Paid tier in AI Studio to opt out.
| Model | Used by | VRAM |
|---|---|---|
llama3.3:70b |
Hephaestus, Metis, Techne, Aletheia | ~40 GB |
qwen2.5-coder:32b |
Dokimasia | ~20 GB |
llama3.3:8b |
Kallos, Mneme, Puck, Cupid, Aidos | ~5 GB |
No per-token charges. You pay electricity and hardware only.
Rough Cost Per Pipeline¶
A typical development pipeline runs hephaestus → metis → techne → dokimasia → kallos → mneme (6 core LLM calls, ~12K input, ~8K output tokens). Companion spirits (Puck, Cupid) and quality validators (Aidos, Aletheia) fire on-demand — not every request triggers all 10 agents.
| Tier | Anthropic | Google (paid) |
|---|---|---|
cheap |
~$0.05 | ~$0.005 |
standard |
~\(0.25–\)0.40 | ~\(0.10–\)0.20 |
smart |
~\(0.40–\)0.70 | ~\(0.10–\)0.20 |
Companion/validator calls add minimal cost — Puck and Aidos use Haiku across all tiers.
What You Are NOT Charged For¶
Web search, code execution sandbox, image generation, audio input, batch API discounts, and context caching storage — Kourai uses none of these.
Cost Tips¶
Keep costs low
smarttier is expensive. Opus 4.6 costs ~5× more than Haiku 4.5. Only use when you need maximum planning quality.- Gemini 2.5 Pro thinking tokens are unpredictable. Reasoning tokens count as output at full rate. Monitor usage in Google AI Studio.
- Pipeline is sequential — core specialists make 6 billed API calls, no fan-out. Companion spirits and validators add 1–4 more calls when triggered.
max_tokens=4096caps output. Techne and Metis are the most expensive per call due to output length.- Streaming has the same cost as non-streaming. Only affects delivery, not billing.
Prompt Caching (not yet implemented)¶
Future optimization
Every agent has a large static system prompt — an ideal caching candidate.
Both Anthropic and Google offer 80–90% input cost savings on cached reads.
LiteLLM supports this via the cache_control content block parameter.