Skip to content

Reference

A short tour of every module. For the full source, see github.com/ajbarea/orchestrate-triage.

main.py — CLI entry

python main.py [--tickets PATH] [--out PATH] [--model NAME]
               [--limit N] [--batch] [--resume] [--dry-run] [--verbose]
flag default purpose
--tickets ../support_tickets/support_tickets.csv input CSV
--out ../support_tickets/output.csv predictions CSV
--model claude-sonnet-4-6 (or MODEL env var) Claude model id
--limit N None process only first N tickets
--batch off submit as one async Message Batch (50% off)
--resume off skip tickets already in --out
--dry-run off print plan summary, make no API calls
--verbose off print each prediction as it lands

Returns exit code 0 on success, 2 on credit-balance error, 1 otherwise.

agent.py — sync triage

def triage(
    client: Anthropic,
    ticket: TicketInput,
    corpus_blob: str,
    *,
    model: str = "claude-opus-4-7",
    max_tokens: int = 2048,
) -> tuple[TicketOutput, dict]:
    ...

Builds the cached system+corpus prompt, forces a submit_triage tool call, validates the result through pydantic, returns the TicketOutput plus a usage record (which main.py writes to logs/usage.jsonl per-ticket).

The system prompt is exposed as agent.SYSTEM_PROMPT for tests / inspection. The tool definition is agent.SUBMIT_TOOL.

batch.py — Message Batches API

def run_batch(
    client: Anthropic,
    by_company: dict[str | None, list[tuple[int, TicketInput]]],
    *,
    model: str = "claude-opus-4-7",
) -> dict[int, TicketOutput | None]:
    ...

Flattens the by-company dict to a single batch (preserving original CSV row indices via custom_id="ticket-NNN"), submits, polls every 60s until processing_status == "ended", then re-threads results back into the original ticket order. Hard timeout at 24h (Anthropic's batch SLA).

build_requests() and _build_params() are the building blocks if you want to construct the payload manually for inspection.

corpus.py — per-domain markdown

def normalize_company(company: str | None) -> str | None: ...
def load_domain(company: str) -> str: ...
def list_subdirs(company: str) -> list[str]: ...

load_domain("hackerrank") returns the entire HR corpus as a single string with each file wrapped in <doc path="data/hackerrank/...">...</doc>. Frontmatter, image markup, and signed-URL params are stripped. The HR-specific subdir exclusions live in HR_EXCLUDE_SUBDIRS = {"integrations", "library"}.

@functools.lru_cache(maxsize=8) keeps the loaded blob in memory across calls within one Python invocation — particularly useful when eval.py runs multiple Sonnet domain groups back to back.

safety.py — sanitization + spotlight wrap

def sanitize(text: str | None) -> str: ...
def wrap_ticket(issue: str, subject: str, company: str) -> str: ...

sanitize drops ASCII control characters (except newline / tab) and NFC-normalizes unicode. wrap_ticket produces the <user_ticket> block sent to the model.

schema.py — pydantic models

class Status(StrEnum):       REPLIED, ESCALATED
class RequestType(StrEnum):  PRODUCT_ISSUE, FEATURE_REQUEST, BUG, INVALID

class TicketInput(BaseModel):  issue, subject, company
class TicketOutput(BaseModel): status, product_area, response, justification, request_type

CSV_COLUMNS  = [Issue, Subject, Company, Response, Product Area, Status, Request Type, Justification]
output_tool_schema()  JSON Schema dict for the submit_triage tool

TicketOutput.model_json_schema() is what we plug into the submit_triage tool's input_schema, so the schema definition lives in exactly one place.

eval.py — accuracy harness

python eval.py [--limit N] [--model NAME]

Runs the agent against ../support_tickets/sample_support_tickets.csv (10 labeled rows), compares predicted vs expected on the three structural columns (status, request_type, product_area), prints the per-column accuracy, and dumps every mismatch with predicted-vs-expected side-by-side for human review. Free-text response and justification are not graded automatically — they need human eyeballs.

tests/

uv run --group dev pytest tests/ -q

Five test files, 31 tests total, zero API calls — all run in a few hundred milliseconds.

file what it covers
test_safety.py sanitize + wrap_ticket: null bytes, unicode preservation, delimiter shape
test_corpus.py load_domain non-empty, HR subdir exclusion, image stripping
test_schema.py pydantic enum values, CSV column order, schema validation
test_main.py resume helpers, cost estimate, output merge logic
test_batch.py request construction, custom_id round-trip, no-corpus path

scripts/build_submission.py

python scripts/build_submission.py [output_path]

Builds a clean zip of code/ (no .venv, __pycache__, logs/, .env, dotfiles, etc.) ready for HackerRank upload. Defaults to /tmp/orchestrate-code.zip. Stdlib-only (no system zip dependency).