
Model Optimization

Matching model capability to task complexity is the single biggest lever for controlling API cost without sacrificing quality. This section covers model selection strategies, thinking parameter tuning, and fallback chain design.

Key Principles

Match model to cognitive demand

Not every task needs the most capable model. The cognitive demand of a task — not its importance — should drive model selection.

Cognitive demand | Example tasks | Recommended tier
Low | Email polling, heartbeat checks, dispatch logic, CI fixes | Sonnet, Haiku
Medium | Code generation from spec, data synthesis, routine monitoring | Sonnet, Codex
High | Architecture design, governance analysis, nuanced judgment calls | Opus
Mixed | Scout/dispatch — cheap scan, expensive action | Sonnet scout → Opus actor
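
The mapping above can be made explicit in dispatch code. The sketch below is illustrative only: the Demand enum, the model names, and the pick_model helper are invented here and are not part of any framework API.

```python
from enum import Enum

class Demand(Enum):
    LOW = "low"        # polling, heartbeats, dispatch logic, CI fixes
    MEDIUM = "medium"  # code generation from spec, routine monitoring
    HIGH = "high"      # architecture, governance, nuanced judgment calls

# Hypothetical tier map; substitute whatever model IDs your deployment registers.
MODEL_FOR_DEMAND = {
    Demand.LOW: "haiku",
    Demand.MEDIUM: "sonnet",
    Demand.HIGH: "opus",
}

def pick_model(demand: Demand) -> str:
    """Choose the model tier from cognitive demand, not from task importance."""
    return MODEL_FOR_DEMAND[demand]
```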

Thinking parameter selection

The thinking parameter controls how much reasoning the model does before responding. Higher thinking produces better results for complex tasks but increases latency and cost.

  • thinking: high — architecture and design tasks, complex debugging, multi-step planning
  • thinking: low — routine dispatch, monitoring, simple decision-making
  • Omit entirely for codex models — thinking parameters can cause unexpected behavior with some model families
  • Test combinations in isolation — some model + thinking level combinations cause silent hangs in isolated sessions. Validate new combinations before deploying to production crons.
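
A minimal sketch of applying these rules per task. The request shape, field names, and TASK_SETTINGS table are assumptions for illustration; the only points taken from above are the high/low levels and the omission of thinking for codex models.

```python
# Hypothetical per-task settings; validate each model + thinking combination
# in an isolated session before wiring it into a production cron.
TASK_SETTINGS = {
    "architecture_review": {"model": "opus",   "thinking": "high"},
    "routine_dispatch":    {"model": "sonnet", "thinking": "low"},
    "spec_to_code":        {"model": "codex"},  # no thinking key: omitted for codex models
}

def build_request(task: str, prompt: str) -> dict:
    """Assemble a request dict, including 'thinking' only when the task defines it."""
    settings = TASK_SETTINGS[task]
    request = {"model": settings["model"], "prompt": prompt}
    if "thinking" in settings:
        request["thinking"] = settings["thinking"]
    return request
```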

Scout → dispatch as model optimization

The scout/dispatch pattern is fundamentally a model optimization strategy. A cheap scout model handles the common case (no signal), and the expensive model only runs when there's real work. At a 20% hit rate, this cuts effective expensive model usage by 80%.
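
The arithmetic is worth writing down. The sketch below uses invented per-call prices and run counts purely to show the shape of the saving; it is not a pricing reference.

```python
def scout_dispatch_cost(runs_per_day: int, hit_rate: float,
                        scout_cost: float, actor_cost: float) -> float:
    """Expected daily cost: the scout runs every time, the actor only on hits."""
    return runs_per_day * (scout_cost + hit_rate * actor_cost)

# Illustrative numbers only: 48 cron runs/day, 20% hit rate,
# $0.01 per scout call, $0.25 per actor call.
always_actor = 48 * 0.25                              # 12.00/day, actor on every run
scouted = scout_dispatch_cost(48, 0.20, 0.01, 0.25)   # 48 * (0.01 + 0.05) = 2.88/day
```

At a 20% hit rate the actor fires on one run in five, so effective expensive-model usage drops by 80%; the scout's own cost is the overhead paid for that.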

Fallback Chains

Per-agent fallback configuration

Each agent needs its own model fallback list. A model specified in a cron or spawn must exist in that agent's model.fallbacks config — not just in agents.defaults. An unregistered model causes a silent timeout with no error message.
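
A sketch of what that registration requirement implies. The config shape below is invented for illustration; only the distinction between a per-agent model.fallbacks list and agents.defaults is taken from the text above.

```python
# Invented config shape, shown as a Python dict for readability.
config = {
    "agents": {
        "defaults":  {"model": {"fallbacks": ["sonnet", "haiku"]}},
        "monitor":   {"model": {"fallbacks": ["haiku"]}},            # per-agent list
        "architect": {"model": {"fallbacks": ["opus", "sonnet"]}},
    }
}

def check_cron_model(agent: str, model: str) -> None:
    """Fail loudly at config-check time instead of timing out silently at run time."""
    registered = config["agents"].get(agent, {}).get("model", {}).get("fallbacks", [])
    if model not in registered:
        raise ValueError(
            f"model {model!r} is not in {agent}'s model.fallbacks; "
            "it would time out silently when the cron fires"
        )
```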

Codex models: implementation only

Codex models excel at focused code generation from a written specification. They should not be used for:

  • Tool-use agents — codex models hallucinate tool execution, reporting completed actions that never happened
  • Judgment-heavy tasks — governance analysis, design review, nuanced decision-making
  • Interactive agents — conversation requires reasoning about context that codex optimizes away

The pattern: Opus designs, Codex implements, Opus reviews.
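
Schematically, that handoff looks like the sketch below. call_model is a placeholder for whatever client call your stack actually exposes; the prompts and model names are illustrative, not prescribed.

```python
def call_model(model: str, prompt: str, thinking: str | None = None) -> str:
    """Placeholder for the real API client; included only to make the flow concrete."""
    raise NotImplementedError

def design_implement_review(task: str) -> str:
    spec = call_model("opus", f"Write a detailed spec for: {task}", thinking="high")
    code = call_model("codex", f"Implement exactly this spec:\n\n{spec}")  # no thinking param
    return call_model("opus", f"Review this code against the spec:\n\n{spec}\n\n{code}",
                      thinking="high")
```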

Cost Measurement

Model cost is driven by three factors:

  1. Call volume — how often the model runs (cron cadence × hit rate)
  2. Input tokens — bootstrap context + prompt + conversation history
  3. Output tokens — response length (thinking tokens count here too)

Optimizing any one factor in isolation can increase the others, and it won't help if a different factor dominates: compacting bootstrap context (factor 2) doesn't matter when you're making 48 Opus calls/day for a monitoring cron that finds nothing 80% of the time (factor 1).
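
A small cost model makes the interaction between the factors concrete. The prices and token counts below are placeholders chosen for shape, not published rates.

```python
def daily_cost(calls_per_day: float, input_tokens: int, output_tokens: int,
               price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Factor 1 (call volume) multiplies factors 2 and 3 (tokens per call)."""
    per_call = (input_tokens * price_in_per_mtok +
                output_tokens * price_out_per_mtok) / 1_000_000
    return calls_per_day * per_call

# Placeholder numbers: halving input tokens (factor 2) saves far less than
# cutting call volume by 80% (factor 1) when the cron mostly finds nothing.
baseline  = daily_cost(48,       20_000, 1_500, 15.0, 75.0)   # ~19.80/day
compacted = daily_cost(48,       10_000, 1_500, 15.0, 75.0)   # ~12.60/day
scouted   = daily_cost(48 * 0.2, 20_000, 1_500, 15.0, 75.0)   # ~3.96/day
```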

Status

Principles validated in production. Model selection and thinking optimization patterns are in active use. Cost measurement frameworks and benchmarking patterns need documentation.
