Model Optimization
Matching model capability to task complexity is the single biggest lever for controlling API cost without sacrificing quality. This section covers model selection strategies, thinking parameter tuning, and fallback chain design.
Key Principles
Match model to cognitive demand
Not every task needs the most capable model. The cognitive demand of a task — not its importance — should drive model selection.
| Cognitive demand | Example tasks | Recommended tier |
|---|---|---|
| Low | Email polling, heartbeat checks, dispatch logic, CI fixes | Sonnet, Haiku |
| Medium | Code generation from spec, data synthesis, routine monitoring | Sonnet, Codex |
| High | Architecture design, governance analysis, nuanced judgment calls | Opus |
| Mixed | Scout/dispatch — cheap scan, expensive action | Sonnet scout → Opus actor |
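The tier mapping above can be sketched as a small routing function. The `Demand` enum, model identifiers, and tier map here are illustrative placeholders, not real config keys or API names.

```python
from enum import Enum

class Demand(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Illustrative tier map mirroring the table above; model names are placeholders.
MODEL_FOR_DEMAND = {
    Demand.LOW: "sonnet",     # or haiku for the cheapest polling loops
    Demand.MEDIUM: "sonnet",  # or codex for spec-driven code generation
    Demand.HIGH: "opus",
}

def pick_model(demand: Demand) -> str:
    """Route a task to the cheapest tier that meets its cognitive demand."""
    return MODEL_FOR_DEMAND[demand]
```

The point of centralizing this as a function rather than hardcoding a model per call site is that tier decisions become auditable and adjustable in one place.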
Thinking parameter selection
The thinking parameter controls how much reasoning the model does before responding. Higher thinking produces better results for complex tasks but increases latency and cost.
- `thinking: high` — architecture and design tasks, complex debugging, multi-step planning
- `thinking: low` — routine dispatch, monitoring, simple decision-making
- Omit entirely for codex models — thinking parameters can cause unexpected behavior with some model families
- Test combinations in an isolated session first — some model + thinking level combinations hang silently. Validate every new combination before deploying it to a production cron.
Scout → dispatch as model optimization
The scout/dispatch pattern is fundamentally a model optimization strategy. A cheap scout model handles the common case (no signal), and the expensive model only runs when there's real work. At a 20% hit rate, this cuts effective expensive model usage by 80%.
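The arithmetic behind that claim fits in a few lines. Per-call costs here are placeholders; the structure (scout runs every scan, actor runs only on hits) is what matters.

```python
def scout_dispatch_cost(scans: int, hit_rate: float,
                        scout_cost: float, actor_cost: float) -> float:
    """Total cost when a cheap scout runs every scan and the expensive
    actor runs only on hits. Costs are illustrative per-call prices."""
    return scans * scout_cost + scans * hit_rate * actor_cost
```

With 48 scans/day, a 25% hit rate, and an actor 100x the scout's price, the actor fires only 12 times instead of 48, so nearly all the spend tracks real work rather than empty polling.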
Fallback Chains
Per-agent fallback configuration
Each agent needs its own model fallback list. A model specified in a cron or spawn must exist in that agent's model.fallbacks config — not just in agents.defaults. An unregistered model causes a silent timeout with no error message.
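Because an unregistered model fails silently, it is worth lint-checking the config before deploy. The dict below is an assumed shape that mirrors the key names mentioned above (`agents.defaults`, per-agent `model.fallbacks`); the real schema may differ.

```python
# Illustrative config shape; the exact schema is an assumption.
config = {
    "agents": {
        "defaults": {
            "model": {"fallbacks": ["opus", "sonnet"]},
        },
        "monitor-cron": {
            # Every model a cron or spawn names for this agent must appear
            # here; a model missing from this list times out silently.
            "model": {"fallbacks": ["sonnet", "haiku"]},
        },
    },
}

def is_registered(cfg: dict, agent: str, model: str) -> bool:
    """Check a requested model against the agent's OWN fallback list,
    deliberately ignoring agents.defaults, per the rule above."""
    agent_cfg = cfg["agents"].get(agent, {})
    return model in agent_cfg.get("model", {}).get("fallbacks", [])
```

Running `is_registered` for every (agent, model) pair referenced by crons and spawns at deploy time converts the silent-timeout failure mode into an explicit config error.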
Codex models: implementation only
Codex models excel at focused code generation from a written specification. They should not be used for:
- Tool-use agents — codex models hallucinate tool execution, reporting completed actions that never happened
- Judgment-heavy tasks — governance analysis, design review, nuanced decision-making
- Interactive agents — conversation requires reasoning about context that codex optimizes away
The pattern: Opus designs, Codex implements, Opus reviews.
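That three-stage split can be expressed as a pipeline. The `run` callable is a hypothetical hook into your agent runner; the sketch only shows the stage ordering and how each stage's output feeds the next prompt.

```python
def build_feature(spec: str, run) -> str:
    """Opus designs, Codex implements, Opus reviews.
    `run(model=..., prompt=...)` is an assumed runner interface."""
    design = run(model="opus", prompt=f"Design an approach for: {spec}")
    code = run(model="codex", prompt=f"Implement exactly this design:\n{design}")
    review = run(model="opus", prompt=f"Review this implementation:\n{code}")
    return review
```

Keeping Codex boxed into the middle stage means it only ever sees a written specification, which is the one setting where it excels, and the judgment-heavy bookends stay with Opus.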
Cost Measurement
Model cost is driven by three factors:
- Call volume — how often the model runs (cron cadence × hit rate)
- Input tokens — bootstrap context + prompt + conversation history
- Output tokens — response length (thinking tokens count here too)
Optimizing any one factor in isolation can increase the others. Compacting bootstrap context (factor 2) doesn't help if you're making 48 Opus calls/day for a monitoring cron that finds nothing 80% of the time (factor 1).
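The interaction between the factors can be made concrete with a minimal cost model. Token counts and per-token prices below are illustrative placeholders, not real rates.

```python
def daily_cost(calls_per_day: int, input_tokens: int, output_tokens: int,
               in_price: float, out_price: float) -> float:
    """Daily spend = volume x per-call token cost. Prices are per token
    and purely illustrative."""
    return calls_per_day * (input_tokens * in_price + output_tokens * out_price)
```

Plugging in placeholder numbers shows the leverage difference: halving input tokens on a 48-call/day cron halves only the input term, while cutting call volume (say, via a scout filter) scales the entire product down at once.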
Status
Principles validated in production. Model selection and thinking optimization patterns are in active use. Cost measurement frameworks and benchmarking patterns need documentation.