
Model Optimization

Matching model capability to task complexity is the single biggest lever for controlling API cost without sacrificing quality. This section covers model selection strategies, thinking parameter tuning, and fallback chain design.

Key Principles

Match model to cognitive demand

Not every task needs the most capable model. The cognitive demand of a task — not its importance — should drive model selection.

Cognitive demand | Example tasks | Recommended tier
Low | Email polling, heartbeat checks, dispatch logic, CI fixes | Sonnet, Haiku
Medium | Code generation from spec, data synthesis, routine monitoring | Sonnet, Codex
High | Architecture design, governance analysis, nuanced judgment calls | Opus
Mixed | Scout/dispatch — cheap scan, expensive action | Sonnet scout → Opus actor
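
The mapping above can be made explicit in dispatch code. The sketch below is illustrative only: the Demand enum, the model names, and the pick_model helper are invented here and are not part of any framework API.

```python
from enum import Enum

class Demand(Enum):
    LOW = "low"        # polling, heartbeats, dispatch logic, CI fixes
    MEDIUM = "medium"  # code generation from spec, routine monitoring
    HIGH = "high"      # architecture, governance, nuanced judgment calls

# Hypothetical tier map; substitute whatever model IDs your deployment registers.
MODEL_FOR_DEMAND = {
    Demand.LOW: "haiku",
    Demand.MEDIUM: "sonnet",
    Demand.HIGH: "opus",
}

def pick_model(demand: Demand) -> str:
    """Choose the model tier from cognitive demand, not from task importance."""
    return MODEL_FOR_DEMAND[demand]
```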

Thinking parameter selection

The thinking parameter controls how much reasoning the model does before responding. Higher thinking produces better results for complex tasks but increases latency and cost.

  • thinking: high — architecture and design tasks, complex debugging, multi-step planning
  • thinking: low — routine dispatch, monitoring, simple decision-making
  • Omit entirely for codex models — thinking parameters can cause unexpected behavior with some model families
  • Test combinations in isolation — some model + thinking level combinations cause silent hangs in isolated sessions. Validate new combinations before deploying to production crons.
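
A minimal sketch of applying these rules per task. The request shape, field names, and TASK_SETTINGS table are assumptions for illustration; the only points taken from above are the high/low levels and the omission of thinking for codex models.

```python
# Hypothetical per-task settings; validate each model + thinking combination
# in an isolated session before wiring it into a production cron.
TASK_SETTINGS = {
    "architecture_review": {"model": "opus",   "thinking": "high"},
    "routine_dispatch":    {"model": "sonnet", "thinking": "low"},
    "spec_to_code":        {"model": "codex"},  # no thinking key: omitted for codex models
}

def build_request(task: str, prompt: str) -> dict:
    """Assemble a request dict, including 'thinking' only when the task defines it."""
    settings = TASK_SETTINGS[task]
    request = {"model": settings["model"], "prompt": prompt}
    if "thinking" in settings:
        request["thinking"] = settings["thinking"]
    return request
```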

Scout → dispatch as model optimization

The scout/dispatch pattern is fundamentally a model optimization strategy. A cheap scout model handles the common case (no signal), and the expensive model only runs when there's real work. At a 20% hit rate, this cuts effective expensive model usage by 80%.
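
The arithmetic is worth writing down. The sketch below uses invented per-call prices and run counts purely to show the shape of the saving; it is not a pricing reference.

```python
def scout_dispatch_cost(runs_per_day: int, hit_rate: float,
                        scout_cost: float, actor_cost: float) -> float:
    """Expected daily cost: the scout runs every time, the actor only on hits."""
    return runs_per_day * (scout_cost + hit_rate * actor_cost)

# Illustrative numbers only: 48 cron runs/day, 20% hit rate,
# $0.01 per scout call, $0.25 per actor call.
always_actor = 48 * 0.25                              # 12.00/day, actor on every run
scouted = scout_dispatch_cost(48, 0.20, 0.01, 0.25)   # 48 * (0.01 + 0.05) = 2.88/day
```

At a 20% hit rate the actor fires on one run in five, so effective expensive-model usage drops by 80%; the scout's own cost is the overhead paid for that.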

Fallback Chains

Per-agent fallback configuration

Each agent needs its own model fallback list. A model specified in a cron or spawn must exist in that agent's model.fallbacks config — not just in agents.defaults. An unregistered model causes a silent timeout with no error message.
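
A sketch of what that registration requirement implies. The config shape below is invented for illustration; only the distinction between a per-agent model.fallbacks list and agents.defaults is taken from the text above.

```python
# Invented config shape, shown as a Python dict for readability.
config = {
    "agents": {
        "defaults":  {"model": {"fallbacks": ["sonnet", "haiku"]}},
        "monitor":   {"model": {"fallbacks": ["haiku"]}},            # per-agent list
        "architect": {"model": {"fallbacks": ["opus", "sonnet"]}},
    }
}

def check_cron_model(agent: str, model: str) -> None:
    """Fail loudly at config-check time instead of timing out silently at run time."""
    registered = config["agents"].get(agent, {}).get("model", {}).get("fallbacks", [])
    if model not in registered:
        raise ValueError(
            f"model {model!r} is not in {agent}'s model.fallbacks; "
            "it would time out silently when the cron fires"
        )
```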

Codex models: implementation only

Codex models excel at focused code generation from a written specification. They should not be used for:

  • Tool-use agents — codex models hallucinate tool execution, reporting completed actions that never happened
  • Judgment-heavy tasks — governance analysis, design review, nuanced decision-making
  • Interactive agents — conversation requires reasoning about context that codex optimizes away

The pattern: Opus designs, Codex implements, Opus reviews.
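
Schematically, that handoff looks like the sketch below. call_model is a placeholder for whatever client call your stack actually exposes; the prompts and model names are illustrative, not prescribed.

```python
def call_model(model: str, prompt: str, thinking: str | None = None) -> str:
    """Placeholder for the real API client; included only to make the flow concrete."""
    raise NotImplementedError

def design_implement_review(task: str) -> str:
    spec = call_model("opus", f"Write a detailed spec for: {task}", thinking="high")
    code = call_model("codex", f"Implement exactly this spec:\n\n{spec}")  # no thinking param
    return call_model("opus", f"Review this code against the spec:\n\n{spec}\n\n{code}",
                      thinking="high")
```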

Cost Measurement

Model cost is driven by three factors:

  1. Call volume — how often the model runs (cron cadence × hit rate)
  2. Input tokens — bootstrap context + prompt + conversation history
  3. Output tokens — response length (thinking tokens count here too)

Optimizing any one factor in isolation can increase the others, and it won't help if a different factor dominates: compacting bootstrap context (factor 2) doesn't matter when you're making 48 Opus calls/day for a monitoring cron that finds nothing 80% of the time (factor 1).
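
A small cost model makes the interaction between the factors concrete. The prices and token counts below are placeholders chosen for shape, not published rates.

```python
def daily_cost(calls_per_day: float, input_tokens: int, output_tokens: int,
               price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Factor 1 (call volume) multiplies factors 2 and 3 (tokens per call)."""
    per_call = (input_tokens * price_in_per_mtok +
                output_tokens * price_out_per_mtok) / 1_000_000
    return calls_per_day * per_call

# Placeholder numbers: halving input tokens (factor 2) saves far less than
# cutting call volume by 80% (factor 1) when the cron mostly finds nothing.
baseline  = daily_cost(48,       20_000, 1_500, 15.0, 75.0)   # ~19.80/day
compacted = daily_cost(48,       10_000, 1_500, 15.0, 75.0)   # ~12.60/day
scouted   = daily_cost(48 * 0.2, 20_000, 1_500, 15.0, 75.0)   # ~3.96/day
```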

Status

Principles validated in production. Model selection and thinking optimization patterns are in active use. Cost measurement frameworks and benchmarking patterns need documentation.
