Cron & Background Job Patterns
Cron jobs are the backbone of autonomous operation — email monitoring, governance scanning, heartbeats, opportunity detection. But there's a big gap between "runs every 15 minutes" and "runs every 15 minutes well." This section covers the patterns that make cron execution reliable, cost-effective, and resilient.
At scale, crons generate thousands of API calls per week. Getting cron patterns right is a cost and quality multiplier across the entire system.
Key Problems
Stateless agents catastrophize transient errors
A cron agent has no memory of prior runs. If an API returns a 500, the agent cannot know that the endpoint was healthy 15 minutes ago, so it may escalate a transient blip to a critical failure and waste tokens on analysis and false alerts.
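One way to give a stateless run the context it lacks is to persist a small failure streak between runs. The sketch below is illustrative only: the state file, the `record_result` helper, and the `PERSISTENT_AFTER` threshold are assumptions, not part of any existing system.

```python
import json
from pathlib import Path

# Assumed threshold: consecutive failed runs before an error counts as persistent.
PERSISTENT_AFTER = 3


def record_result(state_path: Path, endpoint: str, ok: bool) -> str:
    """Update the per-endpoint failure streak and classify this run.

    Returns "ok", "transient", or "persistent".
    """
    # Load whatever the previous run left behind; start empty on the first run.
    state = json.loads(state_path.read_text()) if state_path.exists() else {}

    # A success resets the streak; a failure extends it.
    streak = 0 if ok else state.get(endpoint, 0) + 1
    state[endpoint] = streak
    state_path.write_text(json.dumps(state))

    if ok:
        return "ok"
    return "persistent" if streak >= PERSISTENT_AFTER else "transient"
```

With something like this in place, a single 500 is classified as "transient" and can be logged quietly; only a streak that survives several runs would justify an alert.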
Model selection is one-size-fits-all
Not every cron needs Opus. A routine email check that finds nothing burns premium-model tokens for no benefit, while a task requiring real judgment should use the best available model. Matching the model to the task type saves money without sacrificing quality.
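A minimal routing sketch, assuming each job declares a task type up front. The tier names, the task-type labels, and `pick_model` are hypothetical placeholders; substitute the model identifiers your deployment actually uses.

```python
# Hypothetical tier names, not real model identifiers.
CHEAP_MODEL = "cheap-model"
PREMIUM_MODEL = "premium-model"

# Assumed task types: "routine" for mechanical checks, "judgment" for
# work that must weigh ambiguous evidence.
ROUTES = {
    "routine": CHEAP_MODEL,
    "judgment": PREMIUM_MODEL,
}


def pick_model(task_type: str) -> str:
    """Return the model tier for a task type.

    Unknown task types fall back to the premium tier, so a mislabelled
    job costs more rather than silently losing quality.
    """
    return ROUTES.get(task_type, PREMIUM_MODEL)
```

Defaulting unknown types to the premium tier is a design choice; a cost-sensitive deployment might prefer the opposite default.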
Circuit breaker protocol gets truncated
The circuit breaker pattern (closed → open → half-open) lives in AGENTS.md's middle section, exactly where truncation hits. Cron agents therefore cannot see the protocol they are supposed to follow.
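For reference, the closed → open → half-open cycle can be sketched as a small state machine. This is a generic illustration of the pattern, not the protocol text from AGENTS.md; the threshold, the cooldown, and all names are assumptions. Because cron runs are stateless, the `Breaker` value would need to be saved and reloaded between runs.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3      # assumed: consecutive failures before opening
COOLDOWN_SECONDS = 900.0   # assumed: one 15-minute cron interval


@dataclass
class Breaker:
    state: str = "closed"   # "closed", "open", or "half-open"
    failures: int = 0       # consecutive failures seen so far
    opened_at: float = 0.0  # timestamp of the most recent trip


def allow_call(b: Breaker, now: float) -> bool:
    """Decide whether this run may call the endpoint."""
    if b.state == "open" and now - b.opened_at >= COOLDOWN_SECONDS:
        b.state = "half-open"  # cooldown elapsed: let one probe through
    return b.state != "open"


def record(b: Breaker, ok: bool, now: float) -> None:
    """Fold the outcome of a call back into the breaker."""
    if ok:
        # Any success closes the breaker and clears the streak.
        b.state, b.failures = "closed", 0
        return
    b.failures += 1
    # A failed probe reopens immediately; otherwise trip at the threshold.
    if b.state == "half-open" or b.failures >= FAILURE_THRESHOLD:
        b.state, b.opened_at = "open", now
```

A run would call `allow_call` before touching the endpoint, skip the call when it returns `False`, and pass the outcome of any call it does make to `record`.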
Scout vs. monolith tradeoffs are unclear
Should a cron be one monolithic job that checks everything, or a cheap scout that only dispatches an expensive agent when something needs attention? The answer depends on hit rates (how often a run actually finds something that needs attention), and we do not have good data on them yet.
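Until real data exists, the dependence on hit rate can at least be made explicit with a simple cost-only model. The sketch below assumes a monolith pays a fixed cost on every run, while a scout pays its own cost on every run plus the dispatch cost on the fraction of runs that hit. It ignores quality and latency, and all function names and figures are illustrative.

```python
def scout_expected_cost(scout_cost: float, dispatch_cost: float, hit_rate: float) -> float:
    """Expected cost per run of the scout/dispatch design."""
    return scout_cost + hit_rate * dispatch_cost


def break_even_hit_rate(monolith_cost: float, scout_cost: float, dispatch_cost: float) -> float:
    """Hit rate below which scout/dispatch is cheaper than the monolith.

    Derived from scout_cost + h * dispatch_cost < monolith_cost.
    A result at or below 0 means the scout never wins on cost alone.
    """
    return (monolith_cost - scout_cost) / dispatch_cost


# Illustrative numbers only: a scout costing a quarter of the monolith,
# with the dispatched agent costing the same as the monolith.
threshold = break_even_hit_rate(monolith_cost=1.0, scout_cost=0.25, dispatch_cost=1.0)
```

Under these made-up figures the scout is cheaper whenever fewer than 75% of runs find something to act on; measured hit rates would replace the guesswork.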
Tracks
- Cron architecture patterns — monolithic vs. scout/dispatch, isolation strategies
- Failure handling — circuit breakers, retry backoff, transient vs. persistent error detection
- Model routing — matching model capability to job requirements
- Cost/quality tradeoffs — measurement frameworks for cron optimization
Status
Patterns identified; synthesis needed. Real-world cron failures and successes from production operation still need to be distilled into reusable patterns here.