Chained One-Shot Crons
Ephemeral cron chains that self-assemble at runtime — a scout detects a condition, spawns a one-shot action job, which spawns a verify job, which either closes the chain or spawns a retry. The entire chain self-terminates when the work is done or limits are hit.
This sits between recurring crons (fixed cadence, always-on) and sentinel watchers (event-driven, external trigger). Use it when the trigger is detectable by polling but the response needs adaptive, multi-step follow-up that would be wasteful as a permanent recurring job.
When to Use
One-shot chains are a good fit when:
- The trigger is bursty or incident-like (not continuous)
- Follow-up timing depends on runtime state (not fixed intervals)
- The workflow needs verify → retry → escalate stages
- You want automatic decay after resolution (no leftover crons)
- Fixed-cadence polling would be noisy or expensive
Use a recurring cron instead when:
- Work is predictable and always needed (hourly summary, daily briefing)
- Cadence is fixed with no branching logic
- There's no verify/retry requirement
Use a sentinel watcher instead when:
- A clean external event/push trigger exists
- You can match conditions on response payloads without polling
Architecture
┌─────────────────────┐
│ Scout (recurring) │ Cheap model, runs every N minutes
│ Detects trigger │ Spawns chain only when needed
└────────┬────────────┘
│ trigger detected
▼
┌─────────────────────┐
│ Action (one-shot) │ Executes the response (bid, fix, switch)
│ `at` scheduled │ Schedules verify step
└────────┬────────────┘
│ +2-8 min
▼
┌─────────────────────┐
│ Verify (one-shot) │ Confirms action achieved target state
│ `at` scheduled │ Branches: close / monitor / retry
└────────┬────────────┘
│
┌────┴────┐
▼ ▼
[close] [retry/monitor]
schedules another
one-shot at adaptive
intervalEvery job in the chain is an at-scheduled one-shot (schedule.kind: "at"). No permanent crons are created — the chain assembles and disassembles itself.
Chain Metadata Contract
Every spawned one-shot carries metadata in its payload text:
| Field | Purpose |
|---|---|
chainId | Stable identifier for this incident/workflow instance |
chainClass | Category (e.g., auction-defense, ci-remediation) |
depth | Current step depth in the chain (0 = scout, 1 = first action, ...) |
attempt | Retry count for this step |
expiresAt | Hard TTL — chain auto-closes after this time |
maxDepth | Safety limit (default: 4) |
maxChildren | Max total one-shots in the chain (default: 8) |
This metadata enables deduplication, observability, and safety enforcement.
Safety Guardrails
Chains without limits are runaway cron bombs. These are non-negotiable:
Hard Limits
- Max depth: 4 steps (scout → action → verify → retry/escalate)
- Max children per chain: 8 one-shot jobs total
- Max retries per step: 3
- Max chain lifetime: 90 minutes (configurable per domain)
- Max concurrent chains per class: 3
Deduplication
Before spawning a child step, compute a dedupe key:
dedupeKey = chainId + stage + attemptWindowIf an active or pending job exists with the same key, don't spawn a duplicate. This prevents the most common failure mode: scout fires twice before the first chain step runs, creating parallel chains for the same incident.
Circuit Breaker
Open the breaker when:
- ≥3 consecutive chain failures within 30 minutes for the same chain class
- Any invariant violation (depth exceeded, spawn storm detected)
Breaker action:
- Pause the chain class for a cooldown window (default: 60 minutes)
- Emit alert with failure signature, last 3 run IDs, and next safe resume time
- Do not spawn more children — terminate the chain
Lifecycle States
idle → armed → active → cooling_down → closed| Transition | Trigger |
|---|---|
idle → armed | Scout detects condition, creates first one-shot |
armed → active | First chain step begins executing |
active → cooling_down | Success achieved, but monitoring taper remains |
cooling_down → closed | No further risk detected after taper period |
active → closed | Terminal success or terminal failure/escalation |
Prompt Patterns
Scout Prompt (persistent recurring cron)
The scout is cheap — it runs frequently and does almost nothing most of the time. Its only job is deciding whether to spawn a chain.
Autonomous Scout — [Domain] Trigger
1) Read current state for [monitored condition].
2) If no threat/trigger (criteria not met), exit without spawning.
3) If trigger active, create one-shot job at +3m: `[chain-class]-action`.
4) Set chain metadata: chainId=[class]-{date}-{slot}, depth=1,
attempt=1, expiresAt=[window end + 15m], maxChildren=6.
5) Before creating child, dedupe: do not create if active child
exists with same chainId and stage.
6) Post concise status to [monitoring channel] only if chain
newly armed.Action Prompt (one-shot child)
Chain Step — [Domain] Action
Context: chainId={{chainId}} depth={{depth}} attempt={{attempt}}
expiresAt={{expiresAt}}.
1) Re-read current truth from source (onchain, API, etc.).
2) If condition already resolved: close chain, exit.
3) If action needed and within policy caps: execute once.
4) Schedule verify step at +2m.
5) If action failed transiently: schedule retry with backoff
(+2m, +5m, +12m) and increment attempt.
6) Never exceed caps or retries. If limit reached: escalate
and close chain.Verify Prompt (one-shot child)
Chain Step — Verify & Decide
1) Confirm whether last action achieved target state.
2) If success and risk window closed: close chain (state=closed).
3) If success but window still active: schedule monitor one-shot
at adaptive interval:
- 15m if far from deadline
- 5m if mid-window
- 1m if final 10m
4) If failed and attempts remain: schedule bounded retry.
5) If repeated failures >= 3: open circuit breaker and alert.Domain Examples
Governance / Auction Defense
Scout watches for outbid risk within a time window. Action places a defensive bid. Verify confirms lead. Adaptive reschedule based on time-to-close. Auto-close after window + grace period.
CI Remediation
Scout detects failed checks on an active PR. Action dispatches a fix agent. Verify checks CI status after delay. Retry on transient failures only. Escalate if persistent.
Data Divergence Response
Scout detects indexer-vs-onchain divergence. Action flips source-of-truth mode. Verify checks convergence. Close when divergence is below threshold for N consecutive checks.
Key Insight: Temporary Autonomy Bursts
The fundamental insight is that not all autonomous work should be permanent. Recurring crons are always-on infrastructure. One-shot chains are temporary autonomy bursts — they spin up when needed, do their work with built-in retry and verification, then vanish.
This prevents the most common cron anti-pattern: creating a 15-minute recurring job for something that only matters during a 2-hour window, then forgetting to remove it after the window closes.
Related
- Scout / Dispatch — the two-stage precursor to full chains
- Multi-Stage Pipelines — permanent multi-stage workflows (vs ephemeral chains)
- Fire-and-Forget — single dispatch without verify/retry stages
Origin
This pattern was designed and validated in production. The full spec includes assumption validation, change matrix, and operating runbook.