Chained One-Shot Crons

Chained one-shot crons are ephemeral cron chains that self-assemble at runtime: a scout detects a condition and spawns a one-shot action job, which spawns a verify job, which either closes the chain or spawns a retry. The entire chain self-terminates when the work is done or a limit is hit.

This sits between recurring crons (fixed cadence, always-on) and sentinel watchers (event-driven, external trigger). Use it when the trigger is detectable by polling but the response needs adaptive, multi-step follow-up that would be wasteful as a permanent recurring job.

When to Use

One-shot chains are a good fit when:

The trigger is bursty or incident-like (not continuous)
Follow-up timing depends on runtime state (not fixed intervals)
The workflow needs verify → retry → escalate stages
You want automatic decay after resolution (no leftover crons)
Fixed-cadence polling would be noisy or expensive

Use a recurring cron instead when:

Work is predictable and always needed (hourly summary, daily briefing)
Cadence is fixed with no branching logic
There's no verify/retry requirement

Use a sentinel watcher instead when:

A clean external event/push trigger exists
You can match conditions on response payloads without polling

Architecture

Every job the scout spawns is an at-scheduled one-shot (schedule.kind: "at"). The recurring scout is the only standing job: the chain itself creates no permanent crons, and it assembles and disassembles itself.
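As a rough illustration of what an at-scheduled one-shot might look like, the sketch below builds a job spec in Python. Only `schedule.kind: "at"` comes from the pattern itself; the `at`, `payload.text`, and `deleteAfterRun` field names and the `one_shot_job` helper are assumptions, so map them onto whatever your scheduler actually accepts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def one_shot_job(name: str, prompt: str, delay_minutes: int,
                 now: Optional[datetime] = None) -> dict:
    """Build a job spec that fires exactly once, delay_minutes from now."""
    now = now or datetime.now(timezone.utc)
    run_at = now + timedelta(minutes=delay_minutes)
    return {
        "name": name,
        # schedule.kind "at" is from the pattern; the "at" timestamp key is assumed.
        "schedule": {"kind": "at", "at": run_at.isoformat()},
        # Assumed payload shape: the prompt text carries the chain metadata.
        "payload": {"text": prompt},
        # Assumed flag: ask the scheduler to drop the job once it has run.
        "deleteAfterRun": True,
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; in production the default (current UTC time) is usually what you want.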

Chain Metadata Contract

Every spawned one-shot carries metadata in its payload text:

| Field | Purpose |
| --- | --- |
| `chainId` | Stable identifier for this incident/workflow instance |
| `chainClass` | Category (e.g., `auction-defense`, `ci-remediation`) |
| `depth` | Current step depth in the chain (0 = scout, 1 = first action, ...) |
| `attempt` | Retry count for this step |
| `expiresAt` | Hard TTL; the chain auto-closes after this time |
| `maxDepth` | Safety limit (default: 4) |
| `maxChildren` | Max total one-shots in the chain (default: 8) |

This metadata enables deduplication, observability, and safety enforcement.
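One way to carry this contract in payload text is a single machine-readable header line ahead of the prompt. The sketch below assumes a JSON header with a `CHAIN_META: ` prefix; that prefix and the `embed`/`extract` helpers are hypothetical, and only the field names and defaults come from the table above.

```python
import json
from dataclasses import asdict, dataclass

META_PREFIX = "CHAIN_META: "  # hypothetical marker, not part of the pattern


@dataclass(frozen=True)
class ChainMeta:
    chainId: str
    chainClass: str
    depth: int
    attempt: int
    expiresAt: str  # assumed to be an ISO-8601 timestamp string
    maxDepth: int = 4
    maxChildren: int = 8


def embed(meta: ChainMeta, prompt: str) -> str:
    """Prefix the prompt with one JSON line holding the chain metadata."""
    return META_PREFIX + json.dumps(asdict(meta), sort_keys=True) + "\n" + prompt


def extract(payload_text: str) -> ChainMeta:
    """Parse the metadata header back out of a payload; raise if it is missing."""
    first_line, _, _ = payload_text.partition("\n")
    if not first_line.startswith(META_PREFIX):
        raise ValueError("payload has no chain metadata header")
    return ChainMeta(**json.loads(first_line[len(META_PREFIX):]))
```

Failing loudly on a missing header is deliberate: a child that cannot read its own limits should not run.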

Safety Guardrails

Chains without limits are runaway cron bombs. The following guardrails are non-negotiable:

Hard Limits

Max depth: 4 steps (scout → action → verify → retry/escalate)
Max children per chain: 8 one-shot jobs total
Max retries per step: 3
Max chain lifetime: 90 minutes (configurable per domain)
Max concurrent chains per class: 3
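The limits above could be enforced with a single pre-spawn check, sketched below. The function and parameter names are hypothetical. The sketch also assumes `next_depth` is the depth value the child would carry and `next_retry` is 0 for a first attempt; the source does not say whether the scout at depth 0 counts toward the depth limit, so adjust the comparison to your own convention.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_DEPTH = 4
MAX_CHILDREN = 8
MAX_RETRIES = 3
MAX_LIFETIME = timedelta(minutes=90)
MAX_CONCURRENT_PER_CLASS = 3


def spawn_violation(next_depth: int, next_retry: int, children_spawned: int,
                    started_at: datetime, now: datetime) -> Optional[str]:
    """Return the reason a child must not be spawned, or None if it is allowed."""
    if next_depth > MAX_DEPTH:
        return "max depth exceeded"
    if next_retry > MAX_RETRIES:
        return "max retries exceeded"
    if children_spawned >= MAX_CHILDREN:
        return "max children exceeded"
    if now - started_at >= MAX_LIFETIME:
        return "chain lifetime exceeded"
    return None


def can_arm_new_chain(active_chains_in_class: int) -> bool:
    """The concurrency cap applies when the scout arms a chain, not per child."""
    return active_chains_in_class < MAX_CONCURRENT_PER_CLASS
```

Returning a reason string rather than a bare boolean makes the refusal easy to log and to include in an escalation message.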

Deduplication

Before spawning a child step, compute a dedupe key:

dedupeKey = chainId + stage + attemptWindow

If an active or pending job already exists with the same key, don't spawn a duplicate. This prevents the most common failure mode: the scout fires twice before the first chain step runs, creating parallel chains for the same incident.
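A minimal sketch of the dedupe check follows. The source does not define `attemptWindow`, so this assumes it is a fixed-size time bucket (5 minutes here) derived from the child's scheduled time. The `:` separator, the `dedupe_key` and `spawn_once` names, and the in-memory set standing in for the scheduler's list of active jobs are all illustrative.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # assumed attempt window: 5-minute buckets


def dedupe_key(chain_id: str, stage: str, scheduled_for: datetime,
               window_seconds: int = WINDOW_SECONDS) -> str:
    """Combine chainId, stage, and the time bucket the child falls into."""
    bucket = int(scheduled_for.timestamp()) // window_seconds
    return f"{chain_id}:{stage}:{bucket}"


def spawn_once(key: str, active_keys: set) -> bool:
    """Record the key and return True only if no active job already holds it."""
    if key in active_keys:
        return False
    active_keys.add(key)
    return True
```

The check-then-add on a plain set is only safe inside one process. If several scouts can run concurrently, the same test needs an atomic primitive in whatever store holds the active keys.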

Circuit Breaker

Open the breaker when:

≥3 consecutive chain failures within 30 minutes for the same chain class
Any invariant violation (depth exceeded, spawn storm detected)

Breaker action:

Pause the chain class for a cooldown window (default: 60 minutes)
Emit alert with failure signature, last 3 run IDs, and next safe resume time
Do not spawn more children — terminate the chain
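The breaker rules above can be sketched as a small in-memory class. The thresholds (3 failures, 30 minutes, 60-minute cooldown) come from the text; the class shape and method names are assumptions, and alert emission is left out and would hang off `trip`.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

FAILURE_THRESHOLD = 3
FAILURE_WINDOW = timedelta(minutes=30)
COOLDOWN = timedelta(minutes=60)


class CircuitBreaker:
    def __init__(self) -> None:
        self._failures = defaultdict(deque)  # chain class -> consecutive failure times
        self._open_until = {}                # chain class -> next safe resume time

    def record_success(self, chain_class: str) -> None:
        """A success breaks the run of consecutive failures."""
        self._failures[chain_class].clear()

    def record_failure(self, chain_class: str, now: datetime) -> bool:
        """Record a chain failure; return True if this one opened the breaker."""
        times = self._failures[chain_class]
        times.append(now)
        while now - times[0] > FAILURE_WINDOW:
            times.popleft()  # forget failures older than the window
        if len(times) >= FAILURE_THRESHOLD:
            self.trip(chain_class, now)
            return True
        return False

    def trip(self, chain_class: str, now: datetime) -> datetime:
        """Open the breaker (also usable for invariant violations)."""
        self._open_until[chain_class] = now + COOLDOWN
        self._failures[chain_class].clear()
        return self._open_until[chain_class]

    def allows(self, chain_class: str, now: datetime) -> bool:
        """True when the class is not paused, or its cooldown has elapsed."""
        resume_at = self._open_until.get(chain_class)
        return resume_at is None or now >= resume_at
```

The scout would call `allows` before arming a chain, and each child would call it before spawning a successor. State here lives in memory, so a real deployment would need to persist it somewhere the one-shot jobs can all read.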

Lifecycle States

| Transition | Trigger |
| --- | --- |
| `idle → armed` | Scout detects condition, creates first one-shot |
| `armed → active` | First chain step begins executing |
| `active → cooling_down` | Success achieved, but monitoring taper remains |
| `cooling_down → closed` | No further risk detected after taper period |
| `active → closed` | Terminal success or terminal failure/escalation |
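The transition table can be enforced with a small lookup, sketched below. The state names are taken from the table; the `transition` helper is hypothetical. Any move not listed is treated as an invariant violation.

```python
ALLOWED_TRANSITIONS = {
    ("idle", "armed"),
    ("armed", "active"),
    ("active", "cooling_down"),
    ("cooling_down", "closed"),
    ("active", "closed"),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal chain transition: {current} -> {target}")
    return target
```

Because `closed` has no outgoing transitions, a closed chain can never be re-armed by accident. A new incident would get a new `chainId` instead.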

Prompt Patterns

Scout Prompt (persistent recurring cron)

The scout is cheap — it runs frequently and does almost nothing most of the time. Its only job is deciding whether to spawn a chain.

Autonomous Scout — [Domain] Trigger
1) Read current state for [monitored condition].
2) If no threat/trigger (criteria not met), exit without spawning.
3) If trigger active, create one-shot job at +3m: `[chain-class]-action`.
4) Set chain metadata: chainId=[class]-{date}-{slot}, depth=1,
   attempt=1, expiresAt=[window end + 15m], maxChildren=6.
5) Before creating child, dedupe: do not create if active child
   exists with same chainId and stage.
6) Post concise status to [monitoring channel] only if chain
   newly armed.

Action Prompt (one-shot child)

Chain Step — [Domain] Action
Context: chainId={{chainId}} depth={{depth}} attempt={{attempt}}
         expiresAt={{expiresAt}}.
1) Re-read current truth from source (onchain, API, etc.).
2) If condition already resolved: close chain, exit.
3) If action needed and within policy caps: execute once.
4) Schedule verify step at +2m.
5) If action failed transiently: schedule retry with backoff
   (+2m, +5m, +12m) and increment attempt.
6) Never exceed caps or retries. If limit reached: escalate
   and close chain.
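The retry backoff in step 5 can be expressed as a lookup over the (+2m, +5m, +12m) schedule. In this sketch, `next_retry_at` is a hypothetical helper and `retries_done` is assumed to count retries already scheduled for the step, starting at 0.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKOFF_MINUTES = (2, 5, 12)  # from the action prompt; three retries at most


def next_retry_at(retries_done: int, now: datetime) -> Optional[datetime]:
    """Return when the next retry should run, or None once retries are exhausted."""
    if retries_done >= len(BACKOFF_MINUTES):
        return None  # caller should escalate and close the chain
    return now + timedelta(minutes=BACKOFF_MINUTES[retries_done])
```

A `None` result corresponds to step 6 of the prompt: the limit has been reached, so the step escalates rather than scheduling another child.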

Verify Prompt (one-shot child)

Chain Step — Verify & Decide
1) Confirm whether last action achieved target state.
2) If success and risk window closed: close chain (state=closed).
3) If success but window still active: schedule monitor one-shot
   at adaptive interval:
   - 15m if far from deadline
   - 5m if mid-window
   - 1m if final 10m
4) If failed and attempts remain: schedule bounded retry.
5) If repeated failures >= 3: open circuit breaker and alert.

Domain Examples

Governance / Auction Defense

Scout watches for outbid risk within a time window. Action places a defensive bid. Verify confirms lead. Adaptive reschedule based on time-to-close. Auto-close after window + grace period.

CI Remediation

Scout detects failed checks on an active PR. Action dispatches a fix agent. Verify checks CI status after delay. Retry on transient failures only. Escalate if persistent.

Data Divergence Response

Scout detects indexer-vs-onchain divergence. Action flips source-of-truth mode. Verify checks convergence. Close when divergence is below threshold for N consecutive checks.
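The close condition for this example ("below threshold for N consecutive checks") can be sketched as a check over the most recent divergence readings. The function name, the numeric representation of divergence, and the oldest-to-newest ordering are assumptions.

```python
from typing import Sequence


def should_close(divergences: Sequence[float], threshold: float, n: int) -> bool:
    """True when the latest n readings (oldest first) are all below the threshold."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(divergences) < n:
        return False  # not enough verify runs yet to decide
    return all(d < threshold for d in divergences[-n:])
```

Each verify step would append its reading to the chain's history, then either close the chain or schedule another check depending on the result.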

Key Insight: Temporary Autonomy Bursts

The fundamental insight is that not all autonomous work should be permanent. Recurring crons are always-on infrastructure. One-shot chains are temporary autonomy bursts — they spin up when needed, do their work with built-in retry and verification, then vanish.

This prevents the most common cron anti-pattern: creating a 15-minute recurring job for something that only matters during a 2-hour window, then forgetting to remove it after the window closes.

Related

Scout / Dispatch: the two-stage precursor to full chains
Multi-Stage Pipelines: permanent multi-stage workflows (as opposed to ephemeral chains)
Fire-and-Forget: single dispatch without verify/retry stages

Origin

This pattern was designed and validated in production. The full spec includes assumption validation, change matrix, and operating runbook.
