
Chained One-Shot Crons

Ephemeral cron chains that self-assemble at runtime — a scout detects a condition, spawns a one-shot action job, which spawns a verify job, which either closes the chain or spawns a retry. The entire chain self-terminates when the work is done or limits are hit.

This pattern sits between recurring crons (fixed cadence, always-on) and sentinel watchers (event-driven, external trigger). Use it when the trigger is detectable by polling but the response needs adaptive, multi-step follow-up that would be wasteful as a permanent recurring job.

When to Use

One-shot chains are a good fit when:

  • The trigger is bursty or incident-like (not continuous)
  • Follow-up timing depends on runtime state (not fixed intervals)
  • The workflow needs verify → retry → escalate stages
  • You want automatic decay after resolution (no leftover crons)
  • Fixed-cadence polling would be noisy or expensive

Use a recurring cron instead when:

  • Work is predictable and always needed (hourly summary, daily briefing)
  • Cadence is fixed with no branching logic
  • There's no verify/retry requirement

Use a sentinel watcher instead when:

  • A clean external event/push trigger exists
  • You can match conditions on response payloads without polling

Architecture

┌─────────────────────┐
│  Scout (recurring)   │  Cheap model, runs every N minutes
│  Detects trigger     │  Spawns chain only when needed
└────────┬────────────┘
         │ trigger detected
         ▼
┌─────────────────────┐
│  Action (one-shot)   │  Executes the response (bid, fix, switch)
│  `at` scheduled      │  Schedules verify step
└────────┬────────────┘
         │ +2-8 min
         ▼
┌─────────────────────┐
│  Verify (one-shot)   │  Confirms action achieved target state
│  `at` scheduled      │  Branches: close / monitor / retry
└────────┬────────────┘

    ┌────┴────┐
    ▼         ▼
 [close]   [retry/monitor]
            schedules another
            one-shot at adaptive
            interval

Every job the scout spawns is an at-scheduled one-shot (schedule.kind: "at"). The scout is the only recurring job; the chain itself creates no permanent crons, so it assembles and disassembles itself.
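A minimal sketch of what a one-shot job spec could look like. Only `schedule.kind: "at"` comes from the text above; the `make_one_shot` helper and the `name`, `payload`, and `schedule.at` keys are assumptions for illustration, not the scheduler's documented schema.

```python
from datetime import datetime, timedelta, timezone


def make_one_shot(name: str, delay_minutes: int, payload: str, now: datetime) -> dict:
    """Build a job spec that fires exactly once, delay_minutes after now."""
    run_at = now + timedelta(minutes=delay_minutes)
    return {
        "name": name,  # hypothetical key
        # "kind": "at" is from the text; the "at" timestamp key is assumed.
        "schedule": {"kind": "at", "at": run_at.isoformat()},
        "payload": payload,  # hypothetical key carrying the chain metadata text
    }


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
job = make_one_shot(
    "auction-defense-verify", 2, "chainId=auction-defense-20260301-12", now
)
print(job["schedule"])
```

Because each step only ever creates a job of this shape, nothing recurring is left behind once the last step declines to schedule a successor.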

Chain Metadata Contract

Every spawned one-shot carries metadata in its payload text:

| Field | Purpose |
| --- | --- |
| chainId | Stable identifier for this incident/workflow instance |
| chainClass | Category (e.g., auction-defense, ci-remediation) |
| depth | Current step depth in the chain (0 = scout, 1 = first action, ...) |
| attempt | Retry count for this step |
| expiresAt | Hard TTL; the chain auto-closes after this time |
| maxDepth | Safety limit (default: 4) |
| maxChildren | Max total one-shots in the chain (default: 8) |

This metadata enables deduplication, observability, and safety enforcement.
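One way to carry this contract is a small record that round-trips through the payload text. The sketch below assumes a space-separated `key=value` encoding, modeled on the `chainId=... depth=...` style used in the prompts; the `ChainMeta`, `encode`, `decode`, and `next_step` names are hypothetical.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ChainMeta:
    chainId: str
    chainClass: str
    depth: int
    attempt: int
    expiresAt: str  # ISO 8601 timestamp, kept as text
    maxDepth: int = 4
    maxChildren: int = 8


_INT_FIELDS = {"depth", "attempt", "maxDepth", "maxChildren"}


def encode(meta: ChainMeta) -> str:
    """Render the metadata as one line of key=value pairs (assumed format)."""
    return " ".join(f"{key}={value}" for key, value in asdict(meta).items())


def decode(line: str) -> ChainMeta:
    """Parse a line produced by encode(); values must not contain spaces."""
    pairs = dict(token.split("=", 1) for token in line.split())
    fields = {k: int(v) if k in _INT_FIELDS else v for k, v in pairs.items()}
    return ChainMeta(**fields)


def next_step(meta: ChainMeta) -> ChainMeta:
    """Metadata for a child one level deeper, starting at its first attempt."""
    return replace(meta, depth=meta.depth + 1, attempt=1)


meta = ChainMeta(
    chainId="auction-defense-20260301-12",
    chainClass="auction-defense",
    depth=1,
    attempt=1,
    expiresAt="2026-03-01T13:30:00+00:00",
)
line = encode(meta)
child = next_step(decode(line))
```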

Safety Guardrails

Chains without limits are runaway cron bombs. The following guardrails are non-negotiable:

Hard Limits

  • Max depth: 4 steps (scout → action → verify → retry/escalate)
  • Max children per chain: 8 one-shot jobs total
  • Max retries per step: 3
  • Max chain lifetime: 90 minutes (configurable per domain)
  • Max concurrent chains per class: 3
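The limits above could be enforced with a guard that every step calls before spawning a child. This is a sketch under two assumptions the text does not settle: the depth limit is treated as an inclusive bound on the `depth` value, and `attempt` is 1-based, so an attempt number of `n` means `n - 1` retries. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_DEPTH = 4
MAX_CHILDREN = 8
MAX_RETRIES = 3
MAX_LIFETIME = timedelta(minutes=90)
MAX_CONCURRENT_PER_CLASS = 3


def spawn_violation(
    child_depth: int,
    child_attempt: int,
    children_spawned: int,
    chain_started_at: datetime,
    now: datetime,
) -> Optional[str]:
    """Return why the child must not be spawned, or None if it is allowed."""
    if child_depth > MAX_DEPTH:
        return "max depth exceeded"
    if child_attempt - 1 > MAX_RETRIES:  # attempt 1 is the initial try
        return "max retries exceeded"
    if children_spawned >= MAX_CHILDREN:
        return "max children reached"
    if now - chain_started_at >= MAX_LIFETIME:
        return "chain lifetime expired"
    return None


def can_arm_new_chain(active_chains_in_class: int) -> bool:
    """The scout checks this before starting another chain of the same class."""
    return active_chains_in_class < MAX_CONCURRENT_PER_CLASS


start = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
ok = spawn_violation(2, 1, 1, start, start + timedelta(minutes=5))
too_old = spawn_violation(2, 1, 1, start, start + timedelta(minutes=90))
```

Returning a reason string rather than a boolean makes it easy to include the violated limit in the escalation message.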

Deduplication

Before spawning a child step, compute a dedupe key:

dedupeKey = chainId + stage + attemptWindow

If an active or pending job exists with the same key, don't spawn a duplicate. This prevents the most common failure mode: the scout fires twice before the first chain step runs, creating parallel chains for the same incident.
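A minimal sketch of the dedupe check. The text does not define `attemptWindow`, so this example assumes it is a fixed-size time bucket of the child's scheduled run time; `dedupe_key` and `SpawnRegistry` are hypothetical names, and the in-memory set stands in for whatever store tracks active and pending jobs.

```python
from datetime import datetime, timezone


def dedupe_key(chain_id: str, stage: str, scheduled_for: datetime,
               window_minutes: int = 5) -> str:
    """chainId + stage + attemptWindow, with the window as a time bucket."""
    bucket = int(scheduled_for.timestamp()) // (window_minutes * 60)
    return f"{chain_id}:{stage}:{bucket}"


class SpawnRegistry:
    """Tracks keys of active or pending children (in memory, for illustration)."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def try_claim(self, key: str) -> bool:
        """Return True if the caller may spawn; False if a duplicate exists."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def release(self, key: str) -> None:
        """Call when the child finishes so later windows can reuse the stage."""
        self._keys.discard(key)


registry = SpawnRegistry()
first = dedupe_key("ci-remediation-20260301-12", "action",
                   datetime(2026, 3, 1, 12, 1, tzinfo=timezone.utc))
second = dedupe_key("ci-remediation-20260301-12", "action",
                    datetime(2026, 3, 1, 12, 3, tzinfo=timezone.utc))
spawned_first = registry.try_claim(first)    # scout's first firing
spawned_second = registry.try_claim(second)  # scout fires again, same bucket
```

In a real deployment the claim would need to be atomic in shared storage; a process-local set only illustrates the key comparison.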

Circuit Breaker

Open the breaker when:

  • ≥3 consecutive chain failures within 30 minutes for the same chain class
  • Any invariant violation (depth exceeded, spawn storm detected)

Breaker action:

  1. Pause the chain class for a cooldown window (default: 60 minutes)
  2. Emit alert with failure signature, last 3 run IDs, and next safe resume time
  3. Do not spawn more children — terminate the chain
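The breaker rules above can be sketched as a small per-class object. The class name, method names, and alert keys are assumptions; the thresholds are the defaults stated in this section. A success resets the failure streak, which is how "consecutive" is interpreted here.

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Optional


class ChainClassBreaker:
    def __init__(self, chain_class: str, threshold: int = 3,
                 window: timedelta = timedelta(minutes=30),
                 cooldown: timedelta = timedelta(minutes=60)) -> None:
        self.chain_class = chain_class
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.open_until: Optional[datetime] = None
        # Keeps only the most recent `threshold` failures as (time, run_id).
        self._failures: deque = deque(maxlen=threshold)

    def record_success(self) -> None:
        self._failures.clear()  # a success breaks the consecutive streak

    def record_failure(self, run_id: str, signature: str,
                       now: datetime) -> Optional[dict]:
        """Record a chain failure; return an alert dict if the breaker opens."""
        self._failures.append((now, run_id))
        oldest = self._failures[0][0]
        if len(self._failures) == self.threshold and now - oldest <= self.window:
            return self.trip(signature, now)
        return None

    def trip(self, signature: str, now: datetime) -> dict:
        """Open the breaker; also usable directly for invariant violations."""
        self.open_until = now + self.cooldown
        return {
            "chainClass": self.chain_class,
            "signature": signature,
            "lastRunIds": [run_id for _, run_id in self._failures],
            "resumeAt": self.open_until.isoformat(),
        }

    def allows_spawn(self, now: datetime) -> bool:
        return self.open_until is None or now >= self.open_until


t0 = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
breaker = ChainClassBreaker("ci-remediation")
breaker.record_failure("r1", "fix-agent-timeout", t0)
breaker.record_failure("r2", "fix-agent-timeout", t0 + timedelta(minutes=5))
alert = breaker.record_failure("r3", "fix-agent-timeout", t0 + timedelta(minutes=10))
```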

Lifecycle States

idle → armed → active → cooling_down → closed

| Transition | Trigger |
| --- | --- |
| idle → armed | Scout detects condition, creates first one-shot |
| armed → active | First chain step begins executing |
| active → cooling_down | Success achieved, but monitoring taper remains |
| cooling_down → closed | No further risk detected after taper period |
| active → closed | Terminal success or terminal failure/escalation |
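The lifecycle can be checked mechanically with a transition map. This is a sketch that encodes only the five transitions listed above; the `transition` helper is a hypothetical name, and any move not listed is rejected.

```python
# Allowed next states for each lifecycle state, taken from the transitions above.
ALLOWED_TRANSITIONS = {
    "idle": {"armed"},
    "armed": {"active"},
    "active": {"cooling_down", "closed"},
    "cooling_down": {"closed"},
    "closed": set(),  # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError for an unlisted transition."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


state = "idle"
for step in ("armed", "active", "cooling_down", "closed"):
    state = transition(state, step)
```

Rejecting unlisted moves, such as reopening a closed chain, means a buggy step surfaces as an invariant violation instead of silently extending the chain.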

Prompt Patterns

Scout Prompt (persistent recurring cron)

The scout is cheap — it runs frequently and does almost nothing most of the time. Its only job is deciding whether to spawn a chain.

Autonomous Scout — [Domain] Trigger
1) Read current state for [monitored condition].
2) If no threat/trigger (criteria not met), exit without spawning.
3) If trigger active, create one-shot job at +3m: `[chain-class]-action`.
4) Set chain metadata: chainId=[class]-{date}-{slot}, depth=1,
   attempt=1, expiresAt=[window end + 15m], maxChildren=6.
5) Before creating child, dedupe: do not create if active child
   exists with same chainId and stage.
6) Post concise status to [monitoring channel] only if chain
   newly armed.
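The scout's decision could be sketched as a pure function that returns either nothing or a plan for the first child. The prompt does not define `{slot}`, so this example assumes it is the UTC hour; `plan_chain`, the returned keys, and the `active` set of `(chainId, stage)` pairs are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def plan_chain(chain_class: str, trigger_active: bool, window_end: datetime,
               now: datetime, active: set) -> Optional[dict]:
    """Return the first child's plan, or None if nothing should be spawned."""
    if not trigger_active:
        return None  # step 2: criteria not met, exit without spawning
    # chainId=[class]-{date}-{slot}; slot is assumed to be the hour of day.
    chain_id = f"{chain_class}-{now:%Y%m%d}-{now:%H}"
    if (chain_id, "action") in active:
        return None  # step 5: an active child already covers this incident
    return {
        "job": f"{chain_class}-action",
        "runAt": now + timedelta(minutes=3),
        "chainId": chain_id,
        "depth": 1,
        "attempt": 1,
        "expiresAt": window_end + timedelta(minutes=15),
        "maxChildren": 6,
    }


now = datetime(2026, 3, 1, 12, 10, tzinfo=timezone.utc)
window_end = datetime(2026, 3, 1, 14, 0, tzinfo=timezone.utc)
active = set()
plan = plan_chain("auction-defense", True, window_end, now, active)
active.add((plan["chainId"], "action"))
repeat = plan_chain("auction-defense", True, window_end, now, active)
```

A non-None plan is also the signal to post the "newly armed" status; a None result stays silent.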

Action Prompt (one-shot child)

Chain Step — [Domain] Action
Context: chainId={{chainId}} depth={{depth}} attempt={{attempt}}
         expiresAt={{expiresAt}}.
1) Re-read current truth from source (onchain, API, etc.).
2) If condition already resolved: close chain, exit.
3) If action needed and within policy caps: execute once.
4) Schedule verify step at +2m.
5) If action failed transiently: schedule retry with backoff
   (+2m, +5m, +12m) and increment attempt.
6) Never exceed caps or retries. If limit reached: escalate
   and close chain.
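The retry backoff in step 5 can be sketched as a lookup keyed by the attempt that just failed. The `next_retry` helper is hypothetical, and two things are assumptions: `attempt` is 1-based, and a retry that would land at or after `expiresAt` is treated as exhausted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKOFF_MINUTES = (2, 5, 12)  # delays before the 1st, 2nd, and 3rd retry


def next_retry(failed_attempt: int, now: datetime,
               expires_at: datetime) -> Optional[dict]:
    """Plan the next retry, or return None when the step must escalate."""
    if failed_attempt > len(BACKOFF_MINUTES):
        return None  # all three retries used
    run_at = now + timedelta(minutes=BACKOFF_MINUTES[failed_attempt - 1])
    if run_at >= expires_at:
        return None  # the retry would fall outside the chain's TTL
    return {"attempt": failed_attempt + 1, "runAt": run_at}


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
expires_at = datetime(2026, 3, 1, 13, 30, tzinfo=timezone.utc)
first_retry = next_retry(1, now, expires_at)
exhausted = next_retry(4, now, expires_at)
```

A `None` result corresponds to step 6: escalate and close the chain rather than scheduling another child.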

Verify Prompt (one-shot child)

Chain Step — Verify & Decide
1) Confirm whether last action achieved target state.
2) If success and risk window closed: close chain (state=closed).
3) If success but window still active: schedule monitor one-shot
   at adaptive interval:
   - 15m if far from deadline
   - 5m if mid-window
   - 1m if final 10m
4) If failed and attempts remain: schedule bounded retry.
5) If repeated failures >= 3: open circuit breaker and alert.
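The verify branching can be sketched as two small functions. The prompt only fixes the final-10-minutes rule; the 60-minute cutoff used here to separate "far from deadline" from "mid-window" is an assumption, as are the function names and the returned action labels.

```python
from datetime import timedelta
from typing import Optional, Tuple


def monitor_interval(time_to_deadline: timedelta,
                     mid_cutoff: timedelta = timedelta(minutes=60)) -> timedelta:
    """Pick the delay before the next monitor one-shot."""
    if time_to_deadline <= timedelta(minutes=10):
        return timedelta(minutes=1)   # final 10 minutes
    if time_to_deadline <= mid_cutoff:
        return timedelta(minutes=5)   # mid-window (cutoff is assumed)
    return timedelta(minutes=15)      # far from deadline


def verify_decision(success: bool, window_closed: bool, failures: int,
                    time_to_deadline: timedelta) -> Tuple[str, Optional[timedelta]]:
    """Return (action, delay); delay is set only for the monitor branch."""
    if success and window_closed:
        return ("close", None)
    if success:
        return ("monitor", monitor_interval(time_to_deadline))
    if failures >= 3:
        return ("open_breaker", None)
    return ("retry", None)


decision = verify_decision(True, False, 0, timedelta(minutes=45))
```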

Domain Examples

Governance / Auction Defense

Scout watches for outbid risk within a time window. Action places a defensive bid. Verify confirms lead. Adaptive reschedule based on time-to-close. Auto-close after window + grace period.

CI Remediation

Scout detects failed checks on an active PR. Action dispatches a fix agent. Verify checks CI status after delay. Retry on transient failures only. Escalate if persistent.

Data Divergence Response

Scout detects indexer-vs-onchain divergence. Action flips source-of-truth mode. Verify checks convergence. Close when divergence is below threshold for N consecutive checks.
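The close condition for this example, "below threshold for N consecutive checks", could be tested as shown in this sketch. The `converged` helper and the sample divergence readings are hypothetical.

```python
from typing import Sequence


def converged(divergences: Sequence[float], threshold: float, n: int) -> bool:
    """True when the most recent n readings are all below the threshold."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(divergences) < n:
        return False  # not enough checks yet
    return all(value < threshold for value in divergences[-n:])


# Readings ordered oldest to newest; illustrative numbers only.
readings = [0.9, 0.4, 0.05, 0.03, 0.02]
should_close = converged(readings, threshold=0.1, n=3)
```

Requiring consecutive readings means a single spike above the threshold restarts the count, so the chain does not close on a momentary dip.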

Key Insight: Temporary Autonomy Bursts

The fundamental insight is that not all autonomous work should be permanent. Recurring crons are always-on infrastructure. One-shot chains are temporary autonomy bursts — they spin up when needed, do their work with built-in retry and verification, then vanish.

This prevents the most common cron anti-pattern: creating a 15-minute recurring job for something that only matters during a 2-hour window, then forgetting to remove it after the window closes.

Origin

This pattern was designed and validated in a dedicated design spec phase (March 2026). The full v0.2 spec with assumption validation, change matrix, and operating runbook lives in the cron-cascade-design-spec repo (private).
