Skip to content

Chained One-Shot Crons

Ephemeral cron chains that self-assemble at runtime — a scout detects a condition, spawns a one-shot action job, which spawns a verify job, which either closes the chain or spawns a retry. The entire chain self-terminates when the work is done or limits are hit.

This sits between recurring crons (fixed cadence, always-on) and sentinel watchers (event-driven, external trigger). Use it when the trigger is detectable by polling but the response needs adaptive, multi-step follow-up that would be wasteful as a permanent recurring job.

When to Use

One-shot chains are a good fit when:

  • The trigger is bursty or incident-like (not continuous)
  • Follow-up timing depends on runtime state (not fixed intervals)
  • The workflow needs verify → retry → escalate stages
  • You want automatic decay after resolution (no leftover crons)
  • Fixed-cadence polling would be noisy or expensive

Use a recurring cron instead when:

  • Work is predictable and always needed (hourly summary, daily briefing)
  • Cadence is fixed with no branching logic
  • There's no verify/retry requirement

Use a sentinel watcher instead when:

  • A clean external event/push trigger exists
  • You can match conditions on response payloads without polling

Architecture

┌─────────────────────┐
│  Scout (recurring)   │  Cheap model, runs every N minutes
│  Detects trigger     │  Spawns chain only when needed
└────────┬────────────┘
         │ trigger detected

┌─────────────────────┐
│  Action (one-shot)   │  Executes the response (bid, fix, switch)
│  `at` scheduled      │  Schedules verify step
└────────┬────────────┘
         │ +2-8 min

┌─────────────────────┐
│  Verify (one-shot)   │  Confirms action achieved target state
│  `at` scheduled      │  Branches: close / monitor / retry
└────────┬────────────┘

    ┌────┴────┐
    ▼         ▼
 [close]   [retry/monitor]
            schedules another
            one-shot at adaptive
            interval

Every job in the chain is an at-scheduled one-shot (schedule.kind: "at"). No permanent crons are created — the chain assembles and disassembles itself.

Chain Metadata Contract

Every spawned one-shot carries metadata in its payload text:

FieldPurpose
chainIdStable identifier for this incident/workflow instance
chainClassCategory (e.g., auction-defense, ci-remediation)
depthCurrent step depth in the chain (0 = scout, 1 = first action, ...)
attemptRetry count for this step
expiresAtHard TTL — chain auto-closes after this time
maxDepthSafety limit (default: 4)
maxChildrenMax total one-shots in the chain (default: 8)

This metadata enables deduplication, observability, and safety enforcement.

Safety Guardrails

Chains without limits are runaway cron bombs. These are non-negotiable:

Hard Limits

  • Max depth: 4 steps (scout → action → verify → retry/escalate)
  • Max children per chain: 8 one-shot jobs total
  • Max retries per step: 3
  • Max chain lifetime: 90 minutes (configurable per domain)
  • Max concurrent chains per class: 3

Deduplication

Before spawning a child step, compute a dedupe key:

dedupeKey = chainId + stage + attemptWindow

If an active or pending job exists with the same key, don't spawn a duplicate. This prevents the most common failure mode: scout fires twice before the first chain step runs, creating parallel chains for the same incident.

Circuit Breaker

Open the breaker when:

  • ≥3 consecutive chain failures within 30 minutes for the same chain class
  • Any invariant violation (depth exceeded, spawn storm detected)

Breaker action:

  1. Pause the chain class for a cooldown window (default: 60 minutes)
  2. Emit alert with failure signature, last 3 run IDs, and next safe resume time
  3. Do not spawn more children — terminate the chain

Lifecycle States

idle → armed → active → cooling_down → closed
TransitionTrigger
idle → armedScout detects condition, creates first one-shot
armed → activeFirst chain step begins executing
active → cooling_downSuccess achieved, but monitoring taper remains
cooling_down → closedNo further risk detected after taper period
active → closedTerminal success or terminal failure/escalation

Prompt Patterns

Scout Prompt (persistent recurring cron)

The scout is cheap — it runs frequently and does almost nothing most of the time. Its only job is deciding whether to spawn a chain.

Autonomous Scout — [Domain] Trigger
1) Read current state for [monitored condition].
2) If no threat/trigger (criteria not met), exit without spawning.
3) If trigger active, create one-shot job at +3m: `[chain-class]-action`.
4) Set chain metadata: chainId=[class]-{date}-{slot}, depth=1,
   attempt=1, expiresAt=[window end + 15m], maxChildren=6.
5) Before creating child, dedupe: do not create if active child
   exists with same chainId and stage.
6) Post concise status to [monitoring channel] only if chain
   newly armed.

Action Prompt (one-shot child)

Chain Step — [Domain] Action
Context: chainId={{chainId}} depth={{depth}} attempt={{attempt}}
         expiresAt={{expiresAt}}.
1) Re-read current truth from source (onchain, API, etc.).
2) If condition already resolved: close chain, exit.
3) If action needed and within policy caps: execute once.
4) Schedule verify step at +2m.
5) If action failed transiently: schedule retry with backoff
   (+2m, +5m, +12m) and increment attempt.
6) Never exceed caps or retries. If limit reached: escalate
   and close chain.

Verify Prompt (one-shot child)

Chain Step — Verify & Decide
1) Confirm whether last action achieved target state.
2) If success and risk window closed: close chain (state=closed).
3) If success but window still active: schedule monitor one-shot
   at adaptive interval:
   - 15m if far from deadline
   - 5m if mid-window
   - 1m if final 10m
4) If failed and attempts remain: schedule bounded retry.
5) If repeated failures >= 3: open circuit breaker and alert.

Domain Examples

Governance / Auction Defense

Scout watches for outbid risk within a time window. Action places a defensive bid. Verify confirms lead. Adaptive reschedule based on time-to-close. Auto-close after window + grace period.

CI Remediation

Scout detects failed checks on an active PR. Action dispatches a fix agent. Verify checks CI status after delay. Retry on transient failures only. Escalate if persistent.

Data Divergence Response

Scout detects indexer-vs-onchain divergence. Action flips source-of-truth mode. Verify checks convergence. Close when divergence is below threshold for N consecutive checks.

Key Insight: Temporary Autonomy Bursts

The fundamental insight is that not all autonomous work should be permanent. Recurring crons are always-on infrastructure. One-shot chains are temporary autonomy bursts — they spin up when needed, do their work with built-in retry and verification, then vanish.

This prevents the most common cron anti-pattern: creating a 15-minute recurring job for something that only matters during a 2-hour window, then forgetting to remove it after the window closes.

Origin

This pattern was designed and validated in production. The full spec includes assumption validation, change matrix, and operating runbook.

Built with OpenClaw 🤖