
Self-Improvement & Learning Loops

An agent that learns from its own mistakes is powerful. An agent that learns without guardrails produces automation managing automation — dozens of scripts nobody uses, self-referential improvement loops, and workspace bloat. This section documents the learning cycle validated in production, including the failures that shaped it.

The Learning Cycle

Agents follow a four-stage pipeline that turns operational mistakes into durable behavioral rules. Each stage has a different owner and different criteria for promotion.

Daily Operation → Lesson Capture → Promotion Queue → Approved Rule
  (automatic)     (agent writes)    (agent proposes)   (human reviews)
| Stage | What Happens | Owner | Output |
| --- | --- | --- | --- |
| 1. Daily Note Capture | Agent logs events, successes, and mistakes in structured daily notes | Agent (automatic) | memory/YYYY-MM-DD.md |
| 2. Document-Before-Acknowledge | Mistakes are written to file before the agent replies | Agent (enforced) | Daily note entry |
| 3. Lesson Promotion | Recurring patterns get proposed as permanent rules | Agent (proposes) | memory/pending-rules.md |
| 4. Human Review Gate | Operator approves, rejects, or modifies proposed rules | Human (decides) | AGENTS.md / SOUL.md updates |
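Stage 1 can be sketched as a small append-only helper. This is an illustrative sketch, not the production implementation: the `capture_lesson` name and the fixed severity-tag set are assumptions, but the file layout (`memory/YYYY-MM-DD.md`) and the write-before-reply ordering follow the pipeline above.

```python
from datetime import date
from pathlib import Path

# Severity tags used later in the promotion filters (illustrative constant).
SEVERITIES = ("HEURISTIC", "RULE", "PATTERN")

def capture_lesson(memory_dir: str, severity: str, text: str) -> Path:
    """Append a tagged lesson to today's daily note (stage 1).

    Called before the agent composes its reply, so the write-to-file
    happens first, as document-before-acknowledge requires.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity tag: {severity}")
    note = Path(memory_dir) / f"{date.today().isoformat()}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    with note.open("a", encoding="utf-8") as f:
        f.write(f"- [{severity}] {text}\n")
    return note
```

Because the note is plain markdown under a date-stamped filename, later stages can scan it mechanically without a database.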

Supporting systems:

What Operators Learned Building This

Unsupervised improvement produces churn

In one production setup, the first self-improvement loop ran without human guidance. It produced 30+ scripts — automation managing automation, self-referential improvement loops, workspace bloat. A 4-phase remediation gutted 32 scripts in one session. The lesson: iteration needs direction. The agent should propose improvements, not unilaterally implement them.

The promotion threshold matters

Early on, every mistake became a candidate and every candidate felt urgent. Two filters fixed this:

  • Frequency threshold: A candidate needs 2+ appearances across different days before promotion (unless it's a high-severity [RULE] with immediate impact)
  • Severity tags: [HEURISTIC] (soft guidance) rarely needs AGENTS.md treatment. [RULE] (hard constraint) usually does. [PATTERN] (methodology) goes to SOUL.md or playbooks
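The two filters compose into a short promotion check. A minimal sketch, assuming the `should_promote` and `ROUTE` names (hypothetical); the 2-day threshold, the [RULE] fast path, and the routing targets come from the bullets above.

```python
def should_promote(tag: str, days_seen: set[str], immediate: bool = False) -> bool:
    """Frequency threshold: 2+ distinct days, unless a high-severity
    [RULE] has immediate impact and skips the gate."""
    if tag == "RULE" and immediate:
        return True
    return len(days_seen) >= 2

# Where an approved candidate lands, by severity tag.
ROUTE = {
    "HEURISTIC": "memory/pending-rules.md",  # soft guidance, rarely escalated
    "RULE": "AGENTS.md",                     # hard constraint
    "PATTERN": "SOUL.md",                    # methodology, or a playbook
}
```

Note that `should_promote` only gates the proposal; the human review gate (stage 4) still decides whether anything in `ROUTE` actually gets written.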

The hardest rule to follow is the most important one

Document-before-acknowledge has been violated more than any other standing order. The instinct to respond first is deeply embedded in how language models work — they're optimized for conversation, not documentation. Mechanical enforcement (trigger-phrase detection, tool-call ordering) works better than appeals to discipline.
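Mechanical enforcement can be as simple as a pre-send check. This is a sketch under stated assumptions: the trigger phrases, the `write_file` tool name, and the `enforce_document_first` hook are all illustrative, not the production trigger list.

```python
# Illustrative trigger phrases; a real list would be tuned over time.
TRIGGER_PHRASES = ("i made a mistake", "that was wrong", "i apologize")

def enforce_document_first(reply: str, tool_calls: list[str]) -> None:
    """Block a reply that admits a mistake before the daily note was written.

    Tool-call ordering check: if a trigger phrase appears in the outgoing
    reply and no file-write tool call preceded it, raise instead of sending.
    """
    lowered = reply.lower()
    if any(p in lowered for p in TRIGGER_PHRASES) and "write_file" not in tool_calls:
        raise RuntimeError("document-before-acknowledge: write the daily note first")
```

The point is that the check inspects the tool-call transcript, not the model's intentions, which is why it holds up where appeals to discipline do not.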

Memory search makes the pipeline viable

Without semantic search over daily notes and post-mortems, the pipeline is write-only — agents capture lessons but can't find them later. Memory search closes the loop: a mistake today surfaces the post-mortem from last week, and the agent can check whether a solution already exists before debugging from scratch.
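The retrieval step can be approximated without embeddings. A production setup would use semantic search; this self-contained word-overlap sketch (the `search_memory` name is hypothetical) closes the same loop: check for an existing post-mortem before debugging from scratch.

```python
from pathlib import Path

def search_memory(memory_dir: str, query: str, top_k: int = 3) -> list[tuple[str, int]]:
    """Rank daily notes and post-mortems by word overlap with the query.

    Returns (filename, score) pairs, highest overlap first; files with
    no overlapping terms are dropped.
    """
    terms = set(query.lower().split())
    scored = []
    for note in Path(memory_dir).glob("*.md"):
        words = set(note.read_text(encoding="utf-8").lower().split())
        score = len(terms & words)
        if score:
            scored.append((note.name, score))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]
```

Swapping the overlap score for embedding similarity changes the ranking quality, not the shape of the loop: today's error message becomes the query, last week's post-mortem becomes the top hit.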

Production Status

Running in production. The lesson promotion pipeline is active with daily note capture, pending-rules review queue, and human approval gate. Document-before-acknowledge is enforced via standing orders in AGENTS.md. Post-mortems are written for significant failures and searchable via memory search.

Built with OpenClaw 🤖