Self-Improvement & Learning Loops
An agent that learns from its own mistakes is powerful. An agent that learns without guardrails produces automation managing automation — dozens of scripts nobody uses, self-referential improvement loops, and workspace bloat. This section documents the learning cycle validated in production, including the failures that shaped it.
The Learning Cycle
Agents follow a four-stage pipeline that turns operational mistakes into durable behavioral rules. Each stage has a different owner and different criteria for promotion.
Daily Operation (automatic) → Lesson Capture (agent writes) → Promotion Queue (agent proposes) → Approved Rule (human reviews)

| Stage | What Happens | Owner | Output |
|---|---|---|---|
| 1. Daily Note Capture | Agent logs events, successes, and mistakes in structured daily notes | Agent (automatic) | memory/YYYY-MM-DD.md |
| 2. Document-Before-Acknowledge | Mistakes are written to file before the agent replies | Agent (enforced) | Daily note entry |
| 3. Lesson Promotion | Recurring patterns get proposed as permanent rules | Agent (proposes) | memory/pending-rules.md |
| 4. Human Review Gate | Operator approves, rejects, or modifies proposed rules | Human (decides) | AGENTS.md / SOUL.md updates |
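The first three stages can be sketched in a few lines. This is a minimal illustration, not the production implementation; it assumes only the file layout named in the table (`memory/YYYY-MM-DD.md`, `memory/pending-rules.md`), and the function names are hypothetical. Stage 4 stays manual: a human edits AGENTS.md / SOUL.md from the queue.

```python
# Sketch of stages 1-3 of the learning pipeline. File layout follows the
# table above; function names are illustrative assumptions.
from datetime import date
from pathlib import Path

MEMORY = Path("memory")

def capture_lesson(text: str, tag: str = "HEURISTIC") -> Path:
    """Stages 1-2: append a tagged lesson to today's daily note."""
    MEMORY.mkdir(exist_ok=True)
    note = MEMORY / f"{date.today():%Y-%m-%d}.md"
    with note.open("a") as f:
        f.write(f"- [{tag}] {text}\n")
    return note

def propose_rule(text: str, evidence: list[str]) -> None:
    """Stage 3: queue a recurring lesson for human review."""
    queue = MEMORY / "pending-rules.md"
    with queue.open("a") as f:
        f.write(f"- PROPOSED: {text} (seen in: {', '.join(evidence)})\n")
```

Note that `propose_rule` only appends to the queue; nothing in the pipeline writes to AGENTS.md or SOUL.md without a human in the loop.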
Supporting systems:
- Automated Lesson Extractor — Cron that scans session transcripts for undocumented mistakes
- Post-Mortems & Playbooks — Structured failure analysis and proven-pattern library
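The extractor's core check is simple: find mistake-like lines in a transcript that never made it into the daily note. A hedged sketch, assuming illustrative marker phrases (the production cron's actual triggers and paths are not specified here):

```python
# Sketch of the lesson extractor's core check: flag transcript lines that
# look like mistake admissions but are absent from the daily note.
# The marker phrases below are illustrative assumptions.
import re

MISTAKE_MARKERS = re.compile(r"(?i)\b(my mistake|i was wrong|that failed|apolog)")

def undocumented_mistakes(transcript: str, daily_note: str) -> list[str]:
    """Return transcript lines that look like mistakes but aren't in the note."""
    hits = [ln.strip() for ln in transcript.splitlines() if MISTAKE_MARKERS.search(ln)]
    return [h for h in hits if h not in daily_note]
```

A cron job would run this over each session transcript and append any hits to the pending queue for review.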
What Operators Learned Building This
Unsupervised improvement produces churn
In one production setup, the first self-improvement loop ran without human guidance. It produced 30+ scripts — automation managing automation, self-referential improvement loops, workspace bloat. A 4-phase remediation gutted 32 scripts in one session. The lesson: iteration needs direction. The agent should propose improvements, not unilaterally implement them.
The promotion threshold matters
Early on, every mistake became a candidate and every candidate felt urgent. Two filters fixed this:
- Frequency threshold: A candidate needs 2+ appearances across different days before promotion (unless it's a high-severity [RULE] with immediate impact)
- Severity tags: [HEURISTIC] (soft guidance) rarely needs AGENTS.md treatment. [RULE] (hard constraint) usually does. [PATTERN] (methodology) goes to SOUL.md or playbooks
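The two filters combine into one promotion check. A sketch under the assumption that every [RULE]-tagged lesson counts as high-severity with immediate impact (the real pipeline may judge severity more finely); the function name and tuple shape are illustrative:

```python
# Promotion filter: 2+ distinct days, or an immediate-impact RULE.
# Assumes lessons arrive as (text, tag, day) tuples.
from collections import defaultdict

def promotable(lessons: list[tuple[str, str, str]]) -> list[str]:
    """Return lesson texts that clear the promotion threshold."""
    days = defaultdict(set)   # lesson text -> set of days it appeared
    rules = set()             # texts tagged [RULE], promoted immediately
    for text, tag, day in lessons:
        days[text].add(day)
        if tag == "RULE":
            rules.add(text)
    return [t for t, d in days.items() if len(d) >= 2 or t in rules]
```

Everything that doesn't clear the bar stays in the daily notes, where memory search can still surface it later.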
The hardest rule to follow is the most important one
Document-before-acknowledge has been violated more than any other standing order. The instinct to respond first is deeply embedded in how language models work — they're optimized for conversation, not documentation. Mechanical enforcement (trigger-phrase detection, tool-call ordering) works better than appeals to discipline.
Memory search makes the pipeline viable
Without semantic search over daily notes and post-mortems, the pipeline is write-only — agents capture lessons but can't find them later. Memory search closes the loop: a mistake today surfaces the post-mortem from last week, and the agent can check whether a solution already exists before debugging from scratch.
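The production setup uses semantic search; the retrieval step can be approximated with plain word-overlap scoring over the `memory/` directory so the sketch stays self-contained (no embedding model assumed, and the function name is illustrative):

```python
# Keyword-overlap stand-in for semantic search over daily notes and
# post-mortems. Real deployments would swap in embedding similarity.
from pathlib import Path

def search_memory(query: str, root: str = "memory") -> list[tuple[float, Path]]:
    """Return (score, path) pairs for notes sharing words with the query."""
    q = set(query.lower().split())
    scored = []
    for path in Path(root).glob("*.md"):
        words = set(path.read_text().lower().split())
        overlap = len(q & words)
        if overlap:
            scored.append((overlap / len(q), path))
    return sorted(scored, reverse=True)  # best match first
```

The point is the call site, not the scoring: before debugging from scratch, the agent queries this index and checks whether last week's post-mortem already holds the fix.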
Production Status
Running in production. The lesson promotion pipeline is active with daily note capture, pending-rules review queue, and human approval gate. Document-before-acknowledge is enforced via standing orders in AGENTS.md. Post-mortems are written for significant failures and searchable via memory search.