
Self-Improvement & Learning Loops

An agent that learns from its own mistakes is powerful. An agent that learns without guardrails produces automation managing automation — dozens of scripts nobody uses, self-referential improvement loops, and workspace bloat. This section documents the learning cycle validated in production, including the failures that shaped it.

The Learning Cycle

Agents follow a four-stage pipeline that turns operational mistakes into durable behavioral rules. Each stage has a different owner and different criteria for promotion.

Daily Operation → Lesson Capture → Promotion Queue → Approved Rule
  (automatic)     (agent writes)    (agent proposes)   (human reviews)
| Stage | What Happens | Owner | Output |
| --- | --- | --- | --- |
| 1. Daily Note Capture | Agent logs events, successes, and mistakes in structured daily notes | Agent (automatic) | memory/YYYY-MM-DD.md |
| 2. Document-Before-Acknowledge | Mistakes are written to file before the agent replies | Agent (enforced) | Daily note entry |
| 3. Lesson Promotion | Recurring patterns get proposed as permanent rules | Agent (proposes) | memory/pending-rules.md |
| 4. Human Review Gate | Operator approves, rejects, or modifies proposed rules | Human (decides) | AGENTS.md / SOUL.md updates |
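Stage 1 can be sketched as a small append-only helper. This is an illustrative sketch, not the production implementation: the `capture_lesson` name and the fixed severity-tag set are assumptions, but the file layout (`memory/YYYY-MM-DD.md`) and the write-before-reply ordering follow the pipeline above.

```python
from datetime import date
from pathlib import Path

# Severity tags used later in the promotion filters (illustrative constant).
SEVERITIES = ("HEURISTIC", "RULE", "PATTERN")

def capture_lesson(memory_dir: str, severity: str, text: str) -> Path:
    """Append a tagged lesson to today's daily note (stage 1).

    Called before the agent composes its reply, so the write-to-file
    happens first, as document-before-acknowledge requires.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity tag: {severity}")
    note = Path(memory_dir) / f"{date.today().isoformat()}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    with note.open("a", encoding="utf-8") as f:
        f.write(f"- [{severity}] {text}\n")
    return note
```

Because the note is plain markdown under a date-stamped filename, later stages can scan it mechanically without a database.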

Supporting systems:

What Operators Learned Building This

Unsupervised improvement produces churn

In one production setup, the first self-improvement loop ran without human guidance. It produced 30+ scripts — automation managing automation, self-referential improvement loops, workspace bloat. A 4-phase remediation gutted 32 scripts in one session. The lesson: iteration needs direction. The agent should propose improvements, not unilaterally implement them.

The promotion threshold matters

Early on, every mistake became a candidate and every candidate felt urgent. Two filters fixed this:

  • Frequency threshold: A candidate needs 2+ appearances across different days before promotion (unless it's a high-severity [RULE] with immediate impact)
  • Severity tags: [HEURISTIC] (soft guidance) rarely needs AGENTS.md treatment. [RULE] (hard constraint) usually does. [PATTERN] (methodology) goes to SOUL.md or playbooks
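The two filters compose into a short promotion check. A minimal sketch, assuming the `should_promote` and `ROUTE` names (hypothetical); the 2-day threshold, the [RULE] fast path, and the routing targets come from the bullets above.

```python
def should_promote(tag: str, days_seen: set[str], immediate: bool = False) -> bool:
    """Frequency threshold: 2+ distinct days, unless a high-severity
    [RULE] has immediate impact and skips the gate."""
    if tag == "RULE" and immediate:
        return True
    return len(days_seen) >= 2

# Where an approved candidate lands, by severity tag.
ROUTE = {
    "HEURISTIC": "memory/pending-rules.md",  # soft guidance, rarely escalated
    "RULE": "AGENTS.md",                     # hard constraint
    "PATTERN": "SOUL.md",                    # methodology, or a playbook
}
```

Note that `should_promote` only gates the proposal; the human review gate (stage 4) still decides whether anything in `ROUTE` actually gets written.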

The hardest rule to follow is the most important one

Document-before-acknowledge has been violated more than any other standing order. The instinct to respond first is deeply embedded in how language models work — they're optimized for conversation, not documentation. Mechanical enforcement (trigger-phrase detection, tool-call ordering) works better than appeals to discipline.
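Mechanical enforcement can be as simple as a pre-send check. This is a sketch under stated assumptions: the trigger phrases, the `write_file` tool name, and the `enforce_document_first` hook are all illustrative, not the production trigger list.

```python
# Illustrative trigger phrases; a real list would be tuned over time.
TRIGGER_PHRASES = ("i made a mistake", "that was wrong", "i apologize")

def enforce_document_first(reply: str, tool_calls: list[str]) -> None:
    """Block a reply that admits a mistake before the daily note was written.

    Tool-call ordering check: if a trigger phrase appears in the outgoing
    reply and no file-write tool call preceded it, raise instead of sending.
    """
    lowered = reply.lower()
    if any(p in lowered for p in TRIGGER_PHRASES) and "write_file" not in tool_calls:
        raise RuntimeError("document-before-acknowledge: write the daily note first")
```

The point is that the check inspects the tool-call transcript, not the model's intentions, which is why it holds up where appeals to discipline do not.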

Memory search makes the pipeline viable

Without semantic search over daily notes and post-mortems, the pipeline is write-only — agents capture lessons but can't find them later. Memory search closes the loop: a mistake today surfaces the post-mortem from last week, and the agent can check whether a solution already exists before debugging from scratch.
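The retrieval step can be approximated without embeddings. A production setup would use semantic search; this self-contained word-overlap sketch (the `search_memory` name is hypothetical) closes the same loop: check for an existing post-mortem before debugging from scratch.

```python
from pathlib import Path

def search_memory(memory_dir: str, query: str, top_k: int = 3) -> list[tuple[str, int]]:
    """Rank daily notes and post-mortems by word overlap with the query.

    Returns (filename, score) pairs, highest overlap first; files with
    no overlapping terms are dropped.
    """
    terms = set(query.lower().split())
    scored = []
    for note in Path(memory_dir).glob("*.md"):
        words = set(note.read_text(encoding="utf-8").lower().split())
        score = len(terms & words)
        if score:
            scored.append((note.name, score))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]
```

Swapping the overlap score for embedding similarity changes the ranking quality, not the shape of the loop: today's error message becomes the query, last week's post-mortem becomes the top hit.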

Production Status

Running in production. The lesson promotion pipeline is active with daily note capture, pending-rules review queue, and human approval gate. Document-before-acknowledge is enforced via standing orders in AGENTS.md. Post-mortems are written for significant failures and searchable via memory search.

Built with OpenClaw 🤖