Self-Improvement & Learning Loops

An agent that can learn from its own mistakes is powerful. An agent that learns without guardrails ends up with automation managing automation: dozens of scripts that nobody uses, self-referential improvement loops, and workspace bloat. This section covers how to build learning loops that measurably improve the agent, with human-in-the-loop promotion to prevent runaway self-modification.

Key Problems

Unsupervised improvement produces churn

Without human guidance on what to improve, the agent optimizes for volume of changes rather than quality of outcomes. In practice, unchecked self-improvement loops have produced 30+ scripts that needed to be gutted in a single remediation session. The lesson: iteration needs direction.

The lesson promotion pipeline needs measurement

The pipeline exists (daily note → lesson candidate → pending rule → approved rule), but there is no data on how well it works. How many candidates get promoted? How many promoted rules actually change behavior? Is the pipeline just bureaucracy, or does it produce real improvement?
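One way to start answering these questions is to treat the pipeline as a funnel and count how many lessons reach each stage. The sketch below is only an illustration under assumptions: the `Lesson` record, its `furthest_stage` field, and the `changed_behavior` flag are hypothetical and not part of any existing tooling; only the four stage names come from the pipeline described above.

```python
from collections import Counter
from dataclasses import dataclass

# Stage names mirror the pipeline described in the text; everything else is hypothetical.
STAGES = ["daily_note", "lesson_candidate", "pending_rule", "approved_rule"]


@dataclass
class Lesson:
    lesson_id: str
    furthest_stage: str             # last stage this lesson reached
    changed_behavior: bool = False  # assumed to be set by a later review


def funnel(lessons):
    """Count lessons per stage; reaching a later stage implies passing the earlier ones."""
    reached = Counter()
    for lesson in lessons:
        idx = STAGES.index(lesson.furthest_stage)
        for stage in STAGES[: idx + 1]:
            reached[stage] += 1
    return [reached[stage] for stage in STAGES]


def conversion_rates(counts):
    """Stage-to-stage conversion; None where the earlier stage is empty."""
    return [
        (later / earlier if earlier else None)
        for earlier, later in zip(counts, counts[1:])
    ]


def behavior_change_rate(lessons):
    """Share of approved rules that were later judged to have changed behavior."""
    approved = [l for l in lessons if l.furthest_stage == "approved_rule"]
    if not approved:
        return None
    return sum(l.changed_behavior for l in approved) / len(approved)
```

A low conversion rate between two stages points at where candidates stall, while a low behavior-change rate among approved rules suggests the pipeline is producing paperwork rather than improvement.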

Document-before-acknowledge is the hardest rule

"Write the lesson FIRST, then reply to the conversation" is the standing order. In practice, it's the most frequently violated rule. The conversational pull to respond immediately is strong, and the documentation step feels like friction. Measuring and improving compliance on this one rule would validate the entire learning loop approach.

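Compliance with the document-before-acknowledge rule can be measured by comparing two timestamps per incident: when the lesson was written and when the reply was sent. The following is a minimal sketch, assuming such timestamps can be recovered from logs; the `Incident` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    lesson_written_at: Optional[float]  # epoch seconds; None if no lesson was written
    reply_sent_at: float                # epoch seconds


def is_compliant(incident):
    """Compliant only if a lesson exists and was written no later than the reply."""
    return (
        incident.lesson_written_at is not None
        and incident.lesson_written_at <= incident.reply_sent_at
    )


def compliance_rate(incidents):
    """Fraction of incidents where the lesson came first; None if there are no incidents."""
    if not incidents:
        return None
    return sum(is_compliant(i) for i in incidents) / len(incidents)
```

Tracking this rate over time gives a single number for the rule that is violated most often.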
Auto-evolve vs. propose-and-review

Some changes are safe to auto-apply (updating a daily note, adding a tag). Others need human review (modifying AGENTS.md standing orders, changing cron schedules). The boundary between "safe to self-modify" and "needs approval" is the core design decision of any self-improvement system.
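The boundary can be made explicit as a small policy function with a default of "needs review". This is a sketch under assumptions: the change kinds (`daily_note_update`, `tag_add`) and the protected-target list are hypothetical names chosen to match the examples above, not an existing configuration.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPLY = "auto_apply"
    NEEDS_REVIEW = "needs_review"


# Hypothetical allowlist of change kinds considered safe to self-apply.
AUTO_APPLY_KINDS = {"daily_note_update", "tag_add"}

# Hypothetical markers for targets that always require human approval.
PROTECTED_TARGETS = ("AGENTS.md", "crontab")


def classify_change(kind, target):
    """Gate a proposed change: protected targets and unknown kinds go to review."""
    if any(name in target for name in PROTECTED_TARGETS):
        return Decision.NEEDS_REVIEW
    if kind in AUTO_APPLY_KINDS:
        return Decision.AUTO_APPLY
    # Default-deny: anything not explicitly allowlisted needs a human.
    return Decision.NEEDS_REVIEW
```

Checking protected targets before the allowlist means a nominally safe change kind cannot be used to touch standing orders or schedules, and the default-deny branch keeps new, unclassified change types out of the auto-apply path.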

Tracks

Learning loop architectures — pipeline designs, feedback mechanisms, promotion criteria
Lesson extraction patterns — how to identify actionable lessons from operational noise
Guardrails against runaway self-modification — what to auto-apply, what to gate, how to detect churn
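For the churn-detection part of the guardrails track, one simple signal is the share of agent-created scripts that have never been run after a grace period. The sketch below assumes a per-script record of age and run count is available; the `ScriptRecord` type and the 14-day threshold are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass
class ScriptRecord:
    path: str
    age_days: int   # days since the agent created the script
    run_count: int  # number of recorded invocations


def unused_scripts(records, min_age_days=14):
    """Paths of scripts past the grace period that have never been run."""
    return [
        r.path
        for r in records
        if r.age_days >= min_age_days and r.run_count == 0
    ]


def churn_ratio(records, min_age_days=14):
    """Share of mature scripts with zero runs; None if no script is old enough."""
    mature = [r for r in records if r.age_days >= min_age_days]
    if not mature:
        return None
    return sum(r.run_count == 0 for r in mature) / len(mature)
```

A rising ratio suggests the agent is producing artifacts faster than anyone uses them, which is a reasonable trigger for pausing auto-applied changes and asking for human direction.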

Status

Framework needed. The lesson pipeline exists as a concept but hasn't been measured or validated. This section needs a measurement framework before optimization work can begin.
