Getting Started
This guide distills lessons from running OpenClaw agents in production — daily crons, multi-agent coordination, monitoring, automation, and everything in between. These aren't hypotheticals; they're patterns that survived contact with reality.
The Foundation
1. Give Your Agent a Brain Before Giving It Tasks
An unconfigured agent is a chatbot with tool access. It has no memory of who you are, no operational rules, and no judgment about what's appropriate. The workspace files are what turn it into something useful.
Start with these four files:
- AGENTS.md — operational rules, standing orders, safety constraints. This is where you encode "never do X," "always check Y before Z," and "ask me before doing anything irreversible." Be specific. Vague rules get vague compliance.
- SOUL.md — personality, decision framework, communication style. Defines how the agent thinks, not just what it does. Include error recovery patterns and autonomy tiers (what it can do alone vs. what needs approval).
- USER.md — context about you. Timezone, communication preferences, key accounts. The agent reads this every session so it doesn't have to re-learn basics.
- TOOLS.md — environment-specific notes. Credential locations, API endpoints, device names, infrastructure details. Keeps operational knowledge separate from behavioral rules.
The most important guardrails to add early:
- Anti-looping: "if you've attempted the same action twice with the same approach, stop and report"
- Verification: "check the result before reporting success"
- Escalation: "3 consecutive failures on the same task → alert me"
- File-first memory: "if you want to remember something, write it to a file — mental notes don't survive sessions"
You'll iterate on these files constantly. That's expected. Every failure teaches you a new rule to add.
2. Route Models by Task, Not by Default
Not every task needs your strongest model. This is the single biggest cost lever.
| Task Type | Model Tier | Examples |
|---|---|---|
| Heartbeats, monitoring, routine checks | Cheap (Haiku, Gemini Flash) | Health pings, calendar scans, email checks |
| Standard work, conversations, tool use | Mid-tier (Sonnet, GPT-5.4) | Chat, research, file operations |
| Complex reasoning, architecture, planning | Premium (Opus) | Design docs, multi-step analysis, code review |
Set your default model to the cheapest one that reliably follows instructions. Override to stronger models per-agent or per-cron where reasoning quality actually matters. Operators have seen prompt costs drop from 20-40k tokens to ~1.5k per request by routing smarter.
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai-codex/gpt-5.4",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      }
    },
    "list": [
      {
        "id": "main",
        "model": { "primary": "anthropic/claude-opus-4-6" }
      }
    ]
  }
}
```
3. Set Up Memory Search Early
Without memory search, your agent wakes up every session with amnesia. It can read files it knows about, but it can't recall — it can't find the lesson it wrote last Tuesday, the decision you made about API design, or the mistake it already learned from.
Memory search gives your agent semantic recall over its own workspace. It's the difference between an agent that repeats the same mistakes and one that genuinely learns across sessions.
Why it compounds:
- Every lesson, decision, and preference becomes findable, not just written down
- Agents self-correct faster because they can check "have I seen this before?"
- The more your agent writes to daily notes and MEMORY.md, the more valuable search becomes
- Daily notes become a searchable knowledge base, not just a log
Setup is ~10 minutes:
- Install Ollama and pull an embedding model (`ollama pull bge-m3`)
- Add `memorySearch` config to `openclaw.json`
- Restart the gateway
Zero API cost, no rate limits, works offline. See the full Memory Search guide for setup, model selection, and tuning.
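As an illustrative sketch only — the key names below are assumptions, and the Memory Search guide documents the real schema — the `openclaw.json` addition might look something like:

```json
{
  "memorySearch": {
    "provider": "ollama",
    "model": "bge-m3"
  }
}
```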
TIP
Use bge-m3 (1.2 GB) for quality, or nomic-embed-text (274 MB) if RAM is tight. Cloud providers (OpenAI, Gemini, Voyage) are also supported if you'd rather not run Ollama.
4. Automation Means Crons, Not Long-Running Sessions
Sessions are stateful only while active. If you ask your agent to work on something and close the chat, it doesn't keep going. There's no background daemon thinking about your problem.
For scheduled work: Create cron jobs with `sessionTarget: "isolated"`. These spin up independent sessions on a schedule, do their work, and optionally report results.
```json
// Morning briefing: runs at 8 AM, sends summary to your chat
{
  "schedule": { "kind": "cron", "expr": "0 8 * * *", "tz": "America/New_York" },
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Check email, calendar, and pending tasks. Send a morning summary."
  },
  "delivery": { "mode": "announce" }
}
```
Key insight learned the hard way: a script without a cron is a tool, not automation. In one production setup, an entire task operator was built and never scheduled — hours of work that sat idle until it was wired to a cron in the same session.
5. One Integration at a Time
Every integration (email, calendar, messaging, web scraping) is a separate failure mode with its own configuration, credentials, and edge cases. Trying to set up everything at once leads to debugging five things simultaneously with no working baseline.
The pattern that works:
- Pick one workflow (e.g., a morning briefing that checks email)
- Get it working end-to-end — config, credentials, test run, verify output
- Add a cron so it runs automatically
- Add error handling (what happens when the email server is down?)
- Move to the next integration
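Applied to the email-briefing example, steps 3 and 4 together might look like this — the schedule and message wording are illustrative, following the shape of the cron example in the previous section:

```json
// Email briefing cron with explicit failure behavior baked into the prompt
{
  "schedule": { "kind": "cron", "expr": "0 8 * * 1-5", "tz": "America/New_York" },
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Check email and send a briefing. If the mail server is unreachable, report the failure once instead of retrying."
  },
  "delivery": { "mode": "announce" }
}
```

Encoding the failure path in the message itself means the agent degrades gracefully even before you build dedicated error handling.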
Run `openclaw doctor --fix` when things break. It catches most config issues.
6. Persist What Matters, Delete What Doesn't
Compaction loses detail over time. Long conversations get summarized. Files are how your agent maintains continuity across sessions.
What to persist:
- Decisions and rationale — why you chose X over Y (AGENTS.md or MEMORY.md)
- Credentials and endpoints — in TOOLS.md, secrets in OS keychain
- Operational state — JSON for machine-readable, markdown for human-readable
- Lessons learned — daily notes, promoted to AGENTS.md when proven durable
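For the operational-state bullet, a machine-readable state file might look like this — the file name and fields are illustrative, not a prescribed format:

```json
// backup-state.json (hypothetical): small, rebuildable, machine-readable
{
  "lastRun": "2026-02-10T08:00:00Z",
  "lastResult": "ok",
  "consecutiveFailures": 0
}
```

Keeping it this small makes the "rebuild from source, don't patch" rule below cheap to follow.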
What to cut:
- Blank template files with no content (they waste tokens every session)
- Duplicate rules across multiple files (pick one canonical location)
- Transient data that changes every run
- Stale state files that no longer reflect reality (rebuild from source, don't patch)
A common pattern is a workspace backup cron that syncs critical files hourly. The backup is silent when nothing changes and sends an alert on failure. That's the model — infrastructure should be invisible until it breaks.
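A sketch of such a backup cron, reusing the shape of the morning-briefing example — the schedule, file list, and message are illustrative:

```json
// Hourly workspace backup: silent on success, loud on failure
{
  "schedule": { "kind": "cron", "expr": "0 * * * *", "tz": "America/New_York" },
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Sync AGENTS.md, SOUL.md, USER.md, TOOLS.md, and MEMORY.md to the backup location. Say nothing on success; alert me on failure."
  }
}
```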
7. Test Models on Real Agent Tasks
Chat benchmarks don't predict agent performance. Tool-calling reliability, instruction following, and structured output matter more than raw reasoning scores.
What to test:
- Can it make well-formed tool calls consistently?
- Does it follow multi-step instructions without skipping steps?
- Does it verify results before claiming success?
- Does it handle errors gracefully or spiral?
Operators have seen models that benchmark beautifully but hallucinate tool execution — claiming to have run commands and produced files that don't exist. Test on your actual workflows before committing.
8. Expect Iteration
The gap between a first conversation and reliable daily autonomous operation is real. It closes with each release, but it's still measured in weeks, not hours.
Realistic trajectory:
- Week 1: Basic setup, first working conversation, initial workspace files, first "why did it do that?" moment
- Week 2: First cron job, one integration working end-to-end, initial guardrails from mistakes
- Week 3-4: Multiple integrations, reliable crons, agent personality solidifying, learning from daily notes
- Month 2+: Autonomous operation within bounded domains, self-improvement loops, the agent starts catching things before you do
Every failure is a rule you didn't write yet. Every repeated mistake is a guardrail gap. The workspace files are a living document — they get better because things go wrong.
What's Next
Once the basics are working:
- Workspace Files & Bootstrap — structure your files and optimize what gets injected
- Memory Search — semantic recall setup and tuning
- Model Optimization — advanced routing and cost strategies
- Scheduling & Automation — crons, sentinels, and native schedulers
- Agent Architecture — multi-agent identity, routing, and coordination
- Agentic Patterns — scout/dispatch, research fleets, pipelines