# Memory Search
OpenClaw agents wake up fresh each session with no memory of prior work. The memory search system bridges this gap — semantic search over workspace files using local embeddings combined with full-text search, enabling agents to recall prior decisions, lessons, and context.
Getting memory search right means the difference between an agent that repeats mistakes and one that learns from them.
## Architecture
### Hybrid query pipeline
Memory search uses a two-signal hybrid approach:
- **Vector search**: local embeddings (e.g., Ollama with `nomic-embed-text`) produce semantic similarity scores
- **Full-text search (FTS)**: SQLite FTS provides keyword matching for exact terms the embedding might miss
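The FTS side can be exercised directly with SQLite's built-in FTS5 module. A minimal sketch; the table, columns, and sample rows are illustrative, not OpenClaw's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(path, content)")
conn.executemany(
    "INSERT INTO memory VALUES (?, ?)",
    [
        ("memory/2026-02-01.md", "Decided to pin the deploy script to Node 20"),
        ("MEMORY.md", "Lesson: always run migrations before deploys"),
    ],
)
# bm25() is FTS5's built-in ranking function (lower scores rank better)
rows = conn.execute(
    "SELECT path FROM memory WHERE memory MATCH ? ORDER BY bm25(memory)",
    ("migrations",),
).fetchall()
# only the MEMORY.md row contains the exact token "migrations"
```

This is exactly the kind of exact-term lookup that pure vector search can miss when the query word is rare or technical.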
Results are blended with configurable weighting (e.g., 70% vector + 30% text), then re-ranked using MMR (Maximal Marginal Relevance) to reduce redundancy.
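The blend-then-rerank step can be sketched as follows; the function names and toy similarity callable are mine, not OpenClaw's API:

```python
def blend(vec_scores, txt_scores, vector_weight=0.7, text_weight=0.3):
    # Weighted sum of the two signals; a missing score counts as 0.
    keys = set(vec_scores) | set(txt_scores)
    return {
        k: vector_weight * vec_scores.get(k, 0.0) + text_weight * txt_scores.get(k, 0.0)
        for k in keys
    }

def mmr_rerank(scores, similarity, lam=0.7, k=5):
    # Greedy MMR: each pick maximizes lam * relevance minus
    # (1 - lam) * similarity to anything already selected.
    selected, remaining = [], set(scores)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: lam * scores[c]
            - (1 - lam) * max((similarity(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam` near 1, MMR behaves like plain relevance ranking; lower values trade relevance for diversity, which is what keeps near-duplicate memory entries from crowding the top results.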
### Temporal decay
A configurable half-life (e.g., 30 days) ensures recent context ranks higher than semantically similar but stale entries. Without temporal decay, a lesson from 3 months ago can outrank a relevant decision from yesterday.
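The decay itself is a plain exponential in the entry's age; a sketch assuming half-life semantics, where each half-life halves the weight:

```python
def decayed(score, age_days, half_life_days=30.0):
    # After one half-life the score is halved, after two it is quartered, etc.
    return score * 0.5 ** (age_days / half_life_days)
```

With a 30-day half-life, a 0.9-similarity entry from 90 days ago decays to about 0.11, while a 0.6-similarity entry from yesterday keeps nearly all of its weight, so their order flips.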
## Configuration

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "embeddings": {
          "provider": "ollama",
          "model": "nomic-embed-text",
          "endpoint": "http://127.0.0.1:11434"
        },
        "hybrid": {
          "vectorWeight": 0.7,
          "textWeight": 0.3,
          "mmrLambda": 0.7,
          "temporalDecayHalfLifeDays": 30
        },
        "fallback": "none"
      }
    }
  }
}
```

## Key Design Points
### Local embeddings eliminate API cost and latency
Running embeddings locally (Ollama, llama.cpp) means:
- Zero API cost per query — memory search is free at any volume
- No rate limits — agents can search memory as often as needed
- No network dependency — works offline, on air-gapped setups
- Privacy — memory content never leaves the machine
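Hitting the local endpoint is a single HTTP POST. A sketch against Ollama's `/api/embeddings` route, using the endpoint and model from the configuration above; the helper names are mine, not OpenClaw's:

```python
import json
import urllib.request

def build_embed_payload(text, model="nomic-embed-text"):
    # Ollama's embeddings endpoint expects {"model": ..., "prompt": ...}
    return {"model": model, "prompt": text}

def embed(text, endpoint="http://127.0.0.1:11434", model="nomic-embed-text"):
    req = urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=json.dumps(build_embed_payload(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Response body is {"embedding": [float, ...]}
        return json.load(resp)["embedding"]
```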
### Silent failure mode is dangerous
With `"fallback": "none"`, if the embedding service is down, `memory_search` returns empty results with no error. The agent proceeds as if there is no relevant memory, silently losing access to prior context.
Mitigation options:

- Health check at session start (query a known term, verify non-empty results)
- Set `"fallback": "fts"` to fall back to text-only search when embeddings are unavailable
- Monitor the embedding service's uptime independently
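The health-check option can be as small as a probe query that fails loudly instead of silently; `search` here stands in for whatever callable wraps `memory_search`, and the probe term is illustrative:

```python
def memory_health_check(search, probe="MEMORY.md"):
    # A probe term known to exist in memory should always match, so an
    # empty result signals a degraded embedding service, not empty memory.
    results = search(probe)
    if not results:
        raise RuntimeError(
            "memory_search returned nothing for a probe term known to exist; "
            "the embedding service may be down"
        )
    return results
```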
### Memory maintenance matters
The quality of memory search depends entirely on the quality of what's stored:
- Daily notes are raw logs; MEMORY.md is curated wisdom. Periodic review and distillation prevent noise from drowning out the signal.
- Stale entries pollute the vector space. An outdated decision or deprecated pattern that's still in memory files will surface as a relevant match, potentially misleading the agent.
- Structured tags (e.g., `[governance]`, `[defi]`, `[ops]`) in daily notes enable scoped recall alongside vector search.
- Pruning cadence: review memory files every few days during heartbeats. Remove outdated entries, promote durable lessons to MEMORY.md.
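Scoped recall over those tags needs nothing more than a line filter. A minimal sketch, assuming the tag syntax is a bracketed lowercase word at the start of each note line:

```python
import re

TAG_RE = re.compile(r"\[([a-z]+)\]")

def entries_with_tag(lines, tag):
    # Keep only daily-note lines carrying the requested [tag] marker.
    return [line for line in lines if tag in TAG_RE.findall(line)]
```

A filter like this can narrow candidates before (or alongside) the vector query, so a `[defi]` question never surfaces `[governance]` noise.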
## Embedding Model Selection
| Model | Size | Quality | Speed | Notes |
|---|---|---|---|---|
| `nomic-embed-text` | 137M | Good general-purpose | Fast | Recommended starting point |
| `mxbai-embed-large` | 335M | Higher quality | Moderate | Better semantic matching, more RAM |
| `snowflake-arctic-embed` | 110M | Good | Fast | Strong on technical content |
Choose based on available hardware and recall-quality requirements. For most setups, `nomic-embed-text` provides the best balance of quality, speed, and resource usage.
## Status
Running in production. Hybrid query with Ollama embeddings is the current setup. Embedding model comparison and recall-quality benchmarking still need to be documented.