Memory Search

OpenClaw agents wake up fresh each session with no memory of prior work. The memory search system bridges this gap — semantic search over workspace files using local embeddings combined with full-text search, enabling agents to recall prior decisions, lessons, and context.

Getting memory search right means the difference between an agent that repeats mistakes and one that learns from them.

Architecture

Hybrid query pipeline

Memory search uses a two-signal hybrid approach:

  1. Vector search — local embeddings (e.g., Ollama with nomic-embed-text) produce semantic similarity scores
  2. Full-text search (FTS) — SQLite FTS provides keyword matching for exact terms the embedding might miss

Results are blended with configurable weighting (e.g., 70% vector + 30% text), then re-ranked using MMR (Maximal Marginal Relevance) to reduce redundancy.
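The blend-then-rerank step can be sketched as follows. This is an illustrative model of the pipeline described above, not OpenClaw's actual implementation; the function names (`blend`, `mmr`) and the toy similarity function are assumptions.

```python
# Sketch of the hybrid blend + MMR re-rank. Scores are assumed to be
# pre-normalized to [0, 1]; a document missing from one signal scores 0 there.

def blend(vector_scores, text_scores, vector_weight=0.7, text_weight=0.3):
    """Weighted blend of per-document scores from the two signals."""
    docs = set(vector_scores) | set(text_scores)
    return {d: vector_weight * vector_scores.get(d, 0.0)
             + text_weight * text_scores.get(d, 0.0) for d in docs}

def mmr(blended, similarity, k=5, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items that are relevant
    (high blended score) but not redundant with already-selected items."""
    selected, candidates = [], set(blended)
    while candidates and len(selected) < k:
        def mmr_score(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * blended[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` near 1, MMR behaves like a plain top-k sort; lowering it penalizes near-duplicate memories so the agent sees diverse context instead of three paraphrases of the same note.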

Temporal decay

A configurable half-life (e.g., 30 days) ensures recent context ranks higher than semantically similar but stale entries. Without temporal decay, a lesson from 3 months ago can outrank a relevant decision from yesterday.
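A half-life decay is typically exponential; a minimal sketch, assuming that form (the exact formula OpenClaw applies is not documented here):

```python
# Exponential temporal decay: a score halves every `half_life_days`.
def decayed(score: float, age_days: float, half_life_days: float = 30.0) -> float:
    return score * 0.5 ** (age_days / half_life_days)
```

With a 30-day half-life, a 90-day-old lesson keeps only 12.5% of its raw score, so yesterday's moderately relevant decision outranks it.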

Configuration

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "embeddings": {
          "provider": "ollama",
          "model": "nomic-embed-text",
          "endpoint": "http://127.0.0.1:11434"
        },
        "hybrid": {
          "vectorWeight": 0.7,
          "textWeight": 0.3,
          "mmrLambda": 0.7,
          "temporalDecayHalfLifeDays": 30
        },
        "fallback": "none"
      }
    }
  }
}
```

Key Design Points

Local embeddings eliminate API cost and latency

Running embeddings locally (Ollama, llama.cpp) means:

  • Zero API cost per query — memory search is free at any volume
  • No rate limits — agents can search memory as often as needed
  • No network dependency — works offline, on air-gapped setups
  • Privacy — memory content never leaves the machine

Silent failure mode is dangerous

With fallback: "none", if the embedding service is down, memory_search returns empty results with no error. The agent proceeds as if there's no relevant memory — silently losing access to prior context.

Mitigation options:

  • Health check at session start (query a known term, verify non-empty results)
  • Set fallback: "fts" to fall back to text-only search when embeddings are unavailable
  • Monitor the embedding service uptime independently
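The first mitigation, a session-start health check, could look like this. It assumes Ollama's /api/embeddings endpoint (POST with a model and prompt, returning an embedding array); the `embed_healthy` name and probe text are illustrative.

```python
# Session-start probe: ask the local embedding service for one embedding
# and treat anything other than a non-empty vector as "down".
import json
import urllib.error
import urllib.request

def embed_healthy(endpoint: str, model: str, timeout: float = 2.0) -> bool:
    """Return True iff the embedding service answers with a non-empty vector."""
    payload = json.dumps({"model": model, "prompt": "health check"}).encode()
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
        return bool(body.get("embedding"))
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If the probe fails, the agent can surface a warning instead of silently proceeding with an empty memory.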

Memory maintenance matters

The quality of memory search depends entirely on the quality of what's stored:

  • Daily notes are raw logs; MEMORY.md is curated wisdom. Periodic review and distillation prevents noise from drowning signal.
  • Stale entries pollute the vector space. An outdated decision or deprecated pattern that's still in memory files will surface as a relevant match, potentially misleading the agent.
  • Structured tags (e.g., [governance], [defi], [ops]) in daily notes enable scoped recall alongside vector search.
  • Pruning cadence: review memory files every few days during heartbeats. Remove outdated entries, promote durable lessons to MEMORY.md.

Embedding Model Selection

| Model | Size | Quality | Speed | Notes |
| --- | --- | --- | --- | --- |
| nomic-embed-text | 137M | Good general-purpose | Fast | Recommended starting point |
| mxbai-embed-large | 335M | Higher quality | Moderate | Better semantic matching, more RAM |
| snowflake-arctic-embed | 110M | Good | Fast | Strong on technical content |

Choose based on available hardware and recall quality requirements. For most setups, nomic-embed-text provides the best balance of quality, speed, and resource usage.

Status

Running in production. The hybrid query pipeline with Ollama embeddings is the current setup. Embedding-model comparison and recall-quality benchmarks are not yet documented.
