claude-memory
A persistent memory system for Claude Code, implemented as an MCP server. Gives Claude long-term recall across sessions by indexing curated notes, thousands of past conversation archives, and external codebases with hybrid keyword + vector search.
 
The Problem
Claude Code sessions are stateless. Every new conversation starts from scratch. You end up re-explaining context that Claude already helped you figure out last week.
How It Works
claude-memory runs as an MCP server that Claude Code connects to automatically. It provides 13 tools across four categories:
Search & retrieval:
- memory_search -- Hybrid FTS5 keyword + vector cosine similarity, merged via Reciprocal Rank Fusion (k=60)
- memory_deep_search -- 2-pass multi-hop retrieval: standard search, then entity extraction from top results seeds an expanded search
- codebase_search -- Semantic search over indexed source code repositories
Code intelligence:
- symbol_search -- Find class/function/method definitions across indexed codebases (SQL LIKE patterns)
- graph_traverse -- Walk upstream (callers) or downstream (callees) through the call graph
- community_search -- Identify tightly-coupled file clusters via Louvain community detection
- dependency_search -- Query cross-repo dependency edges (repo_depends_on / repo_depended_on_by)
- entity_browse -- List extracted entities (tools, projects, people) with occurrence counts
- entity_graph -- Explore entity co-occurrence neighborhoods at depth 1-2
Read & write:
- memory_read -- Read specific memory files or retrieve full past conversations by session UUID
- memory_write -- Append to daily logs or long-term memory files, with immediate FTS5 indexing and vector embedding
- index_session -- Index a conversation session JSONL file (called by SessionEnd hook)
Health:
- get_status -- Health check for both search backends with chunk/vector counts and model info
All data stays local in ~/.claude-memory/. No external API calls for search. Embeddings are generated locally using three models: bge-base-en-v1.5 (768-dim) for memory search, nomic-embed-text-v1.5 (768-dim) for codebase indexing, and all-MiniLM-L6-v2 (384-dim) for the Node.js batch indexer. Optional TurboQuant 4-bit quantization provides 8x storage compression with >=0.998 recall@10.
Quick Start
1. Clone and build
git clone https://github.com/NathanNorman/claude-memory.git
cd claude-memory
npm install
npm run build
2. Set up the Python environment
The MCP server runs in Python. Set up a venv with the required packages:
python3 -m venv ~/.claude-memory/graphiti-venv
~/.claude-memory/graphiti-venv/bin/pip install mcp sentence-transformers torch numpy
3. Add to Claude Code
Add to your MCP settings (e.g., ~/.claude.json):
{
"mcpServers": {
"unified-memory": {
"type": "stdio",
"command": "/bin/bash",
"args": ["/path/to/claude-memory/unified-mcp-launcher.sh"]
}
}
}
4. Initialize the index
# Build the search index from conversation archives
node dist/reindex-cli.js
5. Start using it
Claude Code will automatically have access to all 13 tools. No additional configuration needed.
Architecture
The system has three subsystems: a Python MCP server (runtime), a Node.js indexer (batch), and a webhook pipeline (real-time remote indexing). All three share a single SQLite database in WAL mode.
~/claude-memory/ # This repo (source code)
├── src/
│ ├── unified_memory_server.py # Python MCP server (runtime, 13 tools)
│ ├── server.ts # Node.js MCP server entry point
│ ├── tools.ts # MCP tool handlers + Zod schemas
│ ├── types.ts # Shared TypeScript types
│ │
│ │ # Search
│ ├── search.ts # Search orchestration (keyword + vector)
│ ├── hybrid.ts # FTS5 query building, BM25 scoring, RRF merge
│ ├── db.ts # SQLite operations, migrations
│ ├── embeddings.ts # Embedding generation (ONNX, transformers.js)
│ ├── quantize.py # TurboQuant 4-bit quantization (WHT + Lloyd-Max)
│ │
│ │ # Chunking
│ ├── chunker.ts # Exchange-aware conversation chunking
│ ├── semantic-chunker.ts # Boundary scoring + variance-minimizing DP
│ ├── semantic-markdown-chunker.ts # 3-stage markdown chunking pipeline
│ ├── llm-boundary-scorer.ts # LLM-based scoring (coprime windows 16, 11)
│ ├── llm-client.ts # OpenAI-compatible LLM client
│ ├── code_chunker.py # Code-aware chunking (AST/regex/size-based)
│ ├── conversation-parser.ts # JSONL -> structured exchange pairs
│ │
│ │ # Code intelligence
│ ├── ast_parser.py # tree-sitter (Java/Kotlin/TS) + ast (Python)
│ ├── import_resolver.py # Import string -> file path resolution
│ ├── call_resolver.py # 6-strategy call resolution cascade
│ ├── scip_parser.py # Optional SCIP indexer integration (Tier 2)
│ ├── build_parser.py # Gradle/Maven/pip/npm dependency extraction
│ │
│ │ # Webhook pipeline
│ ├── webhook_server.py # FastAPI webhook receiver (HMAC-SHA256)
│ ├── job_queue.py # SQLite-backed job queue with deduplication
│ ├── index_worker.py # Background worker (bare mirror indexing)
│ ├── mirror_manager.py # Bare git clone/fetch management
│ ├── poll_repos.py # Polling fallback (git ls-remote cron)
│ │
│ │ # Tools
│ ├── doctor-cli.ts # Database diagnostics and repair
│ ├── reindex-cli.ts # Batch reindexing CLI
│ ├── indexer.ts # File scanning, staleness detection
│ ├── integration.test.ts # Integration tests
│ └── prompts/ # LLM scoring prompts
│ ├── boundary-score-system.txt
│ └── boundary-score-user.txt
│
├── scripts/
│ ├── codebase-index.py # External codebase indexer
│ ├── index_session.py # Real-time session indexer (SessionEnd hook)
│ ├── conversation_parser.py # JSONL conversation parser (Python)
│ ├── cross_repo_deps.py # Cross-repo dependency graph builder
│ ├── build-reference-db.py # Addon reference database builder
│ ├── migrate_to_quantized.py # TurboQuant sidecar file generation
│ ├── backfill_entity_relationships.py # Entity graph backfill
│ ├── backfill_signals.py # Signal backfill utility
│ ├── bulk_index.py # Bulk indexing utility
│ ├── ingest_archive.py # Archive ingestion
│ ├── summary_refinement.py # LLM judge-refine summary loop
│ ├── summary_prompts.py # Summary/judge/refiner prompts
│ ├── summary_llm.py # LLM client for summaries (claude CLI)
│ ├── start-webhook-server.sh # Webhook server launcher
│ ├── index_missing_sessions.sh # Catch-up indexing for missed sessions
│ ├── restore_pre_turboquant.sh # Rollback script for quantization
│ └── test_*.py # Test files (14 test modules)
│
├── benchmarks/
│ ├── retrieval_bench.py # Recall@5/10 benchmark harness
│ ├── corpus.json # 50-document synthetic corpus
│ ├── baseline.json # 2-signal baseline (R@5=0.680)
│ └── baseline-4signal.json # 4-signal baseline (R@5=0.777)
│
└── unified-mcp-launcher.sh # MCP server launcher
~/.claude-memory/ # Runtime data directory
├── MEMORY.md # Long-term curated knowledge
├── memory/
│ └── YYYY-MM-DD.md # Daily structured logs
├── index/
│ ├── memory.db # SQLite search index (FTS5 + embeddings)
│ ├── reindex.lock # File lock for serialized writes
│ ├── packed_vectors.bin # TurboQuant 4-bit sidecar (optional)
│ ├── rerank_matrix.f32 # Float32 rerank sidecar (optional)
│ └── quantization.json # Quantization metadata (optional)
├── mirrors/ # Bare git clones (webhook pipeline)
├── conversation-archive/ # JSONL backups (rsync'd every 30min)
├── backups/ # Daily DB backups
└── graphiti-venv/ # Python virtualenv
Search Pipeline
- FTS5 keyword search -- Fast exact matching via SQLite FTS5 (BM25 ranking)
- Vector similarity search -- Three-stage quantized search: binary Hamming coarse pass (top 1,000), 4-bit TurboQuant dot products (top 50), float32 mmap exact rerank (top k)
- Reciprocal Rank Fusion -- Results from both backends merged with RRF (k=60)
- Post-filtering -- Date range, project, source type filters applied
- Deduplication -- Session results capped at 2 per conversation file
- Truncation -- Snippets cut at sentence boundaries
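The merge and deduplication steps above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion with k=60 and a per-file cap of 2, not the project's actual code; the function names and the `file` key on each result are assumptions made for the example.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def cap_per_conversation(results, max_per_file=2):
    """Keep at most `max_per_file` results from any one conversation file.

    `results` is assumed to be ordered best-first; each item is a dict with
    a hypothetical "file" key naming its source conversation.
    """
    seen = defaultdict(int)
    kept = []
    for result in results:
        if seen[result["file"]] < max_per_file:
            seen[result["file"]] += 1
            kept.append(result)
    return kept


keyword_hits = ["a", "b", "c"]   # e.g. from FTS5 / BM25
vector_hits = ["b", "d", "a"]    # e.g. from the vector backend
print(rrf_merge([keyword_hits, vector_hits]))
```

Items that appear in both lists ("a" and "b") rise above items found by only one backend, which is the property that makes RRF useful when the two backends score on incomparable scales.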
Chunking Strategies
Curated memory files use a 3-stage semantic markdown chunking pipeline:
- Parse -- Split markdown into 7 atomic unit types (headings, paragraphs, code blocks, lists, tables, thematic breaks, frontmatter)
- Score boundaries -- Heuristic scoring based on heading level changes, topic transitions, content type shifts, blank lines
- Segment -- Variance-minimizing dynamic programming to find optimal chunk boundaries (minChunkTokens=100, maxChunkTokens=2000, varianceWeight=0.3)
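The segmentation stage can be illustrated with a small dynamic program. The exact cost function is not specified above, so this sketch assumes one: each chunk pays a squared relative deviation from a target size (weighted by varianceWeight), and each cut earns the boundary score at that position. The `target` parameter and the fallback behavior are assumptions for illustration only.

```python
def segment(tokens, boundary, min_tokens=100, max_tokens=2000,
            variance_weight=0.3, target=None):
    """Split units into chunks; returns a list of (start, end) index pairs.

    tokens[i]   -- token count of atomic unit i
    boundary[i] -- score of the gap between unit i and unit i + 1
    """
    n = len(tokens)
    if n == 0:
        return []
    if target is None:
        target = (min_tokens + max_tokens) / 2  # assumed target size
    prefix = [0]
    for t in tokens:
        prefix.append(prefix[-1] + t)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[e] = minimal cost of segmenting units [0, e)
    back = [0] * (n + 1)     # back[e] = start index of the last chunk
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end - 1, -1, -1):
            size = prefix[end] - prefix[start]
            # Sizes only grow as start moves left, so stop once over the max
            # (a single oversized unit is still allowed as its own chunk).
            if size > max_tokens and end - start > 1:
                break
            if best[start] == inf:
                continue
            # Only the final chunk may fall below the minimum size.
            if size < min_tokens and end != n:
                continue
            size_penalty = variance_weight * ((size - target) / target) ** 2
            cut_reward = boundary[end - 1] if end < n else 0.0
            cost = best[start] + size_penalty - cut_reward
            if cost < best[end]:
                best[end], back[end] = cost, start

    if best[n] == inf:       # constraints unsatisfiable: fall back to one chunk
        return [(0, n)]
    chunks, end = [], n
    while end > 0:
        chunks.append((back[end], end))
        end = back[end]
    return chunks[::-1]


# Four 60-token units with a strong boundary in the middle.
print(segment([60, 60, 60, 60], [0.1, 2.0, 0.1], min_tokens=100, max_tokens=200))
```

With these inputs the only size-feasible cut that also lands on the strong boundary is the middle one, so the units are split into two equal chunks.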
Conversation archives use exchange-aware chunking:
- JSONL files are parsed into user/assistant exchange pairs
- Boundary scoring uses 7 signals, including topic shift phrases (+1.5), file path shifts (+1.0), time gaps (+0.5/+1.0), tool type shifts (+0.5), read-write transitions (+0.5), and user questions (+0.25)
- Optional LLM-based scoring via coprime windows (sizes 16 and 11, gcd=1) with per-pair caching
- Same variance-minimizing DP segments exchanges into coherent topic-based chunks
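A heuristic boundary scorer using the weights listed above might look like the sketch below. Only the weights come from this document; the phrase list, tool names, time-gap thresholds, and the `Exchange` shape are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical values: the real phrase list, tool sets, and gap thresholds
# are not documented here.
TOPIC_SHIFT_PHRASES = ("moving on", "now let's", "switching to", "different question")
READ_TOOLS = {"Read", "Grep", "Glob"}
WRITE_TOOLS = {"Write", "Edit"}


@dataclass
class Exchange:
    user_text: str
    timestamp: float                      # seconds since epoch
    file_paths: set = field(default_factory=set)
    tools: set = field(default_factory=set)


def boundary_score(prev, cur, short_gap=300, long_gap=1800):
    """Score how likely a topic boundary sits between two adjacent exchanges."""
    score = 0.0
    text = cur.user_text.lower()
    if any(phrase in text for phrase in TOPIC_SHIFT_PHRASES):
        score += 1.5                      # topic shift phrase
    if prev.file_paths and cur.file_paths and prev.file_paths.isdisjoint(cur.file_paths):
        score += 1.0                      # file path shift
    gap = cur.timestamp - prev.timestamp
    if gap >= long_gap:
        score += 1.0                      # long time gap
    elif gap >= short_gap:
        score += 0.5                      # short time gap
    if prev.tools and cur.tools and prev.tools != cur.tools:
        score += 0.5                      # tool type shift
    if prev.tools and prev.tools <= READ_TOOLS and cur.tools & WRITE_TOOLS:
        score += 0.5                      # read -> write transition
    if text.rstrip().endswith("?"):
        score += 0.25                     # user question
    return score


prev = Exchange("fix the parser", 0, {"src/a.py"}, {"Read"})
cur = Exchange("Moving on, can you update the docs?", 2000, {"docs/b.md"}, {"Edit"})
print(boundary_score(prev, cur))
```

The resulting per-gap scores are what a DP segmenter would consume when deciding where to cut.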
Source code (via codebase indexer) uses language-aware chunking:
- Python: AST-based (functions, classes via stdlib ast)
- TypeScript/JavaScript: tree-sitter (class, function, interface, enum, arrow function declarations)
- Java/Kotlin: Regex-based (class/interface/method declarations)
- Shell: Function declaration splitting
- Other files: Size-based splitting at blank-line boundaries
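For the Python case, AST-based chunking with the stdlib `ast` module can be sketched as below. This is an illustration of the technique rather than the contents of code_chunker.py; the function name and the returned dict shape are assumptions.

```python
import ast


def chunk_python_source(source):
    """Return one chunk per top-level function or class in `source`."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        start = node.lineno
        if node.decorator_list:
            # Include decorators, which sit above the def/class line.
            start = min(d.lineno for d in node.decorator_list)
        end = node.end_lineno  # available on Python 3.8+
        chunks.append({
            "name": node.name,
            "kind": type(node).__name__,
            "start_line": start,
            "end_line": end,
            "text": "\n".join(lines[start - 1:end]),
        })
    return chunks


example = "import os\n\ndef f():\n    return 1\n\nclass C:\n    def m(self):\n        pass\n"
for chunk in chunk_python_source(example):
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Chunking along declaration boundaries keeps each function or class intact, so a search hit returns a complete definition instead of an arbitrary window of lines.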
Embedding on Write
When memory_write is called, the server:
- Writes content to the target markdown file
- Chunks and indexes via FTS5 (immediate keyword search coverage)
- Generates embeddings via bge-base-en-v1.5 (768-dim), quantizes to 4-bit, and writes to the chunks table (immediate vector search coverage)
No waiting for the Node.js reindexer -- written memories are searchable via both backends immediately.
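The keyword half of this write path can be illustrated with Python's stdlib `sqlite3` and an FTS5 virtual table. The table and column names here are hypothetical, the embedding step is omitted, and the example assumes the local SQLite build includes FTS5.

```python
import sqlite3


def create_index(conn):
    # Hypothetical schema; the real schema is defined by the project.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(file_path, content)"
    )


def write_and_index(conn, file_path, content):
    """Insert a chunk so it is visible to keyword search right after commit."""
    conn.execute(
        "INSERT INTO chunks_fts (file_path, content) VALUES (?, ?)",
        (file_path, content),
    )
    conn.commit()


def keyword_search(conn, query, limit=10):
    """Return (file_path, content) rows, best BM25 match first."""
    return conn.execute(
        "SELECT file_path, content FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_index(conn)
write_and_index(conn, "memory/2025-01-01.md", "Decided to checkpoint the WAL on shutdown")
print(keyword_search(conn, "checkpoint"))
```

FTS5's `bm25()` returns lower values for better matches, which is why the query sorts ascending.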
Code Intelligence
The code intelligence subsystem builds a call graph and type hierarchy from indexed codebases:
AST extraction (ast_parser.py): tree-sitter for Java, Kotlin, and TypeScript; stdlib ast for Python. Extracts imports (with type classification), symbol declarations (classes, interfaces, functions, methods with line numbers), call sites, and type hierarchy (extends, implements, delegation).
Call resolution (call_resolver.py): A 6-strategy cascade resolves each extracted call site to a target symbol, short-circuiting on first match:
| Priority | Strategy | Confidence |
|----------|----------|------------|
| 1 | Import-map exact match | 0.95 |
| 2 | Import-map suffix fallback | 0.85 |
| 3 | Same-module prefix match | 0.90 |
| 4 | Unique name project-wide | 0.75 |
| 5 | Suffix + directory distance | 0.55 |
| 6 | Fuzzy string similarity | 0.30-0.40 |
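The short-circuiting structure of such a cascade can be sketched as follows. Only two of the six strategies are shown, and their matching logic, the `ctx` dictionary, and all function names are assumptions for illustration; the confidence values are taken from the table.

```python
def import_map_exact(call_name, ctx):
    """Strategy 1 (assumed logic): the name was imported explicitly."""
    return ctx["import_map"].get(call_name)


def unique_name_project_wide(call_name, ctx):
    """Strategy 4 (assumed logic): exactly one symbol has this short name."""
    short = call_name.rsplit(".", 1)[-1]
    matches = [s for s in ctx["symbols"] if s.rsplit(".", 1)[-1] == short]
    return matches[0] if len(matches) == 1 else None


# Ordered by priority; the remaining strategies would slot in the same way.
CASCADE = [
    ("import_map_exact", 0.95, import_map_exact),
    ("unique_name_project_wide", 0.75, unique_name_project_wide),
]


def resolve_call(call_name, ctx, cascade=CASCADE):
    """Try each strategy in order and stop at the first one that matches."""
    for name, confidence, strategy in cascade:
        target = strategy(call_name, ctx)
        if target is not None:
            return {"target": target, "strategy": name, "confidence": confidence}
    return None  # unresolved call site


ctx = {
    "import_map": {"PaymentService": "com.acme.billing.PaymentService"},
    "symbols": [
        "com.acme.billing.PaymentService",
        "com.acme.util.Retry",
        "com.acme.a.Helper",
        "com.acme.b.Helper",
    ],
}
print(resolve_call("PaymentService", ctx))
print(resolve_call("Retry", ctx))
print(resolve_call("Helper", ctx))  # ambiguous in this two-strategy sketch
```

Recording the strategy name and confidence on each edge lets downstream consumers, such as a graph traversal, filter out low-confidence edges.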
SCIP integration (scip_parser.py): Optional Tier 2 indexing via scip-java, scip-typescript, or scip-python. SCIP edges (0.95 confidence) replace tree-sitter edges for the same source/target file pair.
Cross-repo dependencies (cross_repo_deps.py + build_parser.py): Parses Gradle KTS/Groovy (including version catalog TOML), Maven (with property interpolation), pyproject.toml, requirements.txt, and package.json into repo_dependency edges.
Webhook Pipeline
For repositories on GitHub rather than the local machine, the webhook pipeline provides push-triggered incremental indexing:
- GitHub push fires a webhook to webhook_server.py (FastAPI, HMAC-SHA256 verified)
- Job enqueued to a SQLite-backed queue with deduplication (rapid pushes to the same repo coalesce into one job)
- Background worker claims job, fetches bare git mirror, computes diff
- Only changed files are re-chunked and re-embedded
- Performance target: under 1 second per job
Polling fallback via poll_repos.py checks tracked repos via git ls-remote and enqueues jobs when remote HEAD changes.
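The HMAC-SHA256 check on incoming webhooks can be sketched with the standard library alone. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` computed over the raw request body; the function name below is an assumption and the FastAPI wiring is omitted.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = b"example-secret"  # placeholder; use the secret configured on the webhook
body = b'{"ref": "refs/heads/main"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, header))
```

The signature must be computed over the exact bytes received, so verification should happen before the body is parsed as JSON.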
Iterative Summary Refinement
Conversation sessions can be automatically summarized using an LLM judge-refine loop:
- Summarize -- Generate initial summary from conversation transcript
- Judge -- Score summary on 6 dimensions (decisions/rationale, identifiers/configs, approaches tried, file references, correctness, structure) on a 0-10 scale
- Refine -- If score < threshold (default 8.0), refine with judge feedback and re-score
- Store -- Final summary saved to files.summary column for search result enrichment
Controlled via MEMORY_SUMMARY_ENABLED=1 and MEMORY_SUMMARY_MODEL env vars.
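The control flow of the judge-refine loop can be sketched as below. The three callables stand in for LLM calls and are hypothetical, as is the `max_rounds` cap (the document does not say how many refinement rounds are allowed); the judge is assumed to collapse its six dimensions into a single 0-10 score.

```python
def refine_summary(transcript, summarize, judge, refine, threshold=8.0, max_rounds=3):
    """Summarize, then refine with judge feedback until the score passes.

    summarize(transcript) -> summary
    judge(transcript, summary) -> (score, feedback)
    refine(transcript, summary, feedback) -> improved summary
    """
    summary = summarize(transcript)
    score, feedback = judge(transcript, summary)
    rounds = 0
    while score < threshold and rounds < max_rounds:
        summary = refine(transcript, summary, feedback)
        score, feedback = judge(transcript, summary)
        rounds += 1
    return summary, score


# Stub callables so the loop can be exercised without an LLM.
def fake_summarize(transcript):
    return "draft"


def fake_judge(transcript, summary):
    return (9.0, "") if summary.endswith("v2") else (6.0, "add file references")


def fake_refine(transcript, summary, feedback):
    return summary + " v2"


print(refine_summary("...", fake_summarize, fake_judge, fake_refine))
```

Capping the number of rounds keeps a summary that never reaches the threshold from looping indefinitely.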
Codebase Indexing
External repositories can be indexed for semantic search:
# Full index
python3 scripts/codebase-index.py --path ~/my-repo --name my-repo
# Incremental update (only changed files)
python3 scripts/codebase-index.py --path ~/my-repo --name my-repo --update
# Low-impact mode (throttled, nice'd)
python3 scripts/codebase-index.py --path ~/my-repo --name my-repo --throttle
# List indexed codebases
python3 scripts/codebase-index.py --list
# Remove
python3 scripts/codebase-index.py --remove --name my-repo
Codebase chunks are stored in the main chunks table with file_path prefixed by codebase:<name>/. A PreToolUse:Write hook surfaces similar existing code when creating new source files, preventing duplicate implementations.
Addon Reference Databases
Skills and plugins can ship pre-built .db files containing searchable reference material. The server discovers these at startup and makes them searchable via memory_search(source="<name>").
# Build from a directory of markdown/text files
python3 scripts/build-reference-db.py ./my-docs/ -o my-skill.db
Concurrent Access
Multiple Claude Code sessions each spawn their own MCP server process, all sharing the same SQLite database:
- Write serialization -- File lock (reindex.lock) ensures only one process reindexes at a time
- Graceful search degradation -- Vector and keyword search are wrapped independently; if one fails, the other still returns results
- Busy timeout -- busy_timeout = 5000 gives concurrent readers/writers 5 seconds to acquire locks
- Graceful shutdown -- SIGTERM/SIGINT handlers checkpoint the WAL and close cleanly
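The SQLite side of these settings can be sketched with the stdlib `sqlite3` module. The helper names are assumptions, and the choice of a TRUNCATE checkpoint on shutdown is one reasonable option, not necessarily what the server does.

```python
import signal
import sqlite3
import sys


def open_shared_db(path):
    """Open a connection configured for several concurrent processes."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")      # readers do not block the writer
    conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5 s for a lock
    return conn


def checkpoint_and_close(conn):
    """Fold the WAL back into the main database file, then close."""
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    conn.close()


def install_shutdown_handlers(conn):
    """Checkpoint and exit cleanly on SIGTERM or SIGINT."""
    def handler(signum, frame):
        checkpoint_and_close(conn)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
```

WAL mode requires a file-backed database; an in-memory database reports `memory` as its journal mode instead.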
Indexing
- Curated memory files are chunked using the semantic markdown chunker (parse -> score -> DP segmentation)
- Conversation archives are parsed into exchange-aware chunks with boundary scoring
- Only main session files (<uuid>.jsonl) are indexed; agent subagent files are skipped
- Conversation chunks are never pruned -- even after Claude Code deletes the original JSONL, the indexed content survives
- Embeddings are generated locally (ONNX runtime for Node.js, sentence-transformers for Python)
- TurboQuant 4-bit quantization compresses embeddings 8x with >=0.998 recall@10
- Index staleness is checked via file modification times -- reindexing only processes changed files
- Embedding cache table avoids re-embedding unchanged content on reindex
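The mtime-based staleness check can be sketched as follows. The function name and the `indexed_mtimes` mapping (path to the modification time recorded at the last index) are assumptions; the real indexer may store this differently.

```python
import os


def find_stale_files(paths, indexed_mtimes):
    """Return the paths that are new or modified since they were last indexed."""
    stale = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        # Unknown files compare against -inf, so they always count as stale.
        if mtime > indexed_mtimes.get(path, float("-inf")):
            stale.append(path)
    return stale
```

A reindex pass would then process only the returned paths and record their new modification times afterwards.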
Automatic indexing is handled three ways:
- A SessionEnd hook (index_session MCP tool) indexes each session immediately with FTS5; embeddings are filled lazily on next server warmup
- A cron job (memory-reindex) runs every 30 minutes as a catch-all for missed sessions
- A conversation backup cron (conversation-backup) rsyncs raw JSONL files every 30 minutes to ~/.claude-memory/conversation-archive/ before Claude Code can prune them
Manual reindex: npx tsc && node dist/reindex-cli.js
Tools Reference
memory_search
Search memories using hybrid keyword + vector search.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | (required) | Search query text |
| maxResults | number | 10 | Maximum results to return |
| minScore | number | 0 | Minimum relevance score (0-1) |
| after | string | "" | Only results after this date (YYYY-MM-DD) |
| before | string | "" | Only results before this date (YYYY-MM-DD) |
| project | string | "" | Filter by project directory name |
| source | string | "" | "curated", "conversations", "codebase", or "" for all |
memory_deep_search
2-pass multi-hop search with entity expansion. Same parameters as memory_search. Pass 1 runs standard hybrid search. Pass 2 extracts entities (tools, projects, people) from top results and searches for those entities via keyword + entity overlap (skips vector + temporal to save ~500ms).
codebase_search
Search indexed codebases for existing implementations.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | string | (required) | Search query (e.g., "manifest discovery") |
| codebase | string | "" | Filter to a specific codebase name, or "" for all |
| maxResults | number | 10 | Maximum results to return |
symbol_search
Find symbol definitions (classes, functions, methods) across indexed codebases.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| pattern | string | (required) | SQL LIKE pattern (e.g., "%PaymentService%") |
| codebase | string | "" | Filter to a specific codebase |
| kind | string | "" | Filter by symbol kind: "class", "function", "method", etc. |
graph_traverse
Walk the call graph upstream (callers) or downstream (callees) from a file.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| file_path | string | (required) | File path within the codebase |
| direction | string | "downstream" | "upstream" (callers) or "downstream" (callees) |
| depth | number | 1 | Traversal depth (1-3) |
community_search
Find the cluster of tightly-coupled files around a given file using Louvain community detection.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| file_path | string | (required) | File path to find the community for |
dependency_search
Query cross-repo build dependency edges.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| codebase | string | (required) | Codebase name |
| direction | string | "imports" | "imports" (what this repo depends on) or "imported_by" (what depends on this repo) |
entity_browse
List entities extracted from indexed content, ranked by occurrence count.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| entity_type | string | "" | Filter by type: "tool", "project", "person", or "" for all |
| limit | number | 50 | Maximum entities to return |
entity_graph
Explore entity co-occurrence neighborhoods.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| entity | string | (required) | Entity value to explore |
| depth | number | 1 | Neighborhood depth (1-2) |
memory_read
Read a specific memory file or retrieve a past conversation.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| path | string | (required) | Relative path within ~/.claude-memory/, or a session UUID |
| from_line | number | 1 | Starting line number (1-based) |
| lines | number | 0 | Number of lines to return (0 = all) |
memory_write
Write to memory files with immediate indexing and embedding.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| content | string | (required) | Content to write |
| file | string | "memory/YYYY-MM-DD.md" | Target file (MEMORY.md or memory/*.md) |
| append | boolean | true | Append to file or overwrite |
index_session
Index a conversation session JSONL file (called by SessionEnd hook).
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| session_path | string | (required) | Absolute path to the session JSONL file |
get_status
Health check for both backends. Returns chunk counts, vector counts, model info, quantization status.
Retrieval Benchmarks
A synthetic corpus of 50 documents and 50 queries across four categories measures recall:
| Configuration | R@5 | R@10 |
|---------------|-----|------|
| 2-signal (keyword + vector) | 0.680 | 0.786 |
| 4-signal (+ temporal + entity) | 0.777 | 0.858 |

| Category | 2-signal R@5 | 4-signal R@5 | Delta |
|----------|--------------|--------------|-------|
| entity | 0.896 | 1.000 | +10.4pp |
| general | 0.833 | 0.833 | +0.0pp |
| multi-hop | 0.463 | 0.642 | +17.9pp |
| temporal | 0.563 | 0.655 | +9.2pp |
Run benchmarks: python3 benchmarks/retrieval_bench.py
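For readers unfamiliar with the metric, recall@k is conventionally the fraction of a query's relevant documents that appear in the top k results, averaged over queries. The sketch below shows that standard definition; it is not taken from retrieval_bench.py, and the function names and data shapes are assumptions.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant document IDs found in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# One of two relevant documents appears in the top 5.
print(recall_at_k(["d1", "d7", "d3", "d9", "d2"], {"d3", "d4"}, 5))
```

Under this definition, a multi-hop query with several relevant documents only scores 1.0 when all of them are retrieved, which is consistent with multi-hop being the hardest category in the table.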
Database Doctor
A built-in diagnostic and repair tool for the search index.
# Diagnose (read-only)
node dist/doctor-cli.js
# Diagnose and repair
node dist/doctor-cli.js --fix
Checks: chunk/file/vector row counts, FTS5 integrity, cross-table consistency, WAL size, stale processes, stale locks.
Repairs (with --fix): Rebuilds FTS5 and vec0 tables from source data, checkpoints WAL, removes stale locks.
Development
npm install # Install Node.js dependencies
npm run build # Build indexer + doctor CLI (esbuild bundles)
npm run typecheck # TypeScript type checking
npm test # tsc compile + integration tests (node --test)
# Python tests
python3 -m pytest scripts/test_*.py -v
Tech Stack
MCP Server (Python):
- FastMCP -- MCP server framework
- sentence-transformers -- Local embedding generation (bge-base-en-v1.5 768-dim, nomic-embed-text-v1.5 768-dim)
- SQLite (stdlib) -- FTS5 keyword search + embedding BLOB storage
- TurboQuant -- 4-bit vector quantization with Walsh-Hadamard rotation + Lloyd-Max codebook
Indexer (Node.js):
- better-sqlite3 -- SQLite with WAL mode
- sqlite-vec -- ANN vector index (vec0)
- Xenova/transformers.js -- ONNX embedding generation (all-MiniLM-L6-v2 384-dim)
- esbuild -- Single-file bundle
Webhook Pipeline (Python):
- FastAPI -- Webhook receiver with HMAC-SHA256 verification
- Bare git mirrors -- No working copies, reads via git show
- SQLite job queue -- Deduplication, atomic claims via BEGIN IMMEDIATE
Code Intelligence (Python):
- tree-sitter -- AST extraction for Java, Kotlin, TypeScript
- SCIP -- Optional compiler-grade indexing (Tier 2)
- 6-strategy call resolution cascade (0.95 to 0.30 confidence)
Chunking & Scoring:
- Semantic markdown chunker -- Parse -> boundary score -> variance-minimizing DP segmentation
- Exchange-aware conversation chunker -- 7 boundary signals, coprime LLM scoring windows
- Code chunker -- AST (Python), tree-sitter (TS/JS), regex (Java/Kotlin/Shell), size-based (other)
License
MIT






