code-index
<!-- mcp-name: io.github.achreftlili/code-index -->
A local, SQLite-backed code index for Claude Code, exposed over MCP. It replaces blind Read / Grep / Glob exploration with targeted retrieval — "where is parseAuthToken defined", "what calls Indexer.reindex_all", "find the rate-limiting code" — answered in milliseconds against an offline index.
No API keys. No external services. The embedder runs locally on your machine.
How it works (30-second tour)
- Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
- Chunk code per symbol and expand identifiers (getUserAuthToken → get user auth token) so search matches both styles.
- Embed each chunk locally with jina-embeddings-v2-base-code (768-dim) via sentence-transformers.
- Store symbols, chunks, vectors, and call/import edges in .claude/index.db (SQLite + sqlite-vec + FTS5).
- Serve 14 retrieval tools + 1 admin tool over MCP (see Tools).
- Stay fresh via an optional PostToolUse hook that incrementally re-indexes touched files.
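The identifier expansion in the chunking step can be illustrated with a short sketch. This is an assumption about how such a splitter might look, not the project's chunker.py; the function name expand_identifier and the regular expression are hypothetical.

```python
import re

# Hypothetical helper for illustration; the real chunker may split differently.
# Boundaries: lower/digit -> upper (getUser), acronym -> word (HTTPServer),
# and runs of underscores (parse_auth_token).
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|_+")


def expand_identifier(name: str) -> str:
    """Split camelCase, PascalCase, and snake_case names into lowercase words."""
    parts = [part for part in _BOUNDARY.split(name) if part]
    return " ".join(part.lower() for part in parts)


def searchable_text(name: str) -> str:
    """Keep the original spelling next to the expanded words so both styles match."""
    return f"{name} {expand_identifier(name)}"
```

Under these assumptions, expand_identifier("getUserAuthToken") yields "get user auth token", so a natural-language query and an exact identifier can both hit the same chunk.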
Tools
Retrieval
| Tool | Purpose |
| --- | --- |
| code_search | Hybrid (vector + FTS) search for conceptual queries (e.g., "auth flow", "where do we parse JSON"). |
| symbol_lookup | Exact-name lookup of functions / classes / methods / types. Prefer over code_search for identifiers. |
| file_outline | Symbols (with signatures) in a file, in source order. Use instead of Read when you only need shape. |
| module_outline | Symbols across a directory subtree in one call. Use instead of looping file_outline. |
| where_am_i | Given path + line, returns the innermost symbol and the full enclosing chain. |
| get_symbol_body | Full chunk for a symbol_id from symbol_lookup / code_search / file_outline. |
| get_symbol_bodies | Batch version of get_symbol_body (up to 20 ids per call). |
| callers | Symbols that CALL the given symbol. depth (1-5) expands transitively. |
| callees | Symbols that the given symbol CALLS. depth (1-5) expands transitively. |
| references | Non-call uses (subclasses, free identifier references). Companion to callers / callees. |
| trace | Build a call-graph tree from an entry symbol; flat=true returns nodes/edges for cheap LLM scans. |
| file_imports | Files this file imports (direction=imports) or that import it (direction=imported_by). |
| recent_changes | Files touched in the last N git commits. |
| propose_rename | v1: same-file rename. Returns an edit list the agent applies via its own Edit tool; refuses on clash. |
Admin
| Tool / op | Purpose |
| --- | --- |
| admin op=init | Build or refresh the index. Incremental by default; force=true rebuilds from scratch. |
| admin op=setup_check | Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end. |
| admin op=install_hook | Wire the auto-reindex PostToolUse hook into .claude/settings.json. Idempotent. |
| admin op=stats | Read-only: file counts by language, symbol totals, embed model fingerprint, last-index time. |
| admin op=verify | Integrity sweep: orphan rows, parse-failure files, dangling edges. |
embed_query_debug is a dev-only ranking diagnostic, hidden from list_tools unless CODE_INDEX_DEBUG=1 is set.
All tools return bounded JSON; large bodies use get_symbol_body rather than inlining whole files.
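To make the depth parameter of callers / callees concrete, here is a minimal sketch of a depth-bounded breadth-first walk over call edges. It assumes edges are available as (caller, callee) pairs; the function name transitive_callers and the return shape are hypothetical and do not describe the server's actual query or output format.

```python
from collections import deque


def transitive_callers(edges, target, depth=1):
    """Return {caller: hop_count} for callers of target within `depth` hops.

    edges is an iterable of (caller, callee) pairs (an assumed input shape).
    """
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")

    # Invert the edge list: callee -> set of direct callers.
    reverse = {}
    for caller, callee in edges:
        reverse.setdefault(callee, set()).add(caller)

    found = {}
    queue = deque([(target, 0)])
    while queue:
        symbol, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for caller in sorted(reverse.get(symbol, ())):
            # Skip already-seen symbols so cycles terminate.
            if caller != target and caller not in found:
                found[caller] = hops + 1
                queue.append((caller, hops + 1))
    return found
```

With depth=1 only direct callers are returned; each additional level adds the callers of the previous level. Swapping the edge direction gives the same walk for callees.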
Requirements
- Python 3.10+ with loadable SQLite extension support (required by sqlite-vec).
  - Python 3.13 has this enabled by default.
  - On 3.10–3.12, install via the python.org installer or via pyenv with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x.
  - Homebrew Python often ships without the extension hook; use one of the two methods above instead.
- uv / uvx as the recommended runner, or pip if you prefer a permanent install.
- ~600 MB free disk for the embedding model on first init.
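Before installing, you can check whether a given Python build has loadable SQLite extension support. The sketch below uses only the standard library; the helper name supports_loadable_extensions is hypothetical and is not part of code-index.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """Return True if this Python's sqlite3 module can load extensions."""
    # Builds compiled without extension support omit this method entirely.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)  # switch it back off right away
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()
```

Run it with the interpreter you plan to use, for example by calling print(supports_loadable_extensions()) in a REPL. A False result suggests switching to one of the builds listed above.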
Quick start (Claude Code)
One command, no API keys:
claude mcp add-json -s user code-index "$(cat <<'JSON'
{
"type": "stdio",
"command": "uvx",
"args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"
Then open Claude Code in any repo and ask:
_"Build the code index for this repo."_
Claude calls the init MCP tool, which writes .claude/index.db. From then on, ask things like _"where is parseAuthToken defined?"_ or _"what calls Indexer.reindex_all?"_ — Claude routes them through symbol_lookup / callers / code_search instead of grepping.
- What --refresh does: fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1s of startup).
- Project-only install: drop -s user to register the server in the current project's .claude/settings.json instead of the global ~/.claude.json.
- First-run model download: the first init pulls jina-embeddings-v2-base-code (~600 MB) into ~/.cache/huggingface and caches it forever. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.
- Already installed without --refresh? Run claude mcp remove code-index first, then re-run the command above.
Alternative: permanent install (no uvx)
pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp
Optional: keep the index live as you edit
Without a hook, the index drifts whenever files change, until you call init again. With one, every Edit / Write / MultiEdit that Claude performs triggers an incremental reindex of the touched file. Changes made outside the agent (mv, git checkout, IDE saves) are not seen by the hook and still need a fresh init.
Easiest path: ask Claude. On first use in a new project, ask _"set up the code-index"_ — Claude calls setup_check → install_hook → init. The hook command is derived from how the MCP server was launched (uvx-aware), so it uses the same Python toolchain. Hook output goes to .claude/code-index-hook.log so failures are debuggable.
Manual install — add this block to the project's .claude/settings.json under hooks.PostToolUse (the version you want depends on how you launch the server — install_hook derives the right one for you):
{
"matcher": "Edit|Write|MultiEdit",
"hooks": [
{
"type": "command",
"command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
}
]
}
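If you would rather script the manual install than edit the file by hand, the merge can be sketched as follows. This is an illustration of an idempotent merge under the settings layout described above (hooks.PostToolUse as a list), not the code behind install_hook; merge_hook and HOOK_ENTRY are hypothetical names.

```python
import json
from pathlib import Path

# The hook block from this README, expressed as a Python dict.
HOOK_ENTRY = {
    "matcher": "Edit|Write|MultiEdit",
    "hooks": [
        {
            "type": "command",
            "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
                       "--from mcp-code-index code-index-hook",
        }
    ],
}


def merge_hook(settings_path: Path, entry: dict = HOOK_ENTRY) -> bool:
    """Add entry under hooks.PostToolUse; return True only if the file changed."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        settings = json.loads(text) if text.strip() else {}

    post_tool_use = settings.setdefault("hooks", {}).setdefault("PostToolUse", [])
    if entry in post_tool_use:
        return False  # already wired; a second run is a no-op

    post_tool_use.append(entry)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling merge_hook(Path(".claude/settings.json")) twice leaves a single entry in place. Existing keys in the settings file are preserved because the function only appends to the PostToolUse list.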
In other MCP-compatible agents
The server speaks standard MCP over stdio, so any client that supports MCP servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to launch uvx --refresh --from mcp-code-index code-index-mcp (or code-index-mcp after pip install mcp-code-index). Once connected, call the init tool from inside the client to bootstrap the index. Drop --refresh when you want to pin to a stable version instead of always pulling latest.
From source (development)
git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init # CLI alternative to the `init` MCP tool
code-index-mcp # starts the MCP server on stdio (for manual wiring)
Configuration
All settings are optional — the defaults work out of the box. Override them via environment variables. Inside Claude Code, set them in the env block of your code-index server entry in ~/.claude.json (then reconnect the MCP server).
Common knobs (most users only ever touch these):
| Var | Default | When to set it |
|---|---|---|
| CODE_INDEX_EMBED_DEVICE | _auto_ | Force the torch device: cpu, mps, or cuda. Set cpu on Apple Silicon if init fails with MPS out-of-memory. |
| CODE_INDEX_EMBED_BATCH | 32 | Encode batch size. Lower (e.g. 8 or 4) to cut peak GPU memory while staying on mps/cuda. |
| CODE_INDEX_DB | .claude/index.db | Override the SQLite index path (e.g. to share an index across sibling worktrees). |
Advanced (rarely needed):
| Var | Default | Notes |
|---|---|---|
| CODE_INDEX_EMBEDDER | jina | Only jina (local sentence-transformers) is supported today; the variable exists for future expansion. |
| CODE_INDEX_EMBED_MODEL | jinaai/jina-embeddings-v2-base-code | HuggingFace model id. Only override if you know the model is dim-compatible (768d). |
| CODE_INDEX_EMBED_DIM | 768 | Must match the embedding model's output dimension. |
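The variables and defaults in the two tables above can be read as in the following sketch. It shows one plausible way to resolve them from the environment; the Settings class and load_settings function are hypothetical, and the server's own parsing and validation may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    embed_device: Optional[str]  # None stands for the "auto" default
    embed_batch: int
    db_path: str
    embedder: str
    embed_model: str
    embed_dim: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from env, falling back to the documented defaults."""
    batch = int(env.get("CODE_INDEX_EMBED_BATCH", "32"))
    if batch < 1:
        raise ValueError("CODE_INDEX_EMBED_BATCH must be a positive integer")
    return Settings(
        embed_device=env.get("CODE_INDEX_EMBED_DEVICE") or None,
        embed_batch=batch,
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        embed_model=env.get(
            "CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"
        ),
        embed_dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )
```

Passing a plain dict instead of os.environ, such as load_settings({"CODE_INDEX_EMBED_DEVICE": "cpu"}), makes it easy to see which values an env block would change.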
Troubleshooting
init fails with MPS backend out of memory on Apple Silicon. A large file produced a chunk batch bigger than your GPU's free VRAM. Quickest fix — re-run on CPU (slower but bulletproof):
"env": {
"CODE_INDEX_EMBED_DEVICE": "cpu"
}
To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8". Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the new env takes effect. init is incremental — already-embedded files are skipped on the retry.
init fails with a Hugging Face network error on first run. Your network is blocking model downloads. Pre-warm the cache on a machine that has access:
huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine
sqlite3.OperationalError: not authorized or sqlite-vec fails to load. Your Python build doesn't have loadable SQLite extensions. See Requirements — install via python.org or a pyenv build with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.
code_search / symbol_lookup returns stale paths after a refactor or branch checkout. The auto-reindex hook only fires on Claude's Edit / Write / MultiEdit. After bulk file moves outside the agent (mv, git checkout, IDE rename), re-run init (it's incremental). Or wire up the hook so the index keeps up with agent edits automatically.
Layout
src/code_index/
db.py SQLite schema, connection, sqlite-vec loading
parser.py Tree-sitter wrapper, symbol + edge extraction
imports.py Per-language import target → file path resolution
chunker.py Per-symbol chunks, identifier expansion
embedder.py Local Jina (sentence-transformers) backend
indexer.py Pipeline: walk → parse → chunk → embed → write
reindexer.py Per-root engine cache; one entry point for "reindex one file"
retriever.py Hybrid search (vector + FTS5) with RRF
watcher.py File watcher (watchdog)
admin.py setup_check / install_hook / init logic (pure, no MCP state)
mcp_server.py MCP wiring, shared helpers, schema fragments
tool_registry.py Shared `@_tool` decorator + `_TOOLS` registry
tools/ Per-domain MCP handlers (graph, paths, refactor, …)
hook.py `code-index-hook` console script — the PostToolUse entry point
cli.py init / reindex / watch / stats
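The layout lists retriever.py as hybrid search (vector + FTS5) with RRF. As a rough illustration of reciprocal rank fusion, the sketch below merges two ranked lists of chunk ids. The constant k=60 is the value commonly used in the RRF literature and is an assumption here, as is the function name; the project's own constant, weighting, and tie-breaking are not documented in this README.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["a", "b", "c"]  # hypothetical ids ranked by embedding similarity
fts_hits = ["b", "d", "a"]     # hypothetical ids ranked by full-text match
fused = reciprocal_rank_fusion([vector_hits, fts_hits])
```

Here fused is ["b", "a", "d", "c"]: ids that rank well in both lists rise above ids that appear in only one, which is the property that makes rank fusion useful when vector and full-text scores are not directly comparable.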





