Code Index

achreftlili/code-index
0 stars · MIT · Community


Summary

SQLite-backed hybrid (vector + FTS) code search index for Claude Code over MCP.


code-index

<!-- mcp-name: io.github.achreftlili/code-index -->

A local, SQLite-backed code index for Claude Code, exposed over MCP. It replaces blind Read / Grep / Glob exploration with targeted retrieval — "where is parseAuthToken defined", "what calls Indexer.reindex_all", "find the rate-limiting code" — answered in milliseconds against an offline index.

No API keys. No external services. The embedder runs locally on your machine.

How it works (30-second tour)

  1. Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
  2. Chunk code per symbol and expand identifiers (getUserAuthToken → get user auth token) so search matches both styles.
  3. Embed each chunk locally with jina-embeddings-v2-base-code (768-dim) via sentence-transformers.
  4. Store symbols, chunks, vectors, and call/import edges in .claude/index.db (SQLite + sqlite-vec + FTS5).
  5. Serve 14 retrieval tools + 1 admin tool over MCP (see Tools).
  6. Stay fresh via an optional PostToolUse hook that incrementally re-indexes touched files.
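
The identifier expansion in step 2 can be illustrated with a minimal sketch. The function name `expand_identifier` and the exact splitting rules are assumptions made for this example; the project's chunker may handle more cases or behave differently.

```python
import re

# Split points (assumed rules for this sketch):
#   lower/digit followed by upper  ("getUser"    -> "get", "User")
#   acronym followed by a word     ("HTTPServer" -> "HTTP", "Server")
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def expand_identifier(name: str) -> str:
    """Return a lowercase, space-separated form of a camelCase or snake_case name."""
    words = []
    # First split on underscores and any other non-word characters (e.g. dots).
    for part in re.split(r"[_\W]+", name):
        if part:
            # Then split each piece at camelCase boundaries.
            words.extend(_CAMEL_BOUNDARY.split(part))
    return " ".join(w.lower() for w in words if w)


print(expand_identifier("getUserAuthToken"))
print(expand_identifier("Indexer.reindex_all"))
```

Indexing the expanded text next to the original identifier is what would let a natural-language query such as "user auth token" match a symbol named getUserAuthToken.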

Tools

Retrieval

| Tool | Purpose |
| --- | --- |
| code_search | Hybrid (vector + FTS) search for conceptual queries (e.g., "auth flow", "where do we parse JSON"). |
| symbol_lookup | Exact-name lookup of functions / classes / methods / types. Prefer over code_search for identifiers. |
| file_outline | Symbols (with signatures) in a file, in source order. Use instead of Read when you only need shape. |
| module_outline | Symbols across a directory subtree in one call. Use instead of looping file_outline. |
| where_am_i | Given path + line, returns the innermost symbol and the full enclosing chain. |
| get_symbol_body | Full chunk for a symbol_id from symbol_lookup / code_search / file_outline. |
| get_symbol_bodies | Batch version of get_symbol_body (up to 20 ids per call). |
| callers | Symbols that CALL the given symbol. depth (1-5) expands transitively. |
| callees | Symbols that the given symbol CALLS. depth (1-5) expands transitively. |
| references | Non-call uses (subclasses, free identifier references). Companion to callers / callees. |
| trace | Build a call-graph tree from an entry symbol; flat=true returns nodes/edges for cheap LLM scans. |
| file_imports | Files this file imports (direction=imports) or that import it (direction=imported_by). |
| recent_changes | Files touched in the last N git commits. |
| propose_rename | v1: same-file rename. Returns an edit list the agent applies via its own Edit tool; refuses on clash. |
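
To make the depth parameter of callers / callees concrete, here is a rough sketch of a depth-bounded breadth-first walk over call edges. The edge representation (a list of caller/callee pairs) and the function name `transitive_callers` are hypothetical; the real tool reads edges from the SQLite index and may return richer records.

```python
from collections import deque


def transitive_callers(edges, target, depth=1):
    """Return {caller: hops} for callers of `target` within `depth` hops.

    `edges` is an iterable of (caller, callee) pairs. That shape is an
    assumption for this sketch, not the project's schema.
    """
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")

    # Reverse adjacency: callee -> set of direct callers.
    callers_of = {}
    for caller, callee in edges:
        callers_of.setdefault(callee, set()).add(caller)

    found = {}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for caller in sorted(callers_of.get(node, ())):
            # Skip already-seen symbols so cycles terminate.
            if caller not in found and caller != target:
                found[caller] = hops + 1
                queue.append((caller, hops + 1))
    return found


edges = [("main", "run"), ("run", "reindex_all"), ("cli", "reindex_all")]
print(transitive_callers(edges, "reindex_all", depth=2))
```

Swapping the direction of the adjacency map (caller to callees) gives the equivalent walk for callees.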

Admin

| Tool / op | Purpose |
| --- | --- |
| admin op=init | Build or refresh the index. Incremental by default; force=true rebuilds from scratch. |
| admin op=setup_check | Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end. |
| admin op=install_hook | Wire the auto-reindex PostToolUse hook into .claude/settings.json. Idempotent. |
| admin op=stats | Read-only: file counts by language, symbol totals, embed model fingerprint, last-index time. |
| admin op=verify | Integrity sweep: orphan rows, parse-failure files, dangling edges. |

embed_query_debug is a dev-only ranking diagnostic, hidden from list_tools unless CODE_INDEX_DEBUG=1 is set.

All tools return bounded JSON; large bodies use get_symbol_body rather than inlining whole files.
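
As an illustration of what where_am_i computes, the sketch below finds the symbols enclosing a given line and orders them outermost first. The tuple layout and the name `enclosing_chain` are hypothetical and only stand in for whatever the index actually stores.

```python
def enclosing_chain(symbols, line):
    """Return names of symbols containing `line`, outermost first.

    `symbols` is a list of (name, start_line, end_line) tuples with inclusive
    bounds. This shape is an assumption made for the sketch.
    """
    containing = [s for s in symbols if s[1] <= line <= s[2]]
    # A wider span encloses a narrower one, so sort widest first.
    containing.sort(key=lambda s: s[2] - s[1], reverse=True)
    return [name for name, _, _ in containing]


symbols = [
    ("Indexer", 10, 80),
    ("reindex_all", 20, 40),
    ("helper", 25, 30),
    ("other", 90, 100),
]
chain = enclosing_chain(symbols, 27)
print(chain)                           # full enclosing chain
print(chain[-1] if chain else None)    # innermost symbol
```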

Requirements

  • Python 3.10+ with loadable SQLite extension support (required by sqlite-vec).
      • Python 3.13 has this enabled by default.
      • On 3.10–3.12, install via the python.org installer or via pyenv with `PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x`.
      • Homebrew Python often ships without the extension hook; use one of the two methods above instead.

  • uv / uvx (install) — recommended runner. Or pip if you prefer a permanent install.
  • ~600 MB free disk for the embedding model on first init.
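
To check ahead of time whether a given interpreter meets the loadable-extension requirement, a small standard-library probe like the following should work. It relies on the fact that CPython builds without extension support omit `Connection.enable_load_extension`; the function name `supports_loadable_extensions` is made up for this sketch and is not part of code-index.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """Return True if this Python's sqlite3 module can load extensions."""
    # Builds compiled without extension support lack this method entirely.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("loadable SQLite extensions:", supports_loadable_extensions())
```

Run it with the same interpreter that will launch the server; if it prints False, switch to one of the builds listed above.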

Quick start (Claude Code)

One command, no API keys:

claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"

Then open Claude Code in any repo and ask:

_"Build the code index for this repo."_

Claude calls the init MCP tool, which writes .claude/index.db. From then on, ask things like _"where is parseAuthToken defined?"_ or _"what calls Indexer.reindex_all?"_ — Claude routes them through symbol_lookup / callers / code_search instead of grepping.

  • What --refresh does: fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1s of startup).
  • Project-only install: drop -s user to register the server in the current project's .claude/settings.json instead of the global ~/.claude.json.
  • First-run model download: the first init pulls jina-embeddings-v2-base-code (~600 MB) into ~/.cache/huggingface and caches it forever. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.
  • Already installed without --refresh? Run claude mcp remove code-index first, then re-run the command above.

Alternative: permanent install (no uvx)

pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp

Optional: keep the index live as you edit

Without a hook, the index drifts when files change outside the agent (mv, git checkout, IDE saves) until you call init again. With one, every Edit / Write / MultiEdit Claude performs triggers an incremental reindex of the touched file.

Easiest path: ask Claude. On first use in a new project, ask _"set up the code-index"_ and Claude calls setup_check → install_hook → init. The hook command is derived from how the MCP server was launched (uvx-aware), so it uses the same Python toolchain. Hook output goes to .claude/code-index-hook.log so failures are debuggable.

Manual install — add this block to the project's .claude/settings.json under hooks.PostToolUse (the version you want depends on how you launch the server — install_hook derives the right one for you):

{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
    }
  ]
}
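
If you script the manual install, the merge should be safe to repeat, in the same spirit as the idempotent install_hook op. The sketch below shows one way to do that on an in-memory settings dict; `merge_hook` is a hypothetical helper, not the project's implementation, and it assumes the hooks.PostToolUse layout described above.

```python
import copy

# The hook block from the manual-install instructions above.
HOOK_ENTRY = {
    "matcher": "Edit|Write|MultiEdit",
    "hooks": [
        {
            "type": "command",
            "command": (
                "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
                "--from mcp-code-index code-index-hook"
            ),
        }
    ],
}


def merge_hook(settings: dict, entry: dict = HOOK_ENTRY) -> dict:
    """Return a copy of `settings` with `entry` under hooks.PostToolUse, added at most once."""
    merged = copy.deepcopy(settings)
    post_tool_use = merged.setdefault("hooks", {}).setdefault("PostToolUse", [])
    # Only append when an identical entry is not already present.
    if entry not in post_tool_use:
        post_tool_use.append(copy.deepcopy(entry))
    return merged


once = merge_hook({})
twice = merge_hook(once)
print(once == twice)  # applying the merge again changes nothing
```

To apply it to a real project, load .claude/settings.json with `json.load`, pass the result through the helper, and write it back with `json.dump`.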

In other MCP-compatible agents

The server speaks standard MCP over stdio, so any client that supports MCP servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to launch uvx --refresh --from mcp-code-index code-index-mcp (or code-index-mcp after pip install mcp-code-index). Once connected, call the init tool from inside the client to bootstrap the index. Drop --refresh when you want to pin to a stable version instead of always pulling latest.

From source (development)

git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init        # CLI alternative to the `init` MCP tool
code-index-mcp         # starts the MCP server on stdio (for manual wiring)

Configuration

All settings are optional — the defaults work out of the box. Override them via environment variables. Inside Claude Code, set them in the env block of your code-index server entry in ~/.claude.json (then reconnect the MCP server).

Common knobs (most users only ever touch these):

| Var | Default | When to set it |
|---|---|---|
| CODE_INDEX_EMBED_DEVICE | _auto_ | Force the torch device: cpu, mps, or cuda. Set cpu on Apple Silicon if init fails with MPS out-of-memory. |
| CODE_INDEX_EMBED_BATCH | 32 | Encode batch size. Lower (e.g. 8 or 4) to cut peak GPU memory while staying on mps/cuda. |
| CODE_INDEX_DB | .claude/index.db | Override the SQLite index path (e.g. to share an index across sibling worktrees). |

Advanced (rarely needed):

| Var | Default | Notes |
|---|---|---|
| CODE_INDEX_EMBEDDER | jina | Only jina (local sentence-transformers) is supported today; the variable exists for future expansion. |
| CODE_INDEX_EMBED_MODEL | jinaai/jina-embeddings-v2-base-code | HuggingFace model id. Only override if you know the model is dim-compatible (768d). |
| CODE_INDEX_EMBED_DIM | 768 | Must match the embedding model's output dimension. |
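
For a sense of how these variables and defaults fit together, here is a minimal sketch of reading them into a typed config object. The `IndexConfig` class and `load_config` function are invented for illustration; only the variable names and default values come from the tables above, and the server's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IndexConfig:
    embed_device: Optional[str]  # None stands for the "auto" default
    embed_batch: int
    db_path: str
    embedder: str
    embed_model: str
    embed_dim: int


def load_config(env: Optional[Mapping[str, str]] = None) -> IndexConfig:
    """Build a config from environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return IndexConfig(
        embed_device=env.get("CODE_INDEX_EMBED_DEVICE") or None,
        embed_batch=int(env.get("CODE_INDEX_EMBED_BATCH", "32")),
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        embed_model=env.get(
            "CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"
        ),
        embed_dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )


# Example: the overrides suggested for MPS out-of-memory errors.
print(load_config({"CODE_INDEX_EMBED_DEVICE": "cpu", "CODE_INDEX_EMBED_BATCH": "8"}))
```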

Troubleshooting

init fails with MPS backend out of memory on Apple Silicon. A large file produced a chunk batch bigger than your GPU's free VRAM. Quickest fix — re-run on CPU (slower but bulletproof):

"env": {
  "CODE_INDEX_EMBED_DEVICE": "cpu"
}

To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8". Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the new env takes effect. init is incremental — already-embedded files are skipped on the retry.

init fails with a Hugging Face network error on first run. Your network is blocking model downloads. Pre-warm the cache on a machine that has access:

huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine

sqlite3.OperationalError: not authorized or sqlite-vec fails to load. Your Python build doesn't have loadable SQLite extensions. See Requirements — install via python.org or a pyenv build with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.

code_search / symbol_lookup returns stale paths after a refactor or branch checkout. The auto-reindex hook only fires on Claude's Edit / Write / MultiEdit. After bulk file moves outside the agent (mv, git checkout, IDE rename), re-run init (it's incremental). Or wire up the hook so the index keeps up with agent edits automatically.

Layout

src/code_index/
  db.py           SQLite schema, connection, sqlite-vec loading
  parser.py       Tree-sitter wrapper, symbol + edge extraction
  imports.py      Per-language import target → file path resolution
  chunker.py      Per-symbol chunks, identifier expansion
  embedder.py     Local Jina (sentence-transformers) backend
  indexer.py      Pipeline: walk → parse → chunk → embed → write
  reindexer.py    Per-root engine cache; one entry point for "reindex one file"
  retriever.py    Hybrid search (vector + FTS5) with RRF
  watcher.py      File watcher (watchdog)
  admin.py        setup_check / install_hook / init logic (pure, no MCP state)
  mcp_server.py   MCP wiring, shared helpers, schema fragments
  tool_registry.py  Shared `@_tool` decorator + `_TOOLS` registry
  tools/          Per-domain MCP handlers (graph, paths, refactor, …)
  hook.py         `code-index-hook` console script — the PostToolUse entry point
  cli.py          init / reindex / watch / stats
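
retriever.py is described above as hybrid search with RRF (reciprocal rank fusion). A minimal sketch of that fusion step follows, assuming two already-ranked lists of symbol ids and the commonly used constant k=60; the ids, the constant, and the function name are assumptions for illustration, not taken from the project's code.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["a", "b", "c"]  # hypothetical ids ranked by vector similarity
fts_hits = ["b", "d", "a"]     # hypothetical ids ranked by FTS5
print(reciprocal_rank_fusion([vector_hits, fts_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it can combine vector distances and full-text scores without having to normalize the two scales against each other.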
