CodeGraph

Cross-language code intelligence for AI agents and developers.

CodeGraph builds a semantic graph of your codebase (functions, classes, imports, call chains) and exposes it through 45 MCP tools, a VS Code extension, and a persistent memory layer. It parses 38 languages via tree-sitter, so AI agents get structured code understanding instead of grepping through files.

Quick Start

MCP Server (Claude Code, Cursor, any MCP client)

Add to ~/.claude.json (or your MCP client config):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph-server",
      "args": ["--mcp"]
    }
  }
}

The server indexes the current working directory automatically.
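If you script this step, the entry can be merged into an existing client config rather than pasted by hand. The following is a minimal Python sketch of that merge on an in-memory dict. The helper name `add_codegraph_server` and the pre-existing `other` server entry are hypothetical; the `command` and `args` values are the ones shown above.

```python
import json


def add_codegraph_server(config: dict, command: str) -> dict:
    """Return a copy of an MCP client config with a "codegraph" entry added.

    Hypothetical helper: other server entries are preserved and the input
    dict is left unmodified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["codegraph"] = {"command": command, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_codegraph_server(existing, "/path/to/codegraph-server")
print(json.dumps(updated, indent=2))
```

Reading and writing the actual config file is left out on purpose, since its location depends on the MCP client.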

VS Code Extension

Install the VSIX:

code --install-extension codegraph-0.14.0.vsix

The extension starts the server automatically and registers all tools as Language Model Tools for Copilot.

Rules for AI agents

Pre-configured rule files that teach AI coding agents (Claude, Cursor, Windsurf, Codex, Cline) to use CodeGraph MCP tools before falling back to grep / multi-file reads. Maps natural-language intent to the right codegraph_* tool.

→ codegraph-ai/codegraph-rules-for-agents

Setup is cp <agent>/codegraph.md ~/<agent>/ (one line per agent — see the rules repo's README).

GitHub Action — PR review in CI

Drop a workflow into your repo to get an automatic code-graph analysis comment on every PR — blast radius, test gaps, stale docs, suggested reviewers. Runs graph-only (no embeddings, no ONNX model), so it's fast and needs no API keys — just the built-in GITHUB_TOKEN.

Copy .github/workflows/codegraph-pr.yml into your repo. The core invocation is a single command:

codegraph-server --graph-only \
  --run-tool codegraph_pr_context \
  --tool-args '{"baseBranch":"main","format":"markdown"}'

This prints a ready-to-post markdown comment. The --graph-only flag skips embedding generation (10-50× faster indexing); --run-tool runs one tool and exits without the MCP stdio handshake — ideal for scripting.
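When the one-shot command is driven from a script, the main pitfall is quoting the JSON passed to --tool-args. A minimal Python sketch that builds the argument list from a dict, using only the flags shown above; the helper name `build_run_tool_argv` is hypothetical.

```python
import json
import shlex


def build_run_tool_argv(tool: str, tool_args: dict, graph_only: bool = True) -> list[str]:
    """Build the argv for a one-shot tool run (hypothetical helper)."""
    argv = ["codegraph-server"]
    if graph_only:
        argv.append("--graph-only")
    # Compact separators keep the JSON on one line with no stray spaces.
    encoded = json.dumps(tool_args, separators=(",", ":"))
    argv += ["--run-tool", tool, "--tool-args", encoded]
    return argv


argv = build_run_tool_argv(
    "codegraph_pr_context",
    {"baseBranch": "main", "format": "markdown"},
)
# shlex.join shows the shell-quoted form of the same command.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` avoids shell quoting entirely; that assumes the `codegraph-server` binary is on the PATH of the machine running the script.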

---

Configuration

MCP Server flags

| Flag | Default | Description |
|------|---------|-------------|
| --workspace <path> | current dir | Directories to index (repeatable for multi-project) |
| --exclude <dir> | none | Directories to skip (repeatable) |
| --embedding-model <model> | bge-small | bge-small (384d, fast), jina-code-v2 (768d, 6× slower), or granite-97m (384d, 32K ctx, ~3× slower) |
| --full-body-embedding | true | Embed full function body (~50 lines) for better semantic search and duplicate detection |
| --max-files <n> | 5000 | Maximum files to index |
| --profile <name> | all | Filter the exposed MCP tool surface to a named subset (see below) |
| --graph-only | off | Skip embedding generation: build the graph and serve structural tools only. No ONNX model load, 10-50× faster indexing. Semantic search unavailable. For CI / one-shot graph queries. |
| --run-tool <name> | none | One-shot mode: index, run a single tool, print its result, exit. No MCP handshake. Pair with --tool-args '<json>'. |

`--profile` — narrow the MCP tool surface

The full tool surface is convenient but inflates the agent's prompt-context cost. A profile exposes only the slice you need (also settable via the CODEGRAPH_TOOL_PROFILE environment variable):

| Profile | Tools | Use when |
|---------|-------|----------|
| all (default) | every tool (community + pro) | normal sessions |
| core | 8: search + symbol info + AI context | chatty agent sessions where you only need lookups |
| graph | 16: callers/callees/deps/impact/traverse | refactoring + structural analysis |
| memory | 7: codegraph_memory_* only | note-taking / knowledge-base workflows |
| security | pro security tools only (empty on community) | pro security audits |
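For launchers that start the server as a subprocess, the profile can be set through the environment instead of the flag. A minimal Python sketch, assuming the profile names in the table above are the complete set; the helper `env_with_profile` is hypothetical.

```python
import os

# Profile names taken from the table above.
KNOWN_PROFILES = ("all", "core", "graph", "memory", "security")


def env_with_profile(profile: str, base_env=None) -> dict:
    """Return an environment mapping with CODEGRAPH_TOOL_PROFILE set."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["CODEGRAPH_TOOL_PROFILE"] = profile
    return env


env = env_with_profile("core", base_env={"PATH": "/usr/bin"})
print(env["CODEGRAPH_TOOL_PROFILE"])
```

The resulting mapping can be passed as the `env` argument of `subprocess.Popen` when starting the server.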

VS Code settings

{
  "codegraph.indexOnStartup": true,
  "codegraph.indexPaths": ["/path/to/project-a", "/path/to/project-b"],
  "codegraph.excludePatterns": ["**/cmake-build-debug/**", "**/generated/**"],
  "codegraph.embeddingModel": "bge-small",
  "codegraph.maxFileSizeKB": 1024,
  "codegraph.debug": false
}

Full-body embeddings are enabled by default. Function body text is captured at parse time with zero I/O overhead.

Built-in exclusions (always skipped) cover ~47 directories across three categories:

Build / cache: node_modules, target, dist, build, out, .git, __pycache__, vendor, .venv, venv, .tox, .pytest_cache, .mypy_cache, .ruff_cache, .next, .nuxt, .svelte-kit, .parcel-cache, .npm, .yarn, .pnpm-store, .cache, .cargo, .bundle, .gradle, DerivedData, Pods, xcuserdata, cmake-build-*
IDE / IaC state: .idea, .vscode-test, .fleet, .terraform, .terragrunt-cache, .serverless
Sensitive credential dirs: .aws, .ssh, .gnupg, .kube, .docker

Plus glob patterns for binary archives, native libraries, OS metadata, and secret file extensions (.pem, .key, .p12, .pfx, .crt, .gpg, *.kdbx, SSH key conventions like id_rsa, etc.) — defense in depth against accidentally embedding credentials.
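To make the effect of these rules concrete, here is a minimal Python sketch of a path filter in the same spirit. It is an illustration only, not the server's implementation: it uses a small subset of the names listed above, and the matching logic (directory components checked by glob, file names checked against secret patterns) is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the built-in exclusions listed above.
EXCLUDED_DIRS = ("node_modules", "target", ".git", "__pycache__",
                 ".venv", ".aws", ".ssh", "cmake-build-*")
SECRET_FILE_PATTERNS = ("*.pem", "*.key", "*.p12", "*.pfx", "*.kdbx", "id_rsa")


def is_excluded(path: str) -> bool:
    """Return True if a relative file path should be skipped (illustrative)."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *dirs, filename = parts
    # Any excluded directory anywhere in the path skips the file.
    if any(fnmatchcase(d, pattern) for d in dirs for pattern in EXCLUDED_DIRS):
        return True
    # Otherwise only the file name is checked against secret patterns.
    return any(fnmatchcase(filename, pattern) for pattern in SECRET_FILE_PATTERNS)


for candidate in ("src/auth.rs", "node_modules/lodash/index.js", "deploy/server.pem"):
    print(candidate, is_excluded(candidate))
```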

---

Tools (42 community + 27 pro, 17 security)

Code Analysis (11)

| Tool | What it does |
|------|-------------|
| get_ai_context | Primary context tool. Intent-aware (explain/modify/debug/test) with token budgeting. Returns source, related symbols, imports, siblings, debug hints. |
| get_edit_context | Everything needed before editing: source + callers + tests + memories + git history |
| get_curated_context | Cross-codebase context for a natural language query ("how does auth work?") |
| analyze_impact | Blast radius prediction: what breaks if you modify, delete, or rename |
| analyze_complexity | Cyclomatic complexity with breakdown (branches, loops, nesting, exceptions, early returns) |
| find_circular_deps | Detect circular import/dependency chains across files |
| find_hot_paths | Most-called functions ranked by transitive caller count |
| find_dead_imports | Find unused imports (modules imported but never referenced) |
| get_module_summary | High-level summary of a directory: file count, functions, language breakdown, top complex functions |
| search_by_pattern | Regex search across function bodies, signatures, names, and docstrings |
| search_by_error | Find functions that throw, catch, or handle specific error types |
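Two of the tools above, analyze_impact and find_hot_paths, rest on the same idea: walking the call graph backwards from a symbol to collect everything that reaches it. A minimal Python sketch of a transitive caller count over a toy call graph follows. The graph, the function names, and the ranking tie-break are hypothetical; this illustrates the technique, not CodeGraph's implementation.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALLS = {
    "main": ["handle_request"],
    "handle_request": ["authenticate", "load_user"],
    "authenticate": ["hash_password"],
    "load_user": ["hash_password"],
    "hash_password": [],
}


def reverse_edges(calls: dict) -> dict:
    """Invert caller -> callees into callee -> set of direct callers."""
    callers = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers


def transitive_callers(symbol: str, callers: dict) -> set:
    """Breadth-first walk over reverse edges; the seen set guards against cycles."""
    seen = set()
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


callers = reverse_edges(CALLS)
# Rank by transitive caller count, highest first; names break ties.
ranked = sorted(CALLS, key=lambda name: (-len(transitive_callers(name, callers)), name))
print(ranked)
```

The set returned for a single symbol is its blast radius in this toy model; ranking every symbol by the size of that set gives a hot-path ordering.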

Code Navigation (13)

| Tool | What it does |
|------|-------------|
| symbol_search | Find symbols by name or natural language (hybrid BM25 + semantic search) |
| get_callers / get_callees | Who calls this? What does it call? (with transitive depth) |
| get_detailed_symbol | Full symbol info: source, callers, callees, complexity |
| get_symbol_info | Quick metadata: signature, visibility, kind |
| get_dependency_graph | File/module import relationships with depth control |
| get_call_graph | Function call chains (callers and callees) |
| find_by_imports | Find files importing a module |
| find_by_signature | Search by param count, return type, modifiers |
| find_entry_points | Main functions, HTTP handlers, CLI commands, event handlers |
| find_implementors | Find all functions registered as ops struct callbacks |
| find_related_tests | Tests that exercise a given function |
| traverse_graph | Custom graph traversal with edge/node type filters |

Indexing (3)

| Tool | What it does |
|------|-------------|
| reindex_workspace | Full or incremental workspace reindex |
| index_files | Add/update specific files without full reindex |
| index_directory | Add directory to graph alongside existing data |

Memory (7)

Persistent AI context across sessions — debugging insights, architectural decisions, known issues.

| Tool | What it does |
|------|-------------|
| memory_store / memory_get / memory_search | Store, retrieve, search memories (BM25 + semantic) |
| memory_context | Get memories relevant to a file/function |
| memory_list / memory_invalidate / memory_stats | Browse, retire, monitor |

Pairs well with Tempera — an episodic memory system that captures transferable debugging strategies and solutions across projects. CodeGraph's memory tools store project-scoped notes; Tempera captures cross-project BKMs (best-known methods) that improve over time.

PR / Change Analysis (1)

| Tool | What it does |
|------|-------------|
| pr_context | One-call PR review. Runs git diff against base branch, finds changed functions in the graph, reports: blast radius (callers), test coverage + gaps, affected modules, diff-aware change classification (signature vs body), stale-doc warnings, complexity, commit-message hint, suggested reviewers from git blame. |

Documentation (7)

Persistent project documentation — index design docs, search them semantically, verify code matches the design, generate architecture docs from the code graph.

| Tool | What it does |
|------|-------------|
| index_markdown | Index a local .md file (ARCHITECTURE.md, API_DESIGN.md, etc.) into the persistent docs store. Heading-tree chunking with leaf-node embeddings. |
| search_docs | Semantic search over indexed docs; returns matching sections with heading-path breadcrumbs |
| list_doc_sources | List all indexed source files |
| remove_doc_source | Remove all indexed chunks from a source file |
| verify_design | Cross-reference doc claims vs code graph. direction=forward (doc→code), reverse (code→doc), or both |
| design_gaps | Find identifiers described in docs that don't exist in code yet, to build TODO lists from specs |
| generate_architecture_doc | Auto-generate a structured ARCHITECTURE.md from the live code graph (modules, hot paths, complexity, circular deps) |
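The heading-tree chunking and heading-path breadcrumbs mentioned for index_markdown and search_docs can be pictured with a small example. The Python sketch below splits a markdown string into one chunk per heading that has body text and labels each with its breadcrumb. It approximates the idea under simplifying assumptions (ATX headings only, no handling of `#` lines inside code fences, " > " as the breadcrumb separator) and is not the server's chunker; the sample document is hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list:
    """Split markdown into (breadcrumb, text) chunks, one per heading with body text."""
    chunks = []
    path = []   # stack of (level, title) for the current heading path
    body = []   # lines collected under the current heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append((" > ".join(title for _, title in path), text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop siblings and deeper headings before pushing the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


# Hypothetical design doc.
DOC = """# Architecture
Intro text.
## Authentication
JWT refresh is handled by auth.rs.
### Sessions
Sessions expire after inactivity.
## Storage
RocksDB holds the graph.
"""

for breadcrumb, text in chunk_by_headings(DOC):
    print(f"{breadcrumb}: {text}")
```

In a real pipeline each chunk's text would then be embedded and stored with its breadcrumb so that search results can show where in the document a match came from.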

All tool names are prefixed with codegraph_ (e.g. codegraph_get_ai_context). Tools that target a specific symbol accept uri + line or nodeId from symbol_search results.
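On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request whose params carry the tool name and its arguments. The Python sketch below builds such a request for a symbol-targeting tool, applying the codegraph_ prefix when it is missing. The helper name `tool_call_request` is hypothetical, and the argument values mirror the usage examples in this README.

```python
import json

TOOL_PREFIX = "codegraph_"


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request (hypothetical helper)."""
    name = tool if tool.startswith(TOOL_PREFIX) else TOOL_PREFIX + tool
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request(
    1,
    "get_ai_context",
    {"uri": "file:///projects/myapp/src/auth.rs", "line": 42, "intent": "modify"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles the initialize handshake and message framing; this only shows the shape of the payload.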

---

Usage examples

Index a design doc and search it:

```
codegraph_index_markdown(path: "/projects/myapp/docs/ARCHITECTURE.md")
codegraph_search_docs(query: "how does the auth module handle JWT refresh?")
```

Check if the code matches the design:

```
codegraph_verify_design(source: "/projects/myapp/docs/ARCHITECTURE.md", direction: "forward")
// → "132/132 identifiers verified, 0 gaps"
```

Find what's described in docs but not yet implemented:

```
codegraph_design_gaps(source: "/projects/myapp/docs/API_DESIGN.md")
// → "4 of 12 identifiers not found in code: PaymentService, RefundHandler, ..."
```

Generate architecture docs from the code graph:

```
codegraph_generate_architecture_doc(scope: "src/", topN: 5)
// → Markdown with modules, complexity hotspots, hot paths, circular deps
```

Save a debugging insight for future sessions:

```
codegraph_memory_store(
  kind: "debug_context",
  title: "Nginx body size limit",
  content: "The /upload endpoint fails on payloads > 1MB...",
  problem: "API returns 500 on large uploads",
  solution: "Increase nginx client_max_body_size to 10M",
  agentSource: "claude"
)
```

Get AI context with graph compression stats + design doc augmentation:

```
codegraph_get_ai_context(uri: "file:///projects/myapp/src/auth.rs", line: 42, intent: "modify")
// → Code context + graphStats: {entitiesInGraph: 13555, entitiesTraversed: 47, entitiesKept: 8}
// → design_context section from indexed docs mentioning "auth"
```

Review a PR (blast radius, test gaps, stale docs, reviewers) in one call:

```
codegraph_pr_context(baseBranch: "main")
// → "PR changes 4 files (+263/-77, 12 functions). 37 direct callers, 8 tests, 3 untested. Risk: medium."
// → test_gaps: [refresh_token, revoke_session] (functions with 0 test callers)
// → stale_docs: ["auth.rs described in ARCHITECTURE.md > Authentication — doc may need updating"]
// → suggested_reviewers: [{author: "anvanster", lines_owned: 3200}]
// → commit_hint: "feat(mcp): <describe the change>"
```

Narrow the tool surface for chatty sessions:

```bash
codegraph-server --mcp --profile=core
# Only 8 tools: search + symbol info + AI context
```

---

CodeGraph Pro

Additional tools available in CodeGraph Pro:

| Tool | What it does |
|------|-------------|
| scan_security | Security vulnerability scan: 40+ dangerous function patterns, source-to-sink taint tracing, auth coverage for HTTP endpoints (7 languages/frameworks), architectural layer violations, weak crypto, hardcoded secrets |
| analyze_coupling | Module coupling metrics and instability scores |
| find_unused_code | Dead code detection with confidence scoring |
| find_duplicates | Detect duplicate/near-duplicate functions |
| find_similar / cluster_symbols / compare_symbols | Embedding-based code similarity |
| cross_project_search | Search across all indexed projects |
| mine_git_history / mine_git_history_for_file / search_git_history | Git history mining and semantic search |
| security_control_flow | Map every execution path through a function: "can this return without hitting the auth check?" |
| security_trace_data_flow | Follow a variable from birth to death: "does user input reach this SQL query?" |
| security_generate_sbom | CycloneDX SBOM from 8 lockfile formats |
| security_audit_deps | OSV vulnerability check on dependencies |
| security_check_unchecked_returns / _resource_leaks / _misconfig / _input_validation / _error_exposure | 5 heuristic analyzers covering ~80% of CWE Top 25 |
| security_scan_iac | Docker / Kubernetes / Terraform misconfiguration scan |
| security_check_licenses | Lockfile license policy enforcement (copyleft detection) |
| security_check_secrets_entropy | Shannon-entropy hardcoded-secret detection |
| security_detect_injection | Focused SQL/XSS/cmd/path/deser/template injection detection (20 patterns) |
| security_check_search_path | Untrusted search-path / DLL-hijacking detection (CWE-426/CWE-427) |
| security_check_crypto | Cryptographic misuse: weak ciphers/hashes/PRNG/keys, static IVs, timing-leak comparisons (CWE-208/326-330/338/916, 35 patterns) |
| security_export_sarif | Aggregate findings as SARIF 2.1.0 (GitHub Code Scanning, GitLab SAST) |

Cross-cutting features (all security_check_* tools):

include_tests / treat_as_production — first-class skip for tests/samples/vendored
check_compile_gates — C/C++ findings inside #ifdef X are marked DEFENSIVE_GATED_OFF when X isn't defined by CMake/Cargo/Makefile
Honors 25 suppression markers (# nosec, // NOLINT, // codeql[ignore], # rubocop:disable, etc.) at line and function level
Telemetry blocks per scan: path_filter (examined/matched/skipped) + compile_gate (gated_off count)
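As background for security_check_secrets_entropy in the table above: Shannon-entropy detection scores a string by how evenly its characters are distributed, on the premise that random keys score higher than ordinary identifiers. A minimal Python sketch of the general technique follows. The length and entropy thresholds are hypothetical values chosen for illustration, not the tool's actual settings.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character of the character distribution in text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long, high-entropy tokens. Both thresholds are hypothetical."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold


for token in ("aaaaaaaaaaaaaaaaaaaaaaaa", "abcdefghijklmnopqrstuvwxyz012345"):
    print(token, round(shannon_entropy(token), 2), looks_like_secret(token))
```

Entropy alone produces false positives on hashes, UUIDs, and minified assets, so practical scanners combine it with context such as variable names and file types.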

---

Languages

38 languages parsed via tree-sitter — functions, classes, imports, call graph, complexity metrics, dependency graphs, symbol search, and impact analysis:

| Category | Languages |
|---|---|
| Systems | C, C++, Rust, Zig, Objective-C |
| JVM | Java, Kotlin, Scala, Groovy, Clojure |
| Web/Scripting | TypeScript/JS, Python, Ruby, PHP, Perl, Lua, Elixir, Elm |
| Web/Style | CSS |
| Mobile | Swift, Dart |
| Functional | Haskell, OCaml, Julia, Erlang, Elm, Clojure |
| Enterprise | C#, COBOL, Fortran, Go |
| Blockchain | Solidity |
| Shell/Config | Bash, HCL/Terraform, TOML, YAML |
| Hardware | Verilog/SystemVerilog, Tcl |
| Data Science | R, Julia |

HTTP handler detection: Python (FastAPI/Flask/Django), TypeScript (NestJS), Java (Spring/JAX-RS), Go (stdlib/Gin/Echo/Fiber), C# (ASP.NET), Ruby (Rails), PHP (Laravel/Symfony).

---

Architecture

MCP Client (Claude, Cursor, ...)        VS Code Extension
        |                                       |
    MCP (stdio)                            LSP Protocol
        |                                       |
        └───────────┐               ┌───────────┘
                    ▼               ▼
            ┌─────────────────────────────┐
            │       codegraph-server      │
            ├─────────────────────────────┤
            │  38 tree-sitter parsers     │
            │  Semantic graph engine      │
            │  AI query engine (BM25)     │
            │  Memory layer (RocksDB)     │
            │  Docs store (RocksDB+HNSW)  │
            │  Full-body embeddings (BGE) │
            │  HNSW vector index          │
            └─────────────────────────────┘

A single Rust binary serves both MCP and LSP protocols.

Indexing: ~60 files/sec. Incremental re-indexing on file changes via FNV-1a content hashing.
Persistence: Graph and embeddings persist to ~/.codegraph/graph.db (RocksDB). Instant startup on restart — no re-parsing, no re-embedding.
Queries: Sub-100ms. Cross-file import and call resolution at index time.
Embeddings: Full-body (function bodies captured at parse time, zero disk I/O). Vectors stored in RocksDB alongside the graph. Auto-downloads model on first run.
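The incremental re-indexing described above depends on detecting which files changed by comparing content hashes. A minimal Python sketch of that idea, using the standard 64-bit FNV-1a constants, follows. The README does not state the hash width, so 64 bits is an assumption, and the `changed_files` helper and the file contents are hypothetical.

```python
# Standard 64-bit FNV-1a parameters (the width is an assumption here).
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def changed_files(previous_hashes: dict, current_contents: dict) -> list:
    """Return paths whose content hash differs from the stored one (hypothetical helper)."""
    return sorted(
        path
        for path, content in current_contents.items()
        if previous_hashes.get(path) != fnv1a_64(content)
    )


# Hypothetical workspace state: one file edited, one file added.
stored = {"src/auth.rs": fnv1a_64(b"fn login() {}"), "src/db.rs": fnv1a_64(b"fn open() {}")}
current = {
    "src/auth.rs": b"fn login() { audit(); }",
    "src/db.rs": b"fn open() {}",
    "src/new.rs": b"fn added() {}",
}
print(changed_files(stored, current))
```

Only the returned paths would need re-parsing and re-embedding; unchanged files keep their existing graph nodes and vectors.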

---

Building from Source

git clone https://github.com/codegraph-ai/codegraph
cd codegraph
cargo build --release -p codegraph-server    # Rust server
cd vscode && npm install && npm run esbuild  # VS Code extension
npx @vscode/vsce package                     # VSIX

Requires Rust stable, Node.js 18+, VS Code 1.90+.

---

Support the project

CodeGraph is free, open-source, and maintained by a solo developer. If it saves you time, consider sponsoring on GitHub — it helps keep the project alive and growing.

---

License

Apache-2.0
