CodeGraph

codegraph-ai/CodeGraph
29 starsApache-2.0Community

Install to Claude Code

This server doesn't publish a one-line install command. Follow the setup in the source repository.

Summary

Semantic code graph with 34 tools: callers, impact, complexity, AI context, memory. 38 languages.

README.md

CodeGraph

Cross-language code intelligence for AI agents and developers.

![License](LICENSE)

CodeGraph builds a semantic graph of your codebase — functions, classes, imports, call chains — and exposes it through 45 MCP tools, a VS Code extension, and a persistent memory layer. Parses 37 languages via tree-sitter. AI agents get structured code understanding instead of grepping through files.

Quick Start

MCP Server (Claude Code, Cursor, any MCP client)

Add to ~/.claude.json (or your MCP client config):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph-server",
      "args": ["--mcp"]
    }
  }
}

The server indexes the current working directory automatically.

VS Code Extension

Install the VSIX:

code --install-extension codegraph-0.14.0.vsix

The extension starts the server automatically and registers all tools as Language Model Tools for Copilot.

Rules for AI agents

Pre-configured rule files that teach AI coding agents (Claude, Cursor, Windsurf, Codex, Cline) to use CodeGraph MCP tools before falling back to grep / multi-file reads. Maps natural-language intent to the right codegraph_* tool.

codegraph-ai/codegraph-rules-for-agents

Setup is cp <agent>/codegraph.md ~/<agent>/ (one line per agent — see the rules repo's README).

GitHub Action — PR review in CI

Drop a workflow into your repo to get an automatic code-graph analysis comment on every PR — blast radius, test gaps, stale docs, suggested reviewers. Runs graph-only (no embeddings, no ONNX model), so it's fast and needs no API keys — just the built-in GITHUB_TOKEN.

Copy .github/workflows/codegraph-pr.yml into your repo. The core invocation is a single command:

codegraph-server --graph-only \
  --run-tool codegraph_pr_context \
  --tool-args '{"baseBranch":"main","format":"markdown"}'

This prints a ready-to-post markdown comment. The --graph-only flag skips embedding generation (10-50× faster indexing); --run-tool runs one tool and exits without the MCP stdio handshake — ideal for scripting.

---

Configuration

MCP Server flags

| Flag | Default | Description | |------|---------|-------------| | --workspace <path> | current dir | Directories to index (repeatable for multi-project) | | --exclude <dir> | — | Directories to skip (repeatable) | | --embedding-model <model> | bge-small | bge-small (384d, fast), jina-code-v2 (768d, 6× slower), or granite-97m (384d, 32K ctx, ~3× slower) | | --full-body-embedding | true | Embed full function body (~50 lines) for better semantic search and duplicate detection | | --max-files <n> | 5000 | Maximum files to index | | --profile <name> | all | Filter the exposed MCP tool surface to a named subset (see below) | | --graph-only | off | Skip embedding generation — build the graph and serve structural tools only. No ONNX model load, 10-50× faster indexing. Semantic search unavailable. For CI / one-shot graph queries. | | --run-tool <name> | — | One-shot mode: index, run a single tool, print its result, exit. No MCP handshake. Pair with --tool-args '<json>'. |

--profile — narrow the MCP tool surface

The full 32-tool surface is convenient but inflates the agent's prompt-context cost. A profile exposes only the slice you need (also settable via the CODEGRAPH_TOOL_PROFILE env var):

| Profile | Tools | Use when | |---------|-------|----------| | all (default) | every tool (community + pro) | normal sessions | | core | 8 — search + symbol info + AI context | chatty agent sessions where you only need lookups | | graph | 16 — callers/callees/deps/impact/traverse | refactoring + structural analysis | | memory | 7 — codegraph_memory_* only | note-taking / knowledge-base workflows | | security | pro security tools only (empty on community) | pro security audits |

VS Code settings

{
  "codegraph.indexOnStartup": true,
  "codegraph.indexPaths": ["/path/to/project-a", "/path/to/project-b"],
  "codegraph.excludePatterns": ["**/cmake-build-debug/**", "**/generated/**"],
  "codegraph.embeddingModel": "bge-small",
  "codegraph.maxFileSizeKB": 1024,
  "codegraph.debug": false
}

Full-body embeddings are enabled by default. Function body text is captured at parse time with zero I/O overhead.

Built-in exclusions (always skipped) cover ~47 directories across three categories:

  • Build / cache: node_modules, target, dist, build, out, .git, __pycache__, vendor, .venv, venv, .tox, .pytest_cache, .mypy_cache, .ruff_cache, .next, .nuxt, .svelte-kit, .parcel-cache, .npm, .yarn, .pnpm-store, .cache, .cargo, .bundle, .gradle, DerivedData, Pods, xcuserdata, cmake-build-*
  • IDE / IaC state: .idea, .vscode-test, .fleet, .terraform, .terragrunt-cache, .serverless
  • Sensitive credential dirs: .aws, .ssh, .gnupg, .kube, .docker

Plus glob patterns for binary archives, native libraries, OS metadata, and secret file extensions (.pem, .key, .p12, .pfx, .crt, .gpg, *.kdbx, SSH key conventions like id_rsa, etc.) — defense in depth against accidentally embedding credentials.

---

Tools (42 community + 27 pro, 17 security)

Code Analysis (11)

| Tool | What it does | |------|-------------| | get_ai_context | Primary context tool. Intent-aware (explain/modify/debug/test) with token budgeting. Returns source, related symbols, imports, siblings, debug hints. | | get_edit_context | Everything needed before editing: source + callers + tests + memories + git history | | get_curated_context | Cross-codebase context for a natural language query ("how does auth work?") | | analyze_impact | Blast radius prediction — what breaks if you modify, delete, or rename | | analyze_complexity | Cyclomatic complexity with breakdown (branches, loops, nesting, exceptions, early returns) | | find_circular_deps | Detect circular import/dependency chains across files | | find_hot_paths | Most-called functions ranked by transitive caller count | | find_dead_imports | Find unused imports — modules imported but never referenced | | get_module_summary | High-level summary of a directory: file count, functions, language breakdown, top complex functions | | search_by_pattern | Regex search across function bodies, signatures, names, and docstrings | | search_by_error | Find functions that throw, catch, or handle specific error types |

Code Navigation (13)

| Tool | What it does | |------|-------------| | symbol_search | Find symbols by name or natural language (hybrid BM25 + semantic search) | | get_callers / get_callees | Who calls this? What does it call? (with transitive depth) | | get_detailed_symbol | Full symbol info: source, callers, callees, complexity | | get_symbol_info | Quick metadata: signature, visibility, kind | | get_dependency_graph | File/module import relationships with depth control | | get_call_graph | Function call chains (callers and callees) | | find_by_imports | Find files importing a module | | find_by_signature | Search by param count, return type, modifiers | | find_entry_points | Main functions, HTTP handlers, CLI commands, event handlers | | find_implementors | Find all functions registered as ops struct callbacks | | find_related_tests | Tests that exercise a given function | | traverse_graph | Custom graph traversal with edge/node type filters |

Indexing (3)

| Tool | What it does | |------|-------------| | reindex_workspace | Full or incremental workspace reindex | | index_files | Add/update specific files without full reindex | | index_directory | Add directory to graph alongside existing data |

Memory (7)

Persistent AI context across sessions — debugging insights, architectural decisions, known issues.

| Tool | What it does | |------|-------------| | memory_store / memory_get / memory_search | Store, retrieve, search memories (BM25 + semantic) | | memory_context | Get memories relevant to a file/function | | memory_list / memory_invalidate / memory_stats | Browse, retire, monitor |

Pairs well with Tempera — an episodic memory system that captures transferable debugging strategies and solutions across projects. CodeGraph's memory tools store project-scoped notes; Tempera captures cross-project BKMs (best-known methods) that improve over time.

PR / Change Analysis (1)

| Tool | What it does | |------|-------------| | pr_context | One-call PR review. Runs git diff against base branch, finds changed functions in the graph, reports: blast radius (callers), test coverage + gaps, affected modules, diff-aware change classification (signature vs body), stale-doc warnings, complexity, commit-message hint, suggested reviewers from git blame. |

Documentation (7)

Persistent project documentation — index design docs, search them semantically, verify code matches the design, generate architecture docs from the code graph.

| Tool | What it does | |------|-------------| | index_markdown | Index a local .md file (ARCHITECTURE.md, API_DESIGN.md, etc.) into the persistent docs store. Heading-tree chunking with leaf-node embeddings. | | search_docs | Semantic search over indexed docs — returns matching sections with heading-path breadcrumbs | | list_doc_sources | List all indexed source files | | remove_doc_source | Remove all indexed chunks from a source file | | verify_design | Cross-reference doc claims vs code graph. direction=forward (doc→code), reverse (code→doc), or both | | design_gaps | Find identifiers described in docs that don't exist in code yet — build TODO lists from specs | | generate_architecture_doc | Auto-generate a structured ARCHITECTURE.md from the live code graph (modules, hot paths, complexity, circular deps) |

All tool names are prefixed with codegraph_ (e.g. codegraph_get_ai_context). Tools that target a specific symbol accept uri + line or nodeId from symbol_search results.

---

Usage examples

Index a design doc and search it: `` codegraph_index_markdown(path: "/projects/myapp/docs/ARCHITECTURE.md") codegraph_search_docs(query: "how does the auth module handle JWT refresh?") ``

Check if the code matches the design: `` codegraph_verify_design(source: "/projects/myapp/docs/ARCHITECTURE.md", direction: "forward") // → "132/132 identifiers verified, 0 gaps" ``

Find what's described in docs but not yet implemented: `` codegraph_design_gaps(source: "/projects/myapp/docs/API_DESIGN.md") // → "4 of 12 identifiers not found in code: PaymentService, RefundHandler, ..." ``

Generate architecture docs from the code graph: `` codegraph_generate_architecture_doc(scope: "src/", topN: 5) // → Markdown with modules, complexity hotspots, hot paths, circular deps ``

Save a debugging insight for future sessions: `` codegraph_memory_store(kind: "debug_context", title: "Nginx body size limit", content: "The /upload endpoint fails on payloads > 1MB...", problem: "API returns 500 on large uploads", solution: "Increase nginx client_max_body_size to 10M", agentSource: "claude") ``

Get AI context with graph compression stats + design doc augmentation: `` codegraph_get_ai_context(uri: "file:///projects/myapp/src/auth.rs", line: 42, intent: "modify") // → Code context + graphStats: {entitiesInGraph: 13555, entitiesTraversed: 47, entitiesKept: 8} // → design_context section from indexed docs mentioning "auth" ``

Review a PR — blast radius, test gaps, stale docs, reviewers in one call: `` codegraph_pr_context(baseBranch: "main") // → "PR changes 4 files (+263/-77, 12 functions). 37 direct callers, 8 tests, 3 untested. Risk: medium." // → test_gaps: [refresh_token, revoke_session] — functions with 0 test callers // → stale_docs: ["auth.rs described in ARCHITECTURE.md > Authentication — doc may need updating"] // → suggested_reviewers: [{author: "anvanster", lines_owned: 3200}] // → commit_hint: "feat(mcp): <describe the change>" ``

Narrow the tool surface for chatty sessions: ``bash codegraph-server --mcp --profile=core # Only 8 tools: search + symbol info + AI context ``

---

CodeGraph Pro

Additional tools available in CodeGraph Pro:

| Tool | What it does | |------|-------------| | scan_security | Security vulnerability scan: 40+ dangerous function patterns, source-to-sink taint tracing, auth coverage for HTTP endpoints (7 languages/frameworks), architectural layer violations, weak crypto, hardcoded secrets | | analyze_coupling | Module coupling metrics and instability scores | | find_unused_code | Dead code detection with confidence scoring | | find_duplicates | Detect duplicate/near-duplicate functions | | find_similar / cluster_symbols / compare_symbols | Embedding-based code similarity | | cross_project_search | Search across all indexed projects | | mine_git_history / mine_git_history_for_file / search_git_history | Git history mining and semantic search | | security_control_flow | Map every execution path through a function — "can this return without hitting the auth check?" | | security_trace_data_flow | Follow a variable from birth to death — "does user input reach this SQL query?" | | security_generate_sbom | CycloneDX SBOM from 8 lockfile formats | | security_audit_deps | OSV vulnerability check on dependencies | | security_check_unchecked_returns / _resource_leaks / _misconfig / _input_validation / _error_exposure | 5 heuristic analyzers covering ~80% of CWE Top 25 | | security_scan_iac | Docker / Kubernetes / Terraform misconfiguration scan | | security_check_licenses | Lockfile license policy enforcement (copyleft detection) | | security_check_secrets_entropy | Shannon-entropy hardcoded-secret detection | | security_detect_injection | Focused SQL/XSS/cmd/path/deser/template injection detection (20 patterns) | | security_check_search_path | Untrusted search-path / DLL-hijacking detection (CWE-426/CWE-427) | | security_check_crypto | Cryptographic misuse: weak ciphers/hashes/PRNG/keys, static IVs, timing-leak comparisons (CWE-208/326-330/338/916, 35 patterns) | | security_export_sarif | Aggregate findings as SARIF 2.1.0 (GitHub Code Scanning, GitLab SAST) |

*Cross-cutting features (all security_check_ tools):**

  • include_tests / treat_as_production — first-class skip for tests/samples/vendored
  • check_compile_gates — C/C++ findings inside #ifdef X are marked DEFENSIVE_GATED_OFF when X isn't defined by CMake/Cargo/Makefile
  • 25-marker suppression honoring (# nosec, // NOLINT, // codeql[ignore], # rubocop:disable, etc.) at line and function level
  • Telemetry blocks per scan: path_filter (examined/matched/skipped) + compile_gate (gated_off count)

---

Languages

38 languages parsed via tree-sitter — functions, classes, imports, call graph, complexity metrics, dependency graphs, symbol search, and impact analysis:

| Category | Languages | |---|---| | Systems | C, C++, Rust, Zig, Objective-C | | JVM | Java, Kotlin, Scala, Groovy, Clojure | | Web/Scripting | TypeScript/JS, Python, Ruby, PHP, Perl, Lua, Elixir, Elm | | Web/Style | CSS | | Mobile | Swift, Dart | | Functional | Haskell, OCaml, Julia, Erlang, Elm, Clojure | | Enterprise | C#, COBOL, Fortran, Go | | Blockchain | Solidity | | Shell/Config | Bash, HCL/Terraform, TOML, YAML | | Hardware | Verilog/SystemVerilog, Tcl | | Data Science | R, Julia |

HTTP handler detection: Python (FastAPI/Flask/Django), TypeScript (NestJS), Java (Spring/JAX-RS), Go (stdlib/Gin/Echo/Fiber), C# (ASP.NET), Ruby (Rails), PHP (Laravel/Symfony).

---

Architecture

MCP Client (Claude, Cursor, ...)        VS Code Extension
        |                                       |
    MCP (stdio)                            LSP Protocol
        |                                       |
        └───────────┐               ┌───────────┘
                    ▼               ▼
            ┌─────────────────────────────┐
            │       codegraph-server      │
            ├─────────────────────────────┤
            │  38 tree-sitter parsers     │
            │  Semantic graph engine      │
            │  AI query engine (BM25)     │
            │  Memory layer (RocksDB)     │
            │  Docs store (RocksDB+HNSW)  │
            │  Full-body embeddings (BGE) │
            │  HNSW vector index          │
            └─────────────────────────────┘

A single Rust binary serves both MCP and LSP protocols.

  • Indexing: ~60 files/sec. Incremental re-indexing on file changes via FNV-1a content hashing.
  • Persistence: Graph and embeddings persist to ~/.codegraph/graph.db (RocksDB). Instant startup on restart — no re-parsing, no re-embedding.
  • Queries: Sub-100ms. Cross-file import and call resolution at index time.
  • Embeddings: Full-body (function bodies captured at parse time, zero disk I/O). Vectors stored in RocksDB alongside the graph. Auto-downloads model on first run.

---

Building from Source

git clone https://github.com/codegraph-ai/codegraph
cd codegraph
cargo build --release -p codegraph-server    # Rust server
cd vscode && npm install && npm run esbuild  # VS Code extension
npx @vscode/vsce package                     # VSIX

Requires Rust stable, Node.js 18+, VS Code 1.90+.

---

Support the project

CodeGraph is free, open-source, and maintained by a solo developer. If it saves you time, consider sponsoring on GitHub — it helps keep the project alive and growing.

---

License

Apache-2.0

Related MCP servers

Browse all →