MCP Codebase Mentor

skainguyen1412/mcp-codebase


Summary

An MCP server that acts as an AI mentor for any codebase using dual-layer indexing, enabling codebase initialization, tutorial generation, and semantic search.


An MCP (Model Context Protocol) server that acts as an AI mentor for any codebase using dual-layer indexing: a JSON manifest of per-file summaries and dependencies, plus a vector index for semantic search.

Features

  • Universal language support - An LLM analyzes every file, so no language-specific parser is needed
  • Complete file coverage - Indexes code, tests, configs, and docs
  • Smart filtering - Respects .gitignore and applies sensible defaults
  • Semantic search - Vector-based code search using LlamaIndex
  • Tutorial generation - Creates structured learning guides with architecture diagrams
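The smart-filtering idea can be sketched in a few lines. This is a minimal illustration, not the project's code: shouldIndex, the default-ignored directory list, and the binary-extension list are all hypothetical, and .gitignore matching is left to a caller-supplied function (the real rules live in src/utils/fileFilter.ts and may differ).

```typescript
// Hypothetical defaults; the real lists in src/utils/fileFilter.ts may differ.
const DEFAULT_IGNORED_DIRS = new Set(["node_modules", ".git", "dist", "build", ".mcp_index"]);
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".pdf", ".exe", ".dll", ".so", ".zip"]);

/** Returns true if a repo-relative POSIX path should be indexed. */
function shouldIndex(
  relPath: string,
  gitignored: (p: string) => boolean = () => false, // caller-supplied .gitignore matcher
): boolean {
  const segments = relPath.split("/");
  // Skip anything inside a default-ignored directory (all segments but the file name).
  if (segments.slice(0, -1).some((dir) => DEFAULT_IGNORED_DIRS.has(dir))) return false;
  // Skip binary files by extension; dotfiles such as ".gitignore" have no extension.
  const name = segments[segments.length - 1];
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot).toLowerCase() : "";
  if (BINARY_EXTENSIONS.has(ext)) return false;
  // Everything else (code, tests, configs, docs) is kept unless .gitignore excludes it.
  return !gitignored(relPath);
}
```

Keeping configs and docs by default matches the "complete file coverage" goal; only directories and formats that carry no readable source are dropped up front.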

Installation

# Clone the repository
git clone <repository-url>
cd mcp-codebase

# Install dependencies
npm install

# Build the project
npm run build

Usage with Cursor/Claude

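Add the server to your MCP client's configuration (for example, .cursor/mcp.json for Cursor or claude_desktop_config.json for Claude Desktop), replacing /path/to/mcp-codebase with the absolute path of your clone: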

{
  "mcpServers": {
    "codebase-mentor": {
      "command": "node",
      "args": ["/path/to/mcp-codebase/dist/index.js"]
    }
  }
}
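If you script your setup, a small helper can produce the same entry with a guaranteed absolute path. A sketch assuming Node's built-in path module; codebaseMentorConfig and the /opt/mcp-codebase location are hypothetical:

```typescript
import * as path from "node:path";

// Shape of the "mcpServers" block shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}
interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

/** Builds the config entry for a local clone; repoDir may be relative. */
function codebaseMentorConfig(repoDir: string): McpConfig {
  // The client launches the server as a subprocess from its own working
  // directory, so point at the built entry point with an absolute path.
  const entry = path.join(path.resolve(repoDir), "dist", "index.js");
  return { mcpServers: { "codebase-mentor": { command: "node", args: [entry] } } };
}

console.log(JSON.stringify(codebaseMentorConfig("/opt/mcp-codebase"), null, 2));
```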

Available Tools

init_codebase

Initialize and index a codebase for AI mentoring.

init_codebase(rootPath: "/path/to/your/project")

This will:

  1. Crawl the directory structure (respecting .gitignore)
  2. Analyze each file with the host's LLM (via MCP sampling) to extract summaries, imports, and exports
  3. Build a manifest with file metadata and dependency graph
  4. Create a vector index for semantic search
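Steps 2 and 3 can be pictured as follows. This is only a sketch: FileAnalysis, Manifest, and buildManifest are hypothetical stand-ins for the types in src/types/manifest.ts, the LLM call is abstracted as an analyze callback, and the dependency graph keeps only imports that resolve to other crawled files.

```typescript
// Hypothetical shapes; the real definitions are in src/types/manifest.ts.
interface FileAnalysis {
  summary: string;
  imports: string[]; // repo-relative paths or external package names
  exports: string[];
}
interface Manifest {
  files: Record<string, FileAnalysis>;
  dependencyGraph: Record<string, string[]>; // file -> crawled files it imports
}

// Stand-in for the LLM analysis in step 2.
type Analyze = (relPath: string) => Promise<FileAnalysis>;

async function buildManifest(paths: string[], analyze: Analyze): Promise<Manifest> {
  const files: Record<string, FileAnalysis> = {};
  // Step 2: analyze crawled files one at a time.
  for (const p of paths) {
    files[p] = await analyze(p);
  }
  // Step 3: keep only edges that point at other crawled files,
  // dropping external packages.
  const known = new Set(paths);
  const dependencyGraph: Record<string, string[]> = {};
  for (const p of paths) {
    dependencyGraph[p] = files[p].imports.filter((dep) => known.has(dep));
  }
  return { files, dependencyGraph };
}
```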

Output files:

  • .mcp_manifest.json - File metadata and dependency graph
  • .mcp_index/ - Vector index for semantic search
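Since the other tools build on these outputs, a caller might check for them before querying. A hypothetical isInitialized helper, assuming both outputs are written directly under rootPath:

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Output names taken from the list above.
const MANIFEST_FILE = ".mcp_manifest.json";
const INDEX_DIR = ".mcp_index";

/** Hypothetical check: true if init_codebase appears to have run for rootPath. */
function isInitialized(rootPath: string): boolean {
  return (
    existsSync(path.join(rootPath, MANIFEST_FILE)) &&
    existsSync(path.join(rootPath, INDEX_DIR))
  );
}
```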

generate_tutorial

Generate a comprehensive "Zero to Hero" tutorial for a codebase, optionally narrowed to a single focusTopic.

generate_tutorial(rootPath: "/path/to/your/project", focusTopic?: "authentication")

Creates:

  • Project overview and architecture
  • Mermaid.js dependency diagrams
  • Structured learning path (chapters)
  • Key insights and patterns
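One way to turn a dependency graph into Mermaid flowchart source is sketched below. The DependencyGraph shape and toMermaid are hypothetical; the project's own diagram output may differ.

```typescript
// Hypothetical shape: each file maps to the files it imports.
type DependencyGraph = Record<string, string[]>;

/** Renders a dependency graph as Mermaid flowchart source. */
function toMermaid(graph: DependencyGraph): string {
  // Collect every file that appears as a source or a target, in first-seen order.
  const files = new Set<string>();
  for (const [file, deps] of Object.entries(graph)) {
    files.add(file);
    for (const dep of deps) files.add(dep);
  }
  // File paths contain "/" and ".", so give each node a plain id (n0, n1, ...)
  // and show the path as a quoted label. Assumes paths contain no double quotes.
  const ids = new Map<string, string>();
  const lines = ["graph TD"];
  for (const file of files) {
    const id = `n${ids.size}`;
    ids.set(file, id);
    lines.push(`  ${id}["${file}"]`);
  }
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) lines.push(`  ${ids.get(file)} --> ${ids.get(dep)}`);
  }
  return lines.join("\n");
}
```

Declaring every node before any edge ensures that files which are only imported (never importers) still get a readable label.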

search_codebase

Perform semantic search across a codebase that has been indexed with init_codebase.

search_codebase(rootPath: "/path/to/your/project", query: "how is authentication handled?")

Returns relevant code snippets with:

  • File paths and line numbers
  • Relevance scores
  • File context and summaries
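The server uses LlamaIndex for retrieval. As a library-free illustration of the underlying idea, the sketch below ranks pre-embedded chunks by cosine similarity. Chunk, SearchHit, and search are hypothetical, and the embeddings are assumed to come from an embedding model not shown here.

```typescript
interface Chunk {
  file: string;
  startLine: number;
  endLine: number;
  text: string;
  embedding: number[];
}
interface SearchHit {
  file: string;
  lines: [number, number];
  snippet: string;
  score: number; // cosine similarity in [-1, 1]
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

/** Brute-force top-k search: score every chunk, sort descending, keep k. */
function search(queryEmbedding: number[], chunks: Chunk[], topK = 5): SearchHit[] {
  return chunks
    .map((c): SearchHit => ({
      file: c.file,
      lines: [c.startLine, c.endLine],
      snippet: c.text,
      score: cosine(queryEmbedding, c.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan is fine for illustration; a vector store exists to avoid scoring every chunk on large indexes.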

Project Structure

mcp-codebase/
├── src/
│   ├── index.ts                    # MCP server entry point
│   ├── tools/
│   │   ├── init.ts                 # init_codebase implementation
│   │   ├── tutorial.ts             # generate_tutorial implementation
│   │   └── search.ts               # search_codebase implementation
│   ├── core/
│   │   ├── crawler.ts              # File system walker (.gitignore aware)
│   │   ├── analyzer.ts             # LLM-based file analysis
│   │   ├── manifest.ts             # Manifest CRUD operations
│   │   └── vectorIndex.ts          # LlamaIndex integration
│   ├── utils/
│   │   ├── fileFilter.ts           # Smart file filtering logic
│   │   ├── languageDetect.ts       # Language/file type detection
│   │   ├── progress.ts             # Progress reporter
│   │   └── git.ts                  # Git metadata extraction
│   ├── prompts/
│   │   ├── analyze.ts              # Universal file analysis prompt
│   │   └── curriculum.ts           # Tutorial generation prompt
│   └── types/
│       ├── manifest.ts             # Manifest type definitions
│       └── mcp.ts                  # MCP tool interfaces
├── package.json
├── tsconfig.json
└── README.md

Development

# Type checking
npm run typecheck

# Development mode with auto-reload
npm run dev

# Build for production
npm run build

Performance Expectations

For a typical repository:

  • 500 files: ~10-15 minutes (mostly AI analysis)
  • 1000 files: ~20-30 minutes
  • 5000 files: ~2 hours

Initialization is a one-time cost: subsequent queries reuse the cached index until you re-run init_codebase (see Limitations).
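The figures above imply a roughly constant cost of 1.2 to 1.8 seconds per file (10-15 minutes for 500 files, 20-30 minutes for 1000). A hypothetical estimator, assuming time scales linearly with file count:

```typescript
// Per-file rate implied by the 500- and 1000-file figures:
// 600-900 s / 500 files = 1200-1800 s / 1000 files = 1.2-1.8 s per file.
const SECONDS_PER_FILE = { low: 1.2, high: 1.8 };

/** Rough init_codebase duration range in minutes, assuming linear scaling. */
function estimateInitMinutes(fileCount: number): [number, number] {
  return [
    (fileCount * SECONDS_PER_FILE.low) / 60,
    (fileCount * SECONDS_PER_FILE.high) / 60,
  ];
}

console.log(estimateInitMinutes(5000));
```

For 5000 files this gives about 100 to 150 minutes, which brackets the ~2 hour figure.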

Storage

For a 500-file repository (~50MB source):

  • Manifest: ~100-200 KB
  • Vector Index: ~5-10 MB
  • Total overhead: ~10-20% of source size
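A quick check of the overhead from the component sizes above (taking 1 MB = 1000 KB) shows it spans roughly 10% to 20% of the source size:

```latex
\frac{0.1\,\text{MB} + 5\,\text{MB}}{50\,\text{MB}} \approx 10.2\%,
\qquad
\frac{0.2\,\text{MB} + 10\,\text{MB}}{50\,\text{MB}} \approx 20.4\%
```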

Limitations

  1. LLM Dependency: Initialization requires an MCP host with sampling capability
  2. No Incremental Updates: Re-run init_codebase when files change significantly
  3. Binary Files: Skipped (images, PDFs, executables)
  4. Very Large Files: Files over ~100K tokens may exceed the LLM's context window
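A pre-check for the last two limitations might look like this. Both heuristics are assumptions rather than the project's method: four characters per token is only a rough average that varies by model and language, and the NUL-byte test is similar to the heuristic Git uses to flag binary content. fitsInContext and looksBinary are hypothetical names.

```typescript
// Assumption: ~4 characters per token on average; real tokenizers vary.
const CHARS_PER_TOKEN = 4;
const MAX_TOKENS = 100_000; // threshold from the Limitations list

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

/** Hypothetical guard before sending a file to the LLM for analysis. */
function fitsInContext(text: string, maxTokens = MAX_TOKENS): boolean {
  return estimateTokens(text) <= maxTokens;
}

/** Treats a file as binary if a NUL byte appears in its first 8000 bytes. */
function looksBinary(bytes: Uint8Array): boolean {
  return bytes.subarray(0, 8000).includes(0);
}
```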

License

MIT
