MCP Codebase Mentor
An MCP (Model Context Protocol) server that acts as an AI mentor for any codebase. It uses dual-layer indexing: a manifest of file metadata with a dependency graph, plus a vector index for semantic search.
Features
- Universal language support - AI handles all programming languages
- Complete file coverage - Indexes code, tests, configs, and docs
- Smart filtering - Respects .gitignore and applies sensible defaults
- Semantic search - Vector-based code search using LlamaIndex
- Tutorial generation - Creates structured learning guides with architecture diagrams
Installation
# Clone the repository
git clone <repository-url>
cd mcp-codebase
# Install dependencies
npm install
# Build the project
npm run build
Usage with Cursor/Claude
Add to your MCP configuration:
{
  "mcpServers": {
    "codebase-mentor": {
      "command": "node",
      "args": ["/path/to/mcp-codebase/dist/index.js"]
    }
  }
}
Available Tools
init_codebase
Initialize and index a codebase for AI mentoring.
init_codebase(rootPath: "/path/to/your/project")
This will:
- Crawl the directory structure (respecting .gitignore)
- Analyze each file with AI to extract summaries, imports, and exports
- Build a manifest with file metadata and dependency graph
- Create a vector index for semantic search
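The crawl step can be pictured as a filter applied to every path the walker visits. The sketch below is a simplified illustration, not the project's actual fileFilter.ts: the `IgnoreRules` shape, `DEFAULT_RULES`, and `shouldIndex` are hypothetical names, and real .gitignore semantics (negation, anchoring, nested ignore files) are richer than the directory-name and extension checks shown here.

```typescript
// Hypothetical, simplified ignore rules. Real .gitignore matching supports
// negation, anchored patterns, and nested ignore files; this sketch does not.
interface IgnoreRules {
  dirNames: Set<string>;   // directory names skipped anywhere in the path
  extensions: Set<string>; // lowercase file extensions (with the dot) to skip
}

// Assumed defaults for illustration only.
const DEFAULT_RULES: IgnoreRules = {
  dirNames: new Set(["node_modules", ".git", "dist"]),
  extensions: new Set([".png", ".jpg", ".pdf", ".exe"]),
};

// Returns true when a relative, POSIX-style path should be indexed.
function shouldIndex(relPath: string, rules: IgnoreRules = DEFAULT_RULES): boolean {
  const segments = relPath.split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return false;

  const fileName = segments[segments.length - 1];
  const dirs = segments.slice(0, -1);

  // Skip anything that lives under an ignored directory.
  if (dirs.some((d) => rules.dirNames.has(d))) return false;

  // Skip files with an ignored extension (dotfiles have no extension here).
  const dot = fileName.lastIndexOf(".");
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  return !rules.extensions.has(ext);
}

const candidates = ["src/index.ts", "node_modules/x/index.js", "docs/logo.PNG", "README.md"];
const kept = candidates.filter((p) => shouldIndex(p));
console.log(kept);
```

Keeping the filter a pure function of the path makes it easy to test separately from the file system walk.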
Output files:
- .mcp_manifest.json - File metadata and dependency graph
- .mcp_index/ - Vector index for semantic search
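To make the manifest's role concrete, here is a minimal sketch of how per-file imports could be turned into a reverse dependency lookup. The `FileEntry` and `Manifest` shapes and the `buildDependents` helper are assumptions for illustration; the actual .mcp_manifest.json schema is not documented here and may differ.

```typescript
// Hypothetical manifest shape; the real .mcp_manifest.json schema may differ.
interface FileEntry {
  path: string;
  summary: string;
  imports: string[]; // paths of files this file depends on
  exports: string[]; // names this file exposes
}

interface Manifest {
  files: Record<string, FileEntry>;
}

// Builds a reverse dependency map: for each file, the files that import it.
function buildDependents(manifest: Manifest): Map<string, string[]> {
  const dependents = new Map<string, string[]>();
  for (const entry of Object.values(manifest.files)) {
    for (const target of entry.imports) {
      const list = dependents.get(target) ?? [];
      list.push(entry.path);
      dependents.set(target, list);
    }
  }
  return dependents;
}

// Toy data for illustration only.
const manifest: Manifest = {
  files: {
    "src/index.ts": {
      path: "src/index.ts",
      summary: "Server entry point",
      imports: ["src/tools/init.ts", "src/tools/search.ts"],
      exports: [],
    },
    "src/tools/init.ts": {
      path: "src/tools/init.ts",
      summary: "Indexing tool",
      imports: ["src/core/crawler.ts"],
      exports: ["initCodebase"],
    },
    "src/tools/search.ts": {
      path: "src/tools/search.ts",
      summary: "Search tool",
      imports: [],
      exports: ["searchCodebase"],
    },
  },
};

const dependents = buildDependents(manifest);
console.log(dependents.get("src/core/crawler.ts"));
```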
generate_tutorial
Generate a comprehensive "Zero to Hero" tutorial for a codebase.
generate_tutorial(rootPath: "/path/to/your/project", focusTopic?: "authentication")
Creates:
- Project overview and architecture
- Mermaid.js dependency diagrams
- Structured learning path (chapters)
- Key insights and patterns
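As a rough illustration of the Mermaid.js diagrams mentioned above, a dependency map can be rendered as flowchart text with a few lines of code. The `toMermaid` function and its input shape are hypothetical; the project's own diagram output may look different.

```typescript
// Converts a dependency map (file -> files it imports) into Mermaid flowchart
// text. Assumes paths contain no double quotes, since labels are quoted.
function toMermaid(deps: Record<string, string[]>): string {
  const ids = new Map<string, string>();

  // Assigns each path a short, stable node id (n0, n1, ...).
  const idFor = (path: string): string => {
    let id = ids.get(path);
    if (id === undefined) {
      id = `n${ids.size}`;
      ids.set(path, id);
    }
    return id;
  };

  const lines: string[] = ["graph TD"];
  for (const [from, targets] of Object.entries(deps)) {
    for (const to of targets) {
      lines.push(`  ${idFor(from)}["${from}"] --> ${idFor(to)}["${to}"]`);
    }
  }
  return lines.join("\n");
}

// Toy input for illustration only.
const diagram = toMermaid({
  "src/index.ts": ["src/tools/init.ts", "src/tools/search.ts"],
  "src/tools/init.ts": ["src/core/crawler.ts"],
});
console.log(diagram);
```

Short node ids keep the diagram source readable while the quoted labels preserve the full file paths.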
search_codebase
Perform semantic search across a codebase.
search_codebase(rootPath: "/path/to/your/project", query: "how is authentication handled?")
Returns relevant code snippets with:
- File paths and line numbers
- Relevance scores
- File context and summaries
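The relevance scores can be understood through a generic vector-search sketch: embed the query, compare it with each stored chunk embedding, and return the closest matches. This is not LlamaIndex's API, which handles these steps internally; the `Chunk` and `SearchHit` types, the `rankChunks` function, and the two-dimensional toy embeddings are all assumptions for illustration.

```typescript
// Hypothetical shapes for an indexed chunk and a search result.
interface Chunk {
  filePath: string;
  startLine: number;
  embedding: number[];
}

interface SearchHit {
  filePath: string;
  startLine: number;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the topK best matches.
function rankChunks(query: number[], chunks: Chunk[], topK: number): SearchHit[] {
  return chunks
    .map((c) => ({
      filePath: c.filePath,
      startLine: c.startLine,
      score: cosineSimilarity(query, c.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy embeddings; real ones have hundreds or thousands of dimensions.
const hits = rankChunks(
  [1, 0],
  [
    { filePath: "README.md", startLine: 1, embedding: [0, 1] },
    { filePath: "src/auth.ts", startLine: 12, embedding: [0.9, 0.1] },
  ],
  2,
);
console.log(hits);
```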
Project Structure
mcp-codebase/
├── src/
│ ├── index.ts # MCP server entry point
│ ├── tools/
│ │ ├── init.ts # init_codebase implementation
│ │ ├── tutorial.ts # generate_tutorial implementation
│ │ └── search.ts # search_codebase implementation
│ ├── core/
│ │ ├── crawler.ts # File system walker (.gitignore aware)
│ │ ├── analyzer.ts # LLM-based file analysis
│ │ ├── manifest.ts # Manifest CRUD operations
│ │ └── vectorIndex.ts # LlamaIndex integration
│ ├── utils/
│ │ ├── fileFilter.ts # Smart file filtering logic
│ │ ├── languageDetect.ts # Language/file type detection
│ │ ├── progress.ts # Progress reporter
│ │ └── git.ts # Git metadata extraction
│ ├── prompts/
│ │ ├── analyze.ts # Universal file analysis prompt
│ │ └── curriculum.ts # Tutorial generation prompt
│ └── types/
│ ├── manifest.ts # Manifest type definitions
│ └── mcp.ts # MCP tool interfaces
├── package.json
├── tsconfig.json
└── README.md
Development
# Type checking
npm run typecheck
# Development mode with auto-reload
npm run dev
# Build for production
npm run build
Performance Expectations
For a typical repository:
- 500 files: ~10-15 minutes (mostly AI analysis)
- 1000 files: ~20-30 minutes
- 5000 files: ~2 hours
Initialization is a one-time operation. Subsequent queries use the cached index.
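The figures above work out to roughly 1.2-1.8 seconds per file (500 files in 10-15 minutes), which gives a quick way to estimate other repository sizes. The helper below is a back-of-the-envelope sketch; `estimateInitMinutes` is a hypothetical name, and actual speed depends on the LLM host and file sizes.

```typescript
// Per-file cost derived from the figures above: 500 files in 10-15 minutes
// is 1.2-1.8 seconds per file. Treat these as rough, not measured, bounds.
const SECONDS_PER_FILE = { low: 1.2, high: 1.8 };

// Estimates the initialization time range, in whole minutes.
function estimateInitMinutes(fileCount: number): { low: number; high: number } {
  return {
    low: Math.round((fileCount * SECONDS_PER_FILE.low) / 60),
    high: Math.round((fileCount * SECONDS_PER_FILE.high) / 60),
  };
}

console.log(estimateInitMinutes(2000));
```

At this rate, 5000 files come out at 100-150 minutes, consistent with the ~2 hours quoted above.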
Storage
For a 500-file repository (~50MB source):
- Manifest: ~100-200 KB
- Vector Index: ~5-10 MB
- Total overhead: roughly 10-20% of source size
Limitations
- LLM Dependency: Initialization requires an MCP host with sampling capability
- No Incremental Updates: Re-run init_codebase when files change significantly
- Binary Files: Skipped (images, PDFs, executables)
- Very Large Files: May hit LLM context limits (>100K tokens)
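One way to anticipate the large-file limit before sending a file for analysis is a rough token estimate. The sketch below assumes about 4 characters per token, a common rule of thumb rather than an exact tokenizer count; `estimateTokens` and `exceedsContextLimit` are hypothetical helpers, not part of this project.

```typescript
// Assumption: roughly 4 characters per token. Real tokenizers vary by model,
// so treat the result as an approximation only.
const CHARS_PER_TOKEN = 4;
const MAX_TOKENS = 100000; // the limit mentioned in the list above

// Approximates the token count of a text from its character length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns true when the text is likely too large for a single LLM call.
function exceedsContextLimit(text: string, maxTokens: number = MAX_TOKENS): boolean {
  return estimateTokens(text) > maxTokens;
}

console.log(estimateTokens("export const answer = 42;"));
```

Files flagged this way could be truncated or summarized in pieces, though how the server actually handles them is not specified here.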
License
MIT






