MCP-Powered AI Assistant (Local — LlamaIndex + Ollama)
Privacy-first document intelligence: all models run locally via Ollama — no data leaves your machine.
---
What is MCP?
Model Context Protocol (MCP) is an open standard (by Anthropic) that defines how AI models discover and invoke tools at runtime. Think of it as a "USB-C port for AI" — any MCP-compatible client (Claude Desktop, your own agent, etc.) can connect to any MCP server and immediately use its tools.
MCP vs. Advanced RAG — What's the Difference?
| Dimension | Advanced RAG | MCP |
|-----------|--------------|-----|
| Purpose | Improve retrieval accuracy | Standardise tool/capability exposure |
| Core idea | Better chunking, re-ranking, hybrid search | JSON-RPC tool registry with discovery |
| What the LLM gets | Retrieved context injected into prompt | A menu of callable functions with schemas |
| Execution | Single pipeline (query → retrieve → generate) | Multi-step agent loop (plan → pick tool → call → observe → repeat) |
| Tools | Retrieval only | Any function: retrieval, APIs, databases, code |
| State | Stateless per query | Stateful agent sessions possible |
| This project | RAG is one tool inside the MCP server | MCP wraps 8 RAG tools, discoverable at runtime |
In short: Advanced RAG makes retrieval smarter. MCP makes the entire AI system composable and interoperable.
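The multi-step agent loop from the Execution row (plan → pick tool → call → observe → repeat) can be sketched in a few lines. This is a minimal illustration, not this project's agent: plan_next is a hypothetical stand-in for the LLM planner, and tools is assumed to be a plain dict of Python callables.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# An action is (tool_name, arguments), or None when the planner is done.
Action = Optional[Tuple[str, Dict[str, Any]]]


def run_agent_loop(
    plan_next: Callable[[List[Dict[str, Any]]], Action],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 5,
) -> List[Dict[str, Any]]:
    """Run plan -> pick tool -> call -> observe until the planner stops."""
    observations: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        # Plan: the (hypothetical) planner sees everything observed so far.
        action = plan_next(observations)
        if action is None:
            break
        # Pick tool and call it with the planner's arguments.
        name, arguments = action
        result = tools[name](**arguments)
        # Observe: record the result so the next planning step can use it.
        observations.append({"tool": name, "result": result})
    return observations
```

A plain RAG pipeline corresponds to a single pass through this loop with one fixed retrieval tool. The max_steps cap is there so a planner that never returns None cannot loop forever.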
---
Project Architecture
mcp_rag_assistant/
├── config.py ← Central config (LLM, embed, chunking, server)
├── rag_engine.py ← LlamaIndex: load docs → build index → query engine
├── main.py ← CLI entrypoint (serve / index / query / demo)
├── mcp_client.py ← Example client that calls server tools
│
├── mcp_server/
│ └── server.py ← HTTP JSON-RPC server exposing all tools
│
├── tools/
│ └── rag_tools.py ← 8 MCP tool implementations
│
├── utils/
│ └── logger.py ← Structured logging
│
├── my_data/ ← ⬅ DROP YOUR FILES HERE (PDF, DOCX, XLSX, CSV)
├── storage/ ← ChromaDB persistence (auto-created)
├── logs/ ← Log files (auto-created)
│
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
Data Flow
User Query
│
▼
MCP Client (mcp_client.py or Claude Desktop or your agent)
│ JSON-RPC POST /mcp {"method": "tools/call", "params": {...}}
▼
MCP Server (mcp_server/server.py)
│ dispatches to matching tool function
▼
Tool Function (tools/rag_tools.py)
│ calls get_query_engine().query(...)
▼
LlamaIndex Query Engine (rag_engine.py)
│ embeds query with qwen3-embedding:0.6b via Ollama
▼
ChromaDB Vector Store
│ returns top-K similar chunks
▼
Ollama LLM (llama3 or mistral)
│ synthesises answer from retrieved context
▼
JSON response back through MCP → Client
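The first hop of this flow, the JSON-RPC POST to /mcp, might look like the sketch below. It uses only the standard library. The URL is assumed from the port and path shown in this README, the argument name query is hypothetical (check the tool's schema), and nothing is assumed about the shape of the response beyond it being JSON.

```python
import json
import urllib.request

# Assumed endpoint: port 8080 and the /mcp path come from this README.
MCP_URL = "http://localhost:8080/mcp"


def build_tool_call(tool_name, arguments, request_id=1):
    """Return a JSON-RPC 2.0 payload for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def call_tool(tool_name, arguments, url=MCP_URL, timeout=120):
    """POST the payload to the server and return the decoded JSON reply."""
    body = json.dumps(build_tool_call(tool_name, arguments)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running server, see "python main.py serve"):
# call_tool("query_documents", {"query": "What are the key findings?"})
```

The timeout is generous because a local LLM can take a while to synthesise an answer.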
---
Available MCP Tools
| Tool | Description |
|------|-------------|
| query_documents | General Q&A over all indexed documents |
| list_indexed_files | Show files in my_data/ |
| rebuild_index | Re-index after adding/removing files |
| summarize_document | Summarise a specific file by name |
| analyze_data | Plain-English data analysis (CSV/XLSX) |
| generate_report | Generate summary / detailed / executive report |
| compare_documents | Compare two documents on a given aspect |
| extract_entities | Extract people, orgs, dates, numbers |
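One way a server could register and dispatch tools like these is a name-to-handler registry. The sketch below is an assumption about the general pattern, not the code in tools/rag_tools.py. The tool decorator, TOOLS dict, list_tools, and dispatch are all hypothetical names, and only the simplest tool (list_indexed_files) is implemented.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List

# Hypothetical registry: tool name -> description and handler.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str) -> Callable:
    """Decorator that registers a function as a callable tool."""
    def decorator(func: Callable) -> Callable:
        TOOLS[name] = {"description": description, "handler": func}
        return func
    return decorator


@tool("list_indexed_files", "Show files in my_data/")
def list_indexed_files(directory: str = "my_data") -> Dict[str, List[str]]:
    """Return the sorted file names found directly inside the directory."""
    path = Path(directory)
    if not path.is_dir():
        return {"files": []}
    return {"files": sorted(p.name for p in path.iterdir() if p.is_file())}


def list_tools() -> List[Dict[str, str]]:
    """Discovery: what a client would see when asking for available tools."""
    return [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]


def dispatch(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route a tool call to its handler, or report an unknown tool."""
    if name not in TOOLS:
        return {"error": f"Unknown tool: {name}"}
    return {"result": TOOLS[name]["handler"](**arguments)}
```

Keeping discovery (list_tools) and execution (dispatch) on the same registry means a newly decorated function becomes visible to clients without further wiring.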
---
Prerequisites
- Python 3.10+
- Ollama running locally — ollama.com
- The following models, already pulled with ollama pull:
  - llama3:latest
  - mistral:latest
  - qwen3-embedding:0.6b
---
Setup
# 1. Clone / unzip the project
cd mcp_rag_assistant
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure (optional — defaults work out of the box)
cp .env.example .env
# Edit .env to change models, ports, chunk sizes etc.
# 5. Add your documents
# Copy PDFs, DOCX, XLSX, CSV files into:
# my_data/
# 6. Build the index
python main.py index
# 7. Start the MCP server
python main.py serve
---
Usage
Start the server
python main.py serve
# MCP server listening on http://0.0.0.0:8080
One-shot query (no server needed)
python main.py query "What are the key findings in the Q1 report?"
Run the demo client (server must be running)
# In a second terminal:
python main.py demo
Rebuild index after adding new files
python main.py index
# or via MCP tool:
# call rebuild_index tool from any client
Health check
GET http://localhost:8080/health
GET http://localhost:8080/tools
---
Switching Models
Edit config.py or your .env:
# Use mistral instead of llama3
LLM_MODEL=mistral:latest
# Use nomic-embed-text for embeddings
EMBED_MODEL=nomic-embed-text:latest
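For orientation, a config module could pick these variables up from the environment as sketched below. This is not the project's actual config.py. The Settings class and load_settings function are hypothetical, and the defaults are assumed from the models named elsewhere in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Model names used by the assistant (hypothetical container)."""
    llm_model: str
    embed_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read model names from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Defaults are assumptions based on the models listed in this README.
        llm_model=env.get("LLM_MODEL", "llama3:latest"),
        embed_model=env.get("EMBED_MODEL", "qwen3-embedding:0.6b"),
    )
```

Note that this sketch reads process environment variables only. Loading a .env file needs an extra step (for example a dotenv loader), which is left out here. Changing the embedding model generally means rebuilding the index, since vectors from different models are not comparable.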
---
Tuning Chunk Size
In config.py or .env:
| Setting | Default | Notes |
|---------|---------|-------|
| CHUNK_SIZE | 256 | Tokens per chunk. Smaller = more precise retrieval |
| CHUNK_OVERLAP | 25 | Overlap between chunks. Helps preserve context at boundaries |
| SIMILARITY_TOP_K | 5 | Chunks retrieved per query |
| RESPONSE_MODE | compact | compact \| tree_summarize \| refine |
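To see how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a simplified fixed-window chunker. It is an illustration of the arithmetic only: real splitters, including those in LlamaIndex, typically also respect sentence boundaries, so actual chunk counts will differ. The function name chunk_tokens is hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence, chunk_size: int = 256, chunk_overlap: int = 25
) -> List[Sequence]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")

    # Each new chunk starts this many tokens after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# With the defaults, each chunk advances 256 - 25 = 231 tokens.
chunks = chunk_tokens(list(range(1000)))
print(len(chunks), [len(c) for c in chunks])
```

Under these assumptions, a 1000-token document yields windows starting at tokens 0, 231, 462, 693, and 924. Raising the overlap shrinks the step, which increases the number of chunks and the index size.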
---






