Knowledge Assistant MCP Server
A multi-agent RAG (Retrieval-Augmented Generation) MCP server built with FastMCP in Python. It answers questions from your documents using coordinator, retriever, and synthesizer agents, and includes a human-in-the-loop step in which you approve the proposed answer or request edits before it is finalized.
What it does
- Query your knowledge base: Ask questions in natural language; the server retrieves relevant chunks and proposes an answer with citations.
- Multi-agent pipeline: A coordinator decides whether to use the knowledge base, a retriever (RAG) fetches relevant documents, and a synthesizer produces a structured answer proposal.
- Human-in-the-loop: You review the proposed answer and either approve it or request edits before the answer is finalized.
- Add documents: Ingest text into the vector store (ChromaDB) so the assistant can answer from your own content.
Use cases: Internal knowledge assistant, FAQ over your docs, Q&A over notes or wikis, and similar RAG workflows that require a human approval step.
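The "Add documents" and retrieval steps can be illustrated without any external services. The sketch below is hypothetical and is not the project's implementation: the server uses Google embeddings stored in ChromaDB (via LangChain), while this version swaps in bag-of-words term counts and cosine similarity so it runs with the standard library alone. The names `chunk_text`, `embed`, `cosine`, and `top_k`, and the `chunk_size`/`overlap` defaults, are assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (stand-in for Gemini embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the question and keep the best k (cf. RAG_TOP_K)."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = ["ChromaDB stores vectors", "Gemini generates answers"]
    print(top_k("Where are vectors stored?", chunks, k=1))
```

Overlapping windows reduce the chance that a relevant sentence is cut in half at a chunk boundary; a real embedding model also matches paraphrases that exact term overlap would miss.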
---
Project Structure
```text
knowledge-assistant-mcp/
├── src/
│   ├── server.py          # FastMCP app entry point
│   ├── config/
│   │   └── settings.py    # pydantic-settings (server name, API keys, model, RAG settings)
│   ├── routers/
│   │   ├── tools.py       # Register MCP tools
│   │   ├── resources.py   # Register MCP resources
│   │   └── prompts.py     # Register MCP prompts
│   ├── tools/             # Tool implementations
│   ├── resources/         # Resource implementations
│   ├── prompts/           # Prompt content (workflow with human-in-the-loop)
│   ├── app/               # Core logic: RAG, LLM, orchestrator (coordinator/retriever/synthesizer)
│   ├── models/            # Pydantic schemas (structured outputs)
│   └── utils/             # Helpers (e.g. Opik)
├── pyproject.toml
├── .env.sample
├── Dockerfile
└── README.md
```
---
Setup
Prerequisites: Python 3.13, uv.
Clone the repository:
```bash
git clone https://github.com/YOUR_USERNAME/knowledge-assistant-mcp.git
cd knowledge-assistant-mcp
```
Install dependencies with uv:
```bash
uv sync
```
This creates a virtual environment (Python 3.13) and installs dependencies from pyproject.toml.
Configure environment variables:
```bash
cp .env.sample .env
```
Edit .env and set at least:
- GOOGLE_API_KEY (required): used for Gemini (LLM and embeddings). Get it from Google AI Studio.

Optional:
- OPIK_API_KEY: for observability (tracing). Get it from Opik.
- OPIK_PROJECT_NAME: Opik project name (default: knowledge-assistant).
- MODEL_NAME: Gemini model (default: gemini-2.0-flash).
- CHROMA_PERSIST_DIR: directory for ChromaDB (default: ./chroma_data).
- CHROMA_COLLECTION: collection name (default: knowledge_base).
- RAG_TOP_K: number of chunks to retrieve (default: 5).
- EMBEDDING_MODEL: Google embedding model for RAG (default: models/gemini-embedding-001). Override it if your API key uses a different embedding model.
Run the server
Stdio (for Cursor / Claude Desktop):
```bash
uv run python -m src.server --transport stdio
```
HTTP:
```bash
uv run python -m src.server --transport http --port 8000
```
Or use the entry point:
```bash
uv run knowledge-assistant-mcp --transport stdio
```
You should see the FastMCP banner and the process waiting for connections; stop with Ctrl+C.
---
Environment variables
Variables you can set in .env, and where to get API keys:
Environment variables summary
| Variable | Required | Description |
|---|---|---|
| GOOGLE_API_KEY | Yes | Google AI (Gemini) API key, from Google AI Studio |
| OPIK_API_KEY | No | Opik API key for observability, from Opik |
| OPIK_PROJECT_NAME | No | Opik project name (default: knowledge-assistant) |
| MODEL_NAME | No | Gemini model (default: gemini-2.0-flash) |
| CHROMA_PERSIST_DIR | No | ChromaDB persistence directory (default: ./chroma_data) |
| CHROMA_COLLECTION | No | ChromaDB collection name (default: knowledge_base) |
| RAG_TOP_K | No | Number of chunks to retrieve (default: 5) |
| EMBEDDING_MODEL | No | Google embedding model for RAG (default: models/gemini-embedding-001) |
---
Connecting from Cursor (or another MCP client)
Add this to your Cursor MCP settings (e.g. .cursor/mcp.json), replacing the path and API key as needed:
```json
{
  "mcpServers": {
    "knowledge-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/knowledge-assistant-mcp",
        "run",
        "python",
        "-m",
        "src.server",
        "--transport",
        "stdio"
      ],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key-here"
      }
    }
  }
}
```
Alternatively, you can rely on a .env file in the project directory and omit the env block entirely, or, if your client supports it, set only ENV_FILE_PATH.
---
How to use
Once the server is running and connected (e.g. in Cursor):
- Add documents (optional but needed for RAG answers)
Use the add_documents tool: pass text (the content to ingest) and optionally source (e.g. "Context Engineering Book"). The server chunks and embeds the text into ChromaDB. You can add more documents anytime.
- Ask a question
Use the query_knowledge_base tool with your question. The server runs the multi-agent pipeline (coordinator → retriever → synthesizer) and returns a proposed answer with citations.
- Human-in-the-loop
Review the proposal, then call approve_or_edit_answer:
  - To accept: approved=True, with the same proposal_answer as returned.
  - To request changes: approved=False, the same proposal_answer, and user_feedback set to your requested edits. The server can then produce a revised answer.
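The approve/edit decision can be modeled as a small state transition. This is a hypothetical sketch of the behavior described in the list, not the server's approve_or_edit_answer implementation: `ReviewOutcome`, its `status` values, and the `append_feedback` placeholder reviser are assumptions, and in the real server an LLM would rewrite the answer.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    status: str  # "final" when approved, "revised" when edits were requested
    answer: str


def append_feedback(answer: str, feedback: str) -> str:
    """Placeholder reviser: a real implementation would ask the LLM to rewrite the answer."""
    return f"{answer}\n\n[Revision requested: {feedback}]"


def approve_or_edit(
    approved: bool,
    proposal_answer: str,
    user_feedback: str = "",
    revise: Callable[[str, str], str] = append_feedback,
) -> ReviewOutcome:
    """Finalize an approved proposal, or produce a revision from the reviewer's feedback."""
    if approved:
        return ReviewOutcome("final", proposal_answer)
    if not user_feedback.strip():
        raise ValueError("user_feedback is required when approved=False")
    return ReviewOutcome("revised", revise(proposal_answer, user_feedback))


if __name__ == "__main__":
    draft = "Context engineering is the practice of curating what goes into a model's context."
    print(approve_or_edit(False, draft, user_feedback="Add a one-line example."))
    print(approve_or_edit(True, draft))
```

Requiring feedback on rejection keeps the loop productive: a revised answer would then go back to the reviewer for another approve-or-edit round.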
You can also use search_knowledge_base to search the vector store without generating an answer, and follow the knowledge_assistant_workflow prompt as a step-by-step guide. The resource knowledge-assistant://server_info exposes server metadata and RAG settings.
---
Features
Core:
FastMCP server (src/server.py) with tools (query_knowledge_base, approve_or_edit_answer, add_documents, search_knowledge_base), one workflow prompt (knowledge_assistant_workflow) with a human-in-the-loop step (review proposal → approve or edit via approve_or_edit_answer), uv-based setup, and the structure above. No API keys in the repo; .env.sample and .gitignore are included.
Additional:
- Multi-agent orchestration – Coordinator, retriever (RAG), and synthesizer agents in src/app/orchestrator.py.
- RAG with vector database – ChromaDB + LangChain + Google embeddings; search_knowledge_base and add_documents; persistence via CHROMA_PERSIST_DIR.
- MCP resource – knowledge-assistant://server_info exposes the server name, version, collection, and RAG settings.
- Human-in-the-loop validation – The workflow returns a proposal; the user approves it or requests edits with approve_or_edit_answer before it is finalized.
- Structured outputs – Pydantic models (AnswerProposal, SearchResult, RetrievedChunk, SynthesisResult) for synthesizer and API responses.
- Observability (Opik) – Optional tracing when OPIK_API_KEY is set.
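The coordinator → retriever → synthesizer flow with structured outputs can be sketched as follows. This is a hypothetical, dependency-free illustration, not the code in src/app/orchestrator.py: dataclasses stand in for the Pydantic models, the fields of `RetrievedChunk` and `AnswerProposal` are assumptions (only the class names come from the list above), the coordinator uses a toy small-talk rule, and the synthesizer concatenates chunks where the server would call Gemini.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    text: str
    source: str
    score: float


@dataclass
class AnswerProposal:
    question: str
    proposal_answer: str
    citations: list[str] = field(default_factory=list)
    used_knowledge_base: bool = True


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def coordinator(question: str) -> bool:
    """Toy routing rule: small talk skips the knowledge base, everything else uses it."""
    return question.strip().lower() not in {"hi", "hello", "thanks"}


def retriever(question: str, corpus: list[tuple[str, str]], top_k: int = 5) -> list[RetrievedChunk]:
    """Score (text, source) pairs by the fraction of question words they contain."""
    q = _tokens(question)
    hits = []
    for text, source in corpus:
        overlap = len(q & _tokens(text))
        if overlap:  # non-zero overlap also guarantees q is non-empty
            hits.append(RetrievedChunk(text, source, overlap / len(q)))
    hits.sort(key=lambda c: c.score, reverse=True)
    return hits[:top_k]


def synthesizer(question: str, chunks: list[RetrievedChunk]) -> AnswerProposal:
    """Stand-in for the LLM: stitch chunk texts together and cite their sources."""
    if not chunks:
        return AnswerProposal(question, "I could not find this in the knowledge base.")
    return AnswerProposal(
        question,
        proposal_answer=" ".join(c.text for c in chunks),
        citations=sorted({c.source for c in chunks}),
    )


def run_pipeline(question: str, corpus: list[tuple[str, str]], top_k: int = 5) -> AnswerProposal:
    if not coordinator(question):
        return AnswerProposal(question, "Hi! Ask me something about your documents.", used_knowledge_base=False)
    return synthesizer(question, retriever(question, corpus, top_k))


if __name__ == "__main__":
    corpus = [("ChromaDB persists vectors on disk.", "docs/rag.md"),
              ("Gemini writes the final answer.", "docs/llm.md")]
    print(run_pipeline("Where are vectors persisted?", corpus))
```

Returning a typed proposal (rather than free text) is what lets the human-in-the-loop step pass the same proposal_answer back unchanged when approving.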
---
Docker
Build and run with Docker:
```bash
docker build -t knowledge-assistant-mcp .
docker run --rm -i -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport stdio
```
The -i flag keeps stdin open, which the stdio transport needs; without it the container receives end-of-file immediately.
For HTTP on port 8000:
```bash
docker run --rm -p 8000:8000 -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport http --port 8000
```
---
License
MIT (or your chosen license).






