Knowledge Assistant MCP Server

SMGilliatt/knowledge-assistant-mcp

A multi-agent RAG (Retrieval-Augmented Generation) MCP server built with FastMCP in Python. It answers questions from your documents using coordinator, retriever, and synthesizer agents, and includes a human-in-the-loop step in which you approve the proposed answer or request edits before it is finalized.

What it does

  • Query your knowledge base: Ask questions in natural language; the server retrieves relevant chunks and proposes an answer with citations.
  • Multi-agent pipeline: A coordinator decides whether to use the knowledge base, a retriever (RAG) fetches relevant documents, and a synthesizer produces a structured answer proposal.
  • Human-in-the-loop: You review the proposed answer and either approve it or request edits before the answer is finalized.
  • Add documents: Ingest text into the vector store (ChromaDB) so the assistant can answer from your own content.

Use cases: Internal knowledge assistant, FAQ over your docs, Q&A over notes or wikis, and similar RAG workflows that require a human approval step.
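
The pipeline described above can be pictured as three plain functions chained together. The sketch below is illustrative only: the function names, the Proposal class, the keyword-overlap retrieval, and the string-joining synthesizer are all assumptions standing in for the project's LLM-backed agents, and none of it is taken from src/app/orchestrator.py.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical stand-in for the server's structured answer proposal."""
    answer: str
    citations: list = field(default_factory=list)


# Toy knowledge base (source -> text). The real server stores embedded chunks in ChromaDB.
KNOWLEDGE_BASE = {
    "handbook.md": "Vacation requests must be filed two weeks in advance.",
    "faq.md": "The office is closed on public holidays.",
}


def _words(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()}


def coordinator(question):
    """Decide whether to consult the knowledge base (toy rule: any shared word)."""
    q = _words(question)
    return any(q & _words(text) for text in KNOWLEDGE_BASE.values())


def retriever(question, top_k=5):
    """Return up to top_k (source, text) pairs ranked by word overlap."""
    q = _words(question)
    scored = [(len(q & _words(text)), src, text) for src, text in KNOWLEDGE_BASE.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, text) for _, src, text in scored[:top_k]]


def synthesizer(question, chunks):
    """Build a proposal from retrieved chunks (the real agent would call an LLM)."""
    answer = " ".join(text for _, text in chunks)
    return Proposal(answer=answer, citations=[src for src, _ in chunks])


def run_pipeline(question):
    """coordinator -> retriever -> synthesizer, returning a proposal for human review."""
    if not coordinator(question):
        return Proposal(answer="No relevant documents found.")
    return synthesizer(question, retriever(question))
```

Calling run_pipeline("When must vacation requests be filed?") returns a proposal that cites handbook.md; the proposal would then go to the human approval step rather than being treated as final.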

---

Project Structure

knowledge-assistant-mcp/
├── src/
│   ├── server.py           # FastMCP app entry point
│   ├── config/
│   │   └── settings.py     # pydantic-settings (server name, API keys, model, RAG settings)
│   ├── routers/
│   │   ├── tools.py        # Register MCP tools
│   │   ├── resources.py    # Register MCP resources
│   │   └── prompts.py      # Register MCP prompts
│   ├── tools/              # Tool implementations
│   ├── resources/          # Resource implementations
│   ├── prompts/            # Prompt content (workflow with human-in-the-loop)
│   ├── app/                # Core logic: RAG, LLM, orchestrator (coordinator/retriever/synthesizer)
│   ├── models/             # Pydantic schemas (structured outputs)
│   └── utils/              # Helpers (e.g. Opik)
├── pyproject.toml
├── .env.sample
├── Dockerfile
└── README.md

---

Setup

Prerequisites: Python 3.13, uv.

Clone the repository

git clone https://github.com/YOUR_USERNAME/knowledge-assistant-mcp.git
cd knowledge-assistant-mcp

Install dependencies with uv

uv sync

This creates a virtual environment (Python 3.13) and installs dependencies from pyproject.toml.

Configure environment variables

cp .env.sample .env

Edit .env and set at least:

  • GOOGLE_API_KEY (required): Used for Gemini (LLM and embeddings). Get it from Google AI Studio.

Optional:

  • OPIK_API_KEY: For observability (tracing). Get it from Opik.
  • OPIK_PROJECT_NAME: Opik project name (default: knowledge-assistant).
  • MODEL_NAME: Gemini model (default: gemini-2.0-flash).
  • CHROMA_PERSIST_DIR: Directory for ChromaDB (default: ./chroma_data).
  • CHROMA_COLLECTION: Collection name (default: knowledge_base).
  • RAG_TOP_K: Number of chunks to retrieve (default: 5).
  • EMBEDDING_MODEL: Google embedding model for RAG (default: models/gemini-embedding-001). Override if your API uses a different model.
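
To make the defaults listed above concrete, here is a minimal sketch of how they could be resolved from the environment. The project itself uses pydantic-settings in src/config/settings.py; this version uses only the standard library, and the Settings class, its field names, and load_settings are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field names mirror the variables above."""
    google_api_key: str
    opik_api_key: Optional[str]
    opik_project_name: str
    model_name: str
    chroma_persist_dir: str
    chroma_collection: str
    rag_top_k: int
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # GOOGLE_API_KEY is the only variable the README marks as required.
        raise RuntimeError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        opik_api_key=env.get("OPIK_API_KEY"),
        opik_project_name=env.get("OPIK_PROJECT_NAME", "knowledge-assistant"),
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chroma_collection=env.get("CHROMA_COLLECTION", "knowledge_base"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        embedding_model=env.get("EMBEDDING_MODEL", "models/gemini-embedding-001"),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check without touching your real environment.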

Run the server

Stdio (for Cursor / Claude Desktop):

uv run python -m src.server --transport stdio

HTTP:

uv run python -m src.server --transport http --port 8000

Or use the entry point:

uv run knowledge-assistant-mcp --transport stdio

You should see the FastMCP banner and the process waiting for connections; stop with Ctrl+C.
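
For readers curious how the --transport and --port flags in the commands above might be handled, the following is a rough standard-library sketch. It is not the code in src/server.py; the build_parser name and the default values are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser accepting the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="knowledge-assistant-mcp")
    parser.add_argument(
        "--transport",
        choices=["stdio", "http"],
        default="stdio",  # assumed default; the README always passes it explicitly
        help="How MCP clients connect to the server.",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=8000,  # assumed default, matching the HTTP example
        help="Port to listen on when --transport is http.",
    )
    return parser


def parse_cli(argv):
    """Parse a list of arguments and return (transport, port)."""
    args = build_parser().parse_args(argv)
    return args.transport, args.port
```

For example, parse_cli(["--transport", "http", "--port", "8000"]) yields the transport name and an integer port, which a real entry point would then hand to the FastMCP app.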

---

Environment variables

Variables you can set in .env, and where to get API keys:

Environment variables summary

| Variable | Required | Description |
|----------|----------|-------------|
| GOOGLE_API_KEY | Yes | Google AI (Gemini) API key – Google AI Studio |
| OPIK_API_KEY | No | Opik API key for observability – Opik |
| OPIK_PROJECT_NAME | No | Opik project name (default: knowledge-assistant) |
| MODEL_NAME | No | Gemini model (default: gemini-2.0-flash) |
| CHROMA_PERSIST_DIR | No | ChromaDB persistence directory (default: ./chroma_data) |
| CHROMA_COLLECTION | No | ChromaDB collection name (default: knowledge_base) |
| RAG_TOP_K | No | Number of chunks to retrieve (default: 5) |
| EMBEDDING_MODEL | No | Google embedding model for RAG (default: models/gemini-embedding-001) |

---

Connecting from Cursor (or another MCP client)

Add this to your Cursor MCP settings (e.g. .cursor/mcp.json), replacing the path and API key as needed:

{
  "mcpServers": {
    "knowledge-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/knowledge-assistant-mcp",
        "run",
        "python",
        "-m",
        "src.server",
        "--transport",
        "stdio"
      ],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key-here"
      }
    }
  }
}

Alternatively, rely on a .env file in the project directory and either omit the env block or set only ENV_FILE_PATH in it, if your client supports that.
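
If you would rather generate the client entry than edit it by hand, a small helper along these lines could produce the same JSON shape. The build_mcp_config function is hypothetical and only mirrors the structure shown above; check your client's documentation for where the file belongs.

```python
import json
import os
from typing import Optional


def build_mcp_config(project_dir: str, api_key: Optional[str] = None) -> dict:
    """Build the mcpServers entry shown above for a given checkout directory."""
    entry = {
        "command": "uv",
        "args": [
            "--directory",
            os.path.abspath(project_dir),  # the client needs an absolute path
            "run",
            "python",
            "-m",
            "src.server",
            "--transport",
            "stdio",
        ],
    }
    if api_key is not None:
        # Omit "env" entirely when the key lives in the project's .env file.
        entry["env"] = {"GOOGLE_API_KEY": api_key}
    return {"mcpServers": {"knowledge-assistant": entry}}


def render_mcp_config(project_dir: str, api_key: Optional[str] = None) -> str:
    """Return the config as pretty-printed JSON text."""
    return json.dumps(build_mcp_config(project_dir, api_key), indent=2)
```

The output of render_mcp_config(".") can be pasted into the client's MCP settings file.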

---

How to use

Once the server is running and connected (e.g. in Cursor):

  1. Add documents (optional but needed for RAG answers)

Use the add_documents tool: pass text (the content to ingest) and optionally source (e.g. "Context Engineering Book"). The server chunks and embeds the text into ChromaDB. You can add more documents anytime.

  2. Ask a question

Use the query_knowledge_base tool with your question. The server runs the multi-agent pipeline (coordinator → retriever → synthesizer) and returns a proposed answer with citations.

  3. Human-in-the-loop

Review the proposal, then call approve_or_edit_answer:

  • To accept: set approved=True and pass the same proposal_answer that was returned.
  • To request changes: set approved=False, pass the same proposal_answer, and put your requested edits in user_feedback. The server can then produce a revised answer.
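
The approve-or-edit decision above can be sketched as a small function. The parameter names (approved, proposal_answer, user_feedback) come from this README; the review_proposal name, the Decision class, and the status strings are assumptions, and the real tool may return a different shape or call the LLM to revise the answer directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """Hypothetical result of the human review step."""
    status: str                     # "final" or "needs_revision"
    answer: str
    feedback: Optional[str] = None


def review_proposal(
    approved: bool,
    proposal_answer: str,
    user_feedback: Optional[str] = None,
) -> Decision:
    """Finalize an approved proposal, or flag it for revision with the user's feedback."""
    if approved:
        return Decision(status="final", answer=proposal_answer)
    if not user_feedback:
        # Without feedback there is nothing to revise against.
        raise ValueError("user_feedback is required when approved is False")
    return Decision(status="needs_revision", answer=proposal_answer, feedback=user_feedback)
```

In this sketch a "needs_revision" decision would be passed back to the synthesizer together with the feedback, and the revised proposal would go through the same review again.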

You can also use search_knowledge_base to only search the vector store (no generated answer), and the knowledge_assistant_workflow prompt as a step-by-step guide. The resource knowledge-assistant://server_info exposes server metadata and RAG settings.
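
To illustrate what ingesting and searching involve, here is a toy in-memory version of chunking plus top-k search. It is a sketch under heavy assumptions: the real server uses Google embeddings and ChromaDB, whereas this uses word counts and cosine similarity, and ToyStore, chunk_text, and the chunk sizes are all made up for the example.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyStore:
    """In-memory stand-in for the vector store."""

    def __init__(self):
        self._items = []  # list of (vector, source, chunk)

    def add_documents(self, text, source="unknown", chunk_size=200, overlap=50):
        """Chunk and index the text; return the number of chunks added."""
        chunks = chunk_text(text, chunk_size, overlap)
        self._items.extend((embed(c), source, c) for c in chunks)
        return len(chunks)

    def search(self, query, top_k=5):
        """Return up to top_k (score, source, chunk) tuples, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), src, chunk) for vec, src, chunk in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

The top_k argument plays the role that RAG_TOP_K plays in the server's configuration: it caps how many chunks are returned for a query.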

---

Features

Core:

A FastMCP server (src/server.py) with:

  • Tools: query_knowledge_base, approve_or_edit_answer, add_documents, search_knowledge_base.
  • One workflow prompt (knowledge_assistant_workflow) with a human-in-the-loop step: review the proposal, then approve or edit via approve_or_edit_answer.
  • uv-based setup and the project structure shown above.
  • No API keys in the repo; .env.sample and .gitignore are included.

Additional:

  • Multi-agent orchestration – Coordinator, retriever (RAG), and synthesizer agents in src/app/orchestrator.py.
  • RAG with vector database – ChromaDB + LangChain + Google embeddings; search_knowledge_base and add_documents; persistence via CHROMA_PERSIST_DIR.
  • MCP resource – knowledge-assistant://server_info exposes server name, version, collection, and RAG settings.
  • Human-in-the-loop validation – Workflow returns a proposal; the user approves or requests edits with approve_or_edit_answer before finalizing.
  • Structured outputs – Pydantic models (AnswerProposal, SearchResult, RetrievedChunk, SynthesisResult) for synthesizer and API responses.
  • Observability (Opik) – Optional tracing when OPIK_API_KEY is set.
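
As an illustration of the structured-outputs idea, the sketch below models a proposal with nested retrieved chunks and serializes it to JSON. The class names AnswerProposal and RetrievedChunk appear in this README, but every field here is a guess, and the project uses Pydantic models rather than the standard-library dataclasses shown.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RetrievedChunk:
    """One retrieved passage; the field names are assumptions."""
    text: str
    source: str
    score: float


@dataclass
class AnswerProposal:
    """A proposed answer awaiting human review; the field names are assumptions."""
    answer: str
    citations: list = field(default_factory=list)
    needs_review: bool = True

    def __post_init__(self):
        # Minimal validation, loosely analogous to what a Pydantic model would enforce.
        if not self.answer.strip():
            raise ValueError("answer must not be empty")

    def to_json(self) -> str:
        """Serialize the proposal, including nested chunks, to a JSON string."""
        return json.dumps(asdict(self))
```

A fixed schema like this lets the client show the answer and its citations separately instead of parsing free text.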

---

Docker

Build and run with Docker:

docker build -t knowledge-assistant-mcp .
docker run -i --rm -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport stdio

For HTTP on port 8000:

docker run --rm -p 8000:8000 -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport http --port 8000

---

License

MIT (or your chosen license).
