GraphRAG MCP
Entity-centric Retrieval-Augmented Generation for Crypto Whitepapers
_Local-first • Private • FastMCP-ready_
---
1. Overview
GraphRAG MCP is a modular, local-first system that turns crypto whitepapers into an entity-centric Knowledge Graph and a vector-searchable corpus, then answers questions with RAG + optional KG enrichment + LLM synthesis, all via standardized FastMCP tools.
Why this project?
- Privacy by default: runs entirely on your machine (Ollama, Chroma, GraphDB).
- Fast and focused: entity-filtered retrieval narrows context to the relevant tokens and protocols.
- Composable: exposes `rag.*` and `kg.*` tools so an MCP Coordinator or Streamlit app can orchestrate multi-tool workflows.
- Explainable answers: returns citations with doc/chunk/entity IDs for every response.
---
Typical usage
- Ingest and label whitepapers, then build embeddings and insert entities.
- Ask questions via `rag.qa` (semantic + entity-filtered retrieval), optionally enriching answers with KG labels and aliases.
- Get concise LLM answers with inline citations to source chunks.
---
2. Features
Knowledge Graph (KG)
- Entity-only architecture using RDF/OWL ontologies (`mcp-core.ttl`, `mcp-crypto.ttl`).
- Built on Ontotext GraphDB 11+ with SHACL validation and SPARQL/GraphQL endpoints.
- Stores canonical entities such as tokens, protocols, components, and organizations.
- Enables KG enrichment for RAG answers via aliases, labels, and relationships.
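
To make the enrichment step concrete, here is a minimal sketch of a SPARQL query builder for labels and aliases. It assumes entities carry `rdfs:label` and `skos:altLabel`; the predicates actually defined in `mcp-core.ttl` and `mcp-crypto.ttl` may differ, and `build_alias_query` is a hypothetical helper, not part of the project.

```python
import re

# Assumption: labels live on rdfs:label and aliases on skos:altLabel.
# The project's own ontologies may use different predicates.
ALIAS_QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label ?alias WHERE {{
  OPTIONAL {{ <{iri}> rdfs:label ?label . }}
  OPTIONAL {{ <{iri}> skos:altLabel ?alias . }}
}}
"""

# Characters that must not appear inside a SPARQL IRI reference.
_UNSAFE_IRI_CHARS = re.compile(r'[<>"{}|\\^`\s]')


def build_alias_query(entity_iri: str) -> str:
    """Return a SELECT query for one entity's labels and aliases."""
    if _UNSAFE_IRI_CHARS.search(entity_iri):
        raise ValueError(f"not a safe IRI: {entity_iri!r}")
    return ALIAS_QUERY_TEMPLATE.format(iri=entity_iri)
```

The resulting string could be passed to the `sparql_query` tool; rejecting unsafe characters up front keeps a malformed IRI from altering the query.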
Vector Retrieval (RAG)
- ChromaDB acts as the persistent vector store for chunk embeddings.
- Embeddings are generated with Ollama's `nomic-embed-text` model.
- Supports semantic and entity-filtered retrieval modes for accurate context fetching.
- Each chunk carries structured metadata: `doc_id`, `chunk_id`, `entity_ids`, `section_type`, and `page`.
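
The metadata fields above suggest a simple record shape. The following is a minimal sketch using plain Python objects rather than the project's actual Chroma schema; `Chunk` and `filter_by_entities` are hypothetical names, and the `text` field is included only for convenience.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One indexed chunk; field names mirror the metadata keys listed above."""
    doc_id: str
    chunk_id: str
    text: str
    entity_ids: list[str] = field(default_factory=list)
    section_type: str = ""
    page: int = 0


def filter_by_entities(chunks, wanted):
    """Keep chunks tagged with at least one of the wanted entity IRIs."""
    wanted_set = set(wanted)
    return [c for c in chunks if wanted_set.intersection(c.entity_ids)]
```

Entity-filtered retrieval then amounts to narrowing the candidate set this way before (or while) ranking by embedding similarity.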
Local LLM Inference
- Uses Ollama for fully local inference; no external API keys are required.
- Compatible with models like `llama3.1:latest`, `qwen2.5:14b-instruct`, or `mistral`.
- Performs labeling, summarization, and final QA synthesis.
- Includes a deterministic mock mode for offline testing and CI.
FastMCP Servers
- Two modular servers expose tools via FastMCP 2.x:
  - `rag`: `rag.search`, `rag.embed_and_index`, `rag.reindex`, `rag.delete`, `rag.health`, `rag.qa`
  - `kg`: `sparql_query`, `sparql_update`, `push_labels`, `validate_labels`, `list_documents`, `kg.health`
- Both run locally via stdio and are MCP-Coordinator compatible.
Privacy & Portability
- Fully offline operation once models and dependencies are installed; suitable for air-gapped or research environments.
- Reproducible local stack (GraphDB + Chroma + Ollama + FastMCP).
- Works seamlessly on Windows 11, macOS, or Linux.
Integration Ready
- Plug-and-play with MCP Coordinators or Streamlit apps for end-user Q&A.
- Can interoperate with other MCPs such as:
- Brave API MCP (web search)
- MongoDB MCP (strategy data)
- Telegram MCP (messaging)
- Gmail MCP (email retrieval)
- Returns clean JSON outputs for easy chaining into agentic workflows.
---
3. Architecture
The GraphRAG MCP architecture combines Knowledge Graph reasoning, vector-based retrieval, and local LLM synthesis, all under the MCP interoperability standard. It's designed for _clarity_, _privacy_, and _modular scalability_.
---
High-Level Overview

| Layer | Technology | Purpose | Example Components |
|:------|:------------|:---------|:--------------------|
| Ingestion Layer | Python + LangChain | Reads PDFs, splits into semantic chunks, labels with LLMs | `pdf_reader.py`, `semantic_splitter.py`, `llm_chunk_tagger.py` |
| Knowledge Graph Layer (KG) | GraphDB (Ontotext) + RDFLib | Stores canonical entities (tokens, protocols, organizations) | `graphdb_sink.py`, `namespaces.py`, SHACL shapes |
| Vector Retrieval Layer (RAG) | ChromaDB + Ollama embeddings | Stores text chunks + metadata + embeddings for semantic retrieval | `chroma_store.py`, `.chroma/` |
| MCP Layer | FastMCP 2.x | Exposes standardized MCP tools (`rag.*`, `kg.*`) | `rag_server.py`, `kg_server.py` |
| LLM Synthesis Layer | Ollama LLMs (llama3.1, qwen2.5) | Answers questions with retrieved context + KG enrichment | `rag.qa`, `llm_chunk_tagger` |
| User Interface Layer | MCP Coordinator / Streamlit | Connects multiple MCPs for conversational Q&A | Coordinator UI or custom Streamlit dashboard |
---
Data Flow Diagram

    +----------------------------------------------+
    |                 Whitepapers                  |
    |    (PDFs, research papers, documentation)    |
    +----------------------+-----------------------+
                           |
                           v
    +----------------------------------------------+
    |             Ingestion & Labeling             |
    |      pdf_reader -> semantic_splitter ->      |
    |       llm_chunk_tagger -> postprocess        |
    +----------------------+-----------------------+
                           |
               +-----------+------------+
               |                        |
               v                        v
    +---------------------+  +---------------------+
    | GraphDB KG          |  | Chroma RAG          |
    | Entities & IRIs     |  | Chunks + Embeddings |
    +----------+----------+  +----------+----------+
               |                        |
               v                        v
    +---------------------+  +---------------------+
    | kg_server           |  | rag_server          |
    | (FastMCP)           |  | (FastMCP)           |
    +----------+----------+  +----------+----------+
               |                        |
               +-----------+------------+
                           v
    +----------------------------------------------+
    |         MCP Coordinator / Streamlit          |
    |          User-facing Q&A Interface           |
    +----------------------------------------------+

---
How It Works (Step-by-Step)

| Step | Description | Input | Output |
|:----:|:-------------|:------|:--------|
| 1 | PDF Parsing | Whitepaper PDF | Raw text pages |
| 2 | Semantic Splitting | Raw text | Meaningful chunks (by section/topic) |
| 3 | LLM Labeling | Chunk text | Entities, relations, and section labels |
| 4 | Postprocessing | Labeled chunks | Cleaned JSONL with canonical entity IRIs |
| 5 | Indexing | JSONL labels | Chroma embeddings + KG triples |
| 6 | Retrieval (`rag.search`) | Query text / entities | Relevant chunks |
| 7 | Enrichment (optional) | Retrieved entities | KG aliases, definitions |
| 8 | Answer Synthesis (`rag.qa`) | Question + context | Concise answer with citations |
---
Data Modalities

| Data Type | Storage | Example |
|:-----------|:--------|:--------|
| Entity | GraphDB | `<https://kg.mcp.ai/id/token/bitcoin>` → `rdf:type crypto:Token` |
| Chunk | Chroma | "Bitcoin is a peer-to-peer electronic cash system..." |
| Embedding | Chroma / Ollama | 768-dim `nomic-embed-text` vector |
| Provenance | Metadata | `doc_id`, `chunk_id`, `page`, `entity_ids[]` |
| Answer | MCP JSON | `{ "answer": "...", "citations": [...] }` |
---
Core MCP Tools

| Server | Tool | Description |
|:--------|:------|:-------------|
| RAG | `rag.search` | Semantic search over chunks |
| | `rag.embed_and_index` | Add new labeled chunks to index |
| | `rag.reindex` | Rebuild from outputs directory |
| | `rag.delete` | Delete by IDs or filters |
| | `rag.qa` | Question answering with LLM synthesis |
| | `rag.health` | Diagnostics and store info |
| KG | `sparql_query` / `sparql_update` | Execute SPARQL against GraphDB |
| | `push_labels` / `validate_labels` | Add or validate KG entries |
| | `list_documents`, `get_chunk` | Retrieve document metadata |
| | `kg.health` | Check GraphDB repository status |
---
4. Installation & Setup
Set up your local GraphRAG MCP environment in just a few steps! This stack runs fully offline and integrates seamlessly with Ollama, GraphDB, and Chroma.
---
Prerequisites

| Requirement | Description | Example |
|:-------------|:-------------|:----------|
| Python | Version 3.11+ recommended | `python --version` → Python 3.11.8 |
| Ollama | Local LLM runtime (for inference + embeddings) | `ollama pull llama3.1:latest` |
| GraphDB Desktop 11+ | Local Knowledge Graph database | runs at http://localhost:7200 |
| ChromaDB | Vector store for embeddings | auto-initialized under `.chroma/` |
| FastMCP | Python framework for Model Context Protocol (MCP) servers (2.x) | installed via pip |
---
Folder Layout (simplified)

| Folder | Purpose | Example Contents |
|:--------|:----------|:----------------|
| `src/` | Core codebase | `pipeline.py`, `mcp/`, `kg/`, `rag/` |
| `outputs/run_simple/` | Generated outputs | labeled chunks, reports, embeddings |
| `.chroma/` | Chroma persistent vector store | `chroma.sqlite3`, `index/` |
| `.env` | Environment configuration | Ollama, GraphDB, Chroma settings |
| `tests/` | Offline unit tests | `test_rag_qa.py`, `test_kg_server.py` |
---
Step-by-Step Setup
1. Clone & Create Virtual Environment
git clone https://github.com/Swissbit92/GraphDB_Desktop.git
cd GraphDB_Desktop
python -m venv .venv
2. Activate Environment

| OS | Command |
|:---|:---------|
| Windows (PowerShell) | `.venv\Scripts\activate` |
| Linux / macOS | `source .venv/bin/activate` |

3. Install Dependencies
pip install -r requirements.txt
4. Verify Installation
python -m src.mcp.rag_server --list-tools
python -m src.mcp.kg_server --list-tools
You should see tools like rag.qa, rag.search, and kg.health.
---
Optional: Preload Ollama Models

| Model | Purpose | Pull Command |
|:-------|:----------|:--------------|
| `llama3.1:latest` | Default reasoning + summarization model | `ollama pull llama3.1:latest` |
| `nomic-embed-text` | Embedding model for RAG vectorization | `ollama pull nomic-embed-text` |
| `qwen2.5:14b-instruct` | Larger model for complex QA tasks | `ollama pull qwen2.5:14b-instruct` |
---
Quick Sanity Check
Run a quick health diagnostic to ensure everything is configured correctly:
pytest -q
python -m src.mcp.rag_server --run-tool rag.health
python -m src.mcp.kg_server --run-tool kg.health
If both return OK, you're ready to run the pipeline and start querying your Knowledge Graph + RAG system.
---
5. How to Use & Test
Ingest Whitepapers & Build the Index
# Place your PDFs under .\whitepapers\ then run:
python -m src.pipeline --input ".\whitepapers\*.pdf"
Outputs:
- Labeled JSONL: `outputs\run_simple\labels\`
- Chroma index: `.chroma\`
- (If enabled) Entities pushed to GraphDB repository `mcp_kg`
---
Start the MCP Servers (RAG + KG)
# Terminal A
python -m src.mcp.rag_server
# Terminal B
python -m src.mcp.kg_server
Tip: In another PowerShell window, confirm the tools are available:
python -m src.mcp.rag_server --list-tools
python -m src.mcp.kg_server --list-tools
---
Quick Retrieval Check (RAG)
# Example: semantic search for "peer-to-peer electronic cash"
python -m src.mcp.rag_server --run-tool rag.search --input '{ "text": "peer-to-peer electronic cash", "k": 3 }'
You should see matching chunks with doc_id, chunk_id, and distances.
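
For intuition about the `distances` field, here is a toy ranking sketch that uses cosine distance. This is an assumption: the distance metric depends on how the Chroma collection is configured, which the README does not state. `cosine_distance` and `top_k` are hypothetical helpers, and the vectors are tiny stand-ins for real 768-dim embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vectors, k=3):
    """Return the k (distance, chunk_id) pairs closest to the query."""
    scored = [
        (cosine_distance(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vectors.items()
    ]
    return sorted(scored)[:k]
```

Whatever metric the store actually uses, lower distances indicate closer matches, so results are read from the smallest distance upward.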
---
Ask Questions with Citations (rag.qa)
# Fully offline (deterministic mock answer)
python -m src.mcp.rag_server --run-tool rag.qa --input '{ "question": "What problem does Bitcoin aim to solve?", "k": 5, "kg_enrich": true, "use_mock_llm": true }'
Returns:
- `answer`: concise response (mock or LLM)
- `citations`: `[ {doc_id, chunk_id, entity_ids, text} ]`
- `took_ms`, `model_used`
Switch to real LLM synthesis by omitting use_mock_llm (requires Ollama running).
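
A client might turn that response into readable text as sketched below. This assumes the tool prints a single JSON object with the fields listed above; the exact output framing is not documented here, and `format_answer` is a hypothetical helper.

```python
import json


def format_answer(response_json: str) -> str:
    """Render a rag.qa-style JSON response as an answer plus numbered sources."""
    response = json.loads(response_json)
    lines = [response["answer"]]
    for number, cite in enumerate(response.get("citations", []), start=1):
        # doc_id and chunk_id together identify the cited source chunk.
        lines.append(f"[{number}] {cite['doc_id']}#{cite['chunk_id']}")
    return "\n".join(lines)
```

Keeping `doc_id` and `chunk_id` in the rendered output preserves the traceability back to source chunks that the citations are meant to provide.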
---
Optional: Entity-Filtered QA
python -m src.mcp.rag_server --run-tool rag.qa --input '{ "question": "How does proof-of-work secure the network?", "entity_ids": ["https://kg.mcp.ai/id/token/bitcoin"], "k": 5, "kg_enrich": true, "use_mock_llm": true }'
This restricts retrieval to chunks tagged with the specified KG entity(ies).
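
Quoting JSON on the command line differs between shells, so it can be easier to build the payload in Python and pass it as a single argument. This sketch reuses the `--run-tool` and `--input` flags shown above; `qa_payload`, `qa_command`, and `run_qa` are hypothetical helper names.

```python
import json
import subprocess
import sys


def qa_payload(question, entity_ids=None, k=5, kg_enrich=True, use_mock_llm=True):
    """Build the rag.qa input; entity_ids is only included when provided."""
    payload = {
        "question": question,
        "k": k,
        "kg_enrich": kg_enrich,
        "use_mock_llm": use_mock_llm,
    }
    if entity_ids:
        payload["entity_ids"] = list(entity_ids)
    return payload


def qa_command(payload):
    """Return the argv list; the JSON travels as one argument, so no shell quoting."""
    return [
        sys.executable, "-m", "src.mcp.rag_server",
        "--run-tool", "rag.qa",
        "--input", json.dumps(payload),
    ]


def run_qa(payload):
    """Run the tool from the repository root and return the completed process."""
    return subprocess.run(qa_command(payload), capture_output=True, text=True, check=False)
```

For example, `run_qa(qa_payload("How does proof-of-work secure the network?", ["https://kg.mcp.ai/id/token/bitcoin"]))` mirrors the command above.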
---
Run the Test Suite
pytest -q
Key tests (all offline):
- `tests\test_rag_qa.py`: verifies retrieval normalization and mock LLM mode
- `tests\test_kg_server.py`: checks KG connectivity (skips if GraphDB is not running)
---
Health Checks
python -m src.mcp.rag_server --run-tool rag.health
python -m src.mcp.kg_server --run-tool kg.health
Expect collection info, document counts, and OK status.
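
Both checks can be scripted together, for example in a pre-flight step. This sketch assumes a zero exit code signals a healthy server, which the README does not confirm; `health_command` and `run_health_checks` are hypothetical helpers.

```python
import subprocess
import sys

# (module, tool) pairs taken from the commands above.
HEALTH_CHECKS = [
    ("src.mcp.rag_server", "rag.health"),
    ("src.mcp.kg_server", "kg.health"),
]


def health_command(module, tool):
    """Return the argv list for one health check."""
    return [sys.executable, "-m", module, "--run-tool", tool]


def run_health_checks(runner=subprocess.run):
    """Run every check; map tool name to True when the exit code is 0 (assumed healthy)."""
    results = {}
    for module, tool in HEALTH_CHECKS:
        completed = runner(
            health_command(module, tool),
            capture_output=True, text=True, check=False,
        )
        results[tool] = completed.returncode == 0
    return results
```

The `runner` parameter exists so the function can be exercised with a stub in tests, without GraphDB or Chroma running.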
---
MCP Coordinator / UI Hookup (Optional)
Ensure your mcp.json references the running servers:
{
"mcpServers": {
"rag": { "command": "python", "args": ["-m", "src.mcp.rag_server"] },
"kg": { "command": "python", "args": ["-m", "src.mcp.kg_server"] }
}
}
Then connect via your MCP Coordinator or Streamlit app to interactively call rag.qa and kg.* tools.
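
If you prefer to generate or check `mcp.json` programmatically, a sketch along these lines may help. It reproduces the structure shown above; `build_mcp_config`, `validate_mcp_config`, and `write_mcp_config` are hypothetical helpers, and the validation covers only the fields used in this README.

```python
import json
from pathlib import Path

SERVER_MODULES = {"rag": "src.mcp.rag_server", "kg": "src.mcp.kg_server"}


def build_mcp_config():
    """Return the same structure as the mcp.json example above."""
    return {
        "mcpServers": {
            name: {"command": "python", "args": ["-m", module]}
            for name, module in SERVER_MODULES.items()
        }
    }


def validate_mcp_config(config):
    """Return a list of problems; an empty list means the basic shape is fine."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["missing or empty 'mcpServers' object"]
    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(entry.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        if not isinstance(entry.get("args", []), list):
            problems.append(f"{name}: 'args' must be a list")
    return problems


def write_mcp_config(path="mcp.json"):
    """Write the generated config as pretty-printed JSON."""
    Path(path).write_text(json.dumps(build_mcp_config(), indent=2) + "\n", encoding="utf-8")
```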
---
Closing Words
GraphRAG MCP is part of the broader Eeva AI ecosystem, an open, modular framework for intelligent crypto research and strategy generation. This project wouldn't exist without the incredible open-source community that continues to push the boundaries of local AI and knowledge engineering.
If you find this useful:
- Star the repository to support ongoing development
- Contribute improvements or new MCP modules
- Explore integrations with other MCPs (Brave API, MongoDB, Telegram, etc.)
- Share feedback; every suggestion helps make the system smarter, faster, and more reliable
---
_"Knowledge is only powerful when it's connected."_ - __Eeva AI Research__
Thank you for being part of the open-source journey.
---






