GraphRAG MCP

Swissbit92/GraphRAG_MCP_Crypto

Summary

A local-first MCP server that transforms crypto whitepapers into a knowledge graph and vector corpus, enabling entity-filtered RAG question answering with optional knowledge graph enrichment.

🧠 GraphRAG MCP

Entity-centric Retrieval-Augmented Generation for Crypto Whitepapers _Local-first • Private • FastMCP-ready_

---

1️⃣ ✨ Overview

GraphRAG MCP is a modular, local-first system that turns crypto whitepapers into an entity-centric Knowledge Graph and a vector-searchable corpus, then answers questions with RAG, optional KG enrichment, and LLM synthesis, all exposed as standardized FastMCP tools.

Why this project?

  • 🛡️ Privacy by default: runs entirely on your machine (Ollama, Chroma, GraphDB).
  • ⚡ Fast & focused: entity-filtered retrieval narrows the context to chunks tagged with the relevant tokens and protocols.
  • 🧩 Composable: exposes rag.* and kg.* tools so an MCP Coordinator or Streamlit app can orchestrate multi-tool workflows.
  • 🧠 Explainable answers: every response includes citations with doc, chunk, and entity IDs.

---

🔁 Typical usage

  1. Ingest and label whitepapers → build embeddings and insert entities.
  2. Ask questions via rag.qa (semantic + entity-filtered retrieval), optionally enrich with KG labels/aliases.
  3. Get concise LLM answers with inline citations to source chunks.

---

2️⃣ Features

🧩 Knowledge Graph (KG)

  • Entity-only architecture using RDF/OWL ontologies (mcp-core.ttl, mcp-crypto.ttl).
  • Built on Ontotext GraphDB 11+ with SHACL validation and SPARQL/GraphQL endpoints.
  • Stores canonical entities such as tokens, protocols, components, and organizations.
  • Enables KG enrichment for RAG answers via aliases, labels, and relationships.
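KG enrichment usually means looking up labels and aliases for the entity IRIs attached to retrieved chunks. The sketch below builds such a SPARQL query for GraphDB. It assumes the ontology uses rdfs:label and skos:altLabel, which this README does not confirm, and build_enrichment_query is a hypothetical helper, not part of kg_server.

```python
def build_enrichment_query(entity_iris: list[str]) -> str:
    """Return a SPARQL SELECT for label/alias pairs of the given entity IRIs.

    rdfs:label and skos:altLabel are assumed predicates; swap in whatever
    mcp-core.ttl / mcp-crypto.ttl actually define.
    """
    if not entity_iris:
        raise ValueError("at least one entity IRI is required")
    # Reject characters that are illegal inside a SPARQL IRIREF, so a
    # malformed ID cannot break out of the <...> brackets.
    for iri in entity_iris:
        if any(ch in iri for ch in '<>"{}| \\^`'):
            raise ValueError(f"unsafe IRI: {iri!r}")
    # VALUES binds ?entity to every IRI, so one query covers all entities.
    values = " ".join(f"<{iri}>" for iri in entity_iris)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?entity ?label ?alias WHERE {{
  VALUES ?entity {{ {values} }}
  OPTIONAL {{ ?entity rdfs:label ?label }}
  OPTIONAL {{ ?entity skos:altLabel ?alias }}
}}"""


if __name__ == "__main__":
    print(build_enrichment_query(["https://kg.mcp.ai/id/token/bitcoin"]))
```

Using VALUES keeps enrichment to a single round trip per question, however many entities the retrieved chunks mention; the resulting string could be passed to the sparql_query tool.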

🔍 Vector Retrieval (RAG)

  • ChromaDB acts as the persistent vector store for chunk embeddings.
  • Embeddings are generated with Ollama’s nomic-embed-text model.
  • Supports semantic and entity-filtered retrieval modes for accurate context fetching.
  • Each chunk contains structured metadata: doc_id, chunk_id, entity_ids, section_type, and page.
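A minimal sketch of that per-chunk metadata as a Python dataclass. It assumes the vector store accepts only scalar metadata values, so entity_ids is flattened into one string; the class name, the '|' separator, and the chunk ID format in the test are illustrative assumptions, not necessarily what chroma_store.py does.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChunkMetadata:
    """Provenance fields the README lists for every indexed chunk."""
    doc_id: str
    chunk_id: str
    section_type: str
    page: int
    entity_ids: list[str] = field(default_factory=list)

    def to_store_metadata(self) -> dict[str, str | int]:
        """Flatten to scalar values for a store that rejects list metadata.

        '|' is a safe separator here because it is not allowed inside IRIs.
        """
        return {
            "doc_id": self.doc_id,
            "chunk_id": self.chunk_id,
            "section_type": self.section_type,
            "page": self.page,
            "entity_ids": "|".join(self.entity_ids),
        }

    @classmethod
    def from_store_metadata(cls, meta: dict) -> ChunkMetadata:
        """Inverse of to_store_metadata (empty string -> empty list)."""
        raw = meta.get("entity_ids", "")
        return cls(
            doc_id=meta["doc_id"],
            chunk_id=meta["chunk_id"],
            section_type=meta["section_type"],
            page=int(meta["page"]),
            entity_ids=[e for e in raw.split("|") if e],
        )
```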

🧠 Local LLM Inference

  • Uses Ollama for fully local inference; no external API keys are required.
  • Compatible with models like llama3.1:latest, qwen2.5:14b-instruct, or mistral.
  • Performs labeling, summarization, and final QA synthesis.
  • Includes deterministic mock mode for offline testing and CI.
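One way to make a mock mode deterministic is to derive the answer purely from its inputs, with no sampling at all. This is a hypothetical stand-in (mock_answer is not the project's function): it echoes the top retrieved chunk and tags it with a short hash of the question and context so CI logs show which inputs produced it.

```python
import hashlib


def mock_answer(question: str, contexts: list[str], max_chars: int = 200) -> str:
    """Deterministic stand-in for an LLM call: same inputs, same output."""
    if not contexts:
        return "No relevant context found."
    # Use the first retrieved chunk as the "answer", with whitespace collapsed.
    snippet = " ".join(contexts[0].split())[:max_chars]
    # Short digest of question + contexts; NUL separators avoid ambiguous joins.
    payload = question + "\x00" + "\x00".join(contexts)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]
    return f"[mock:{digest}] {snippet}"
```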

⚙️ FastMCP Servers

  • Two modular servers expose tools via FastMCP 2.x:
    • rag → rag.search, rag.embed_and_index, rag.reindex, rag.delete, rag.health, rag.qa
    • kg → sparql_query, sparql_update, push_labels, validate_labels, list_documents, kg.health
  • Both run locally over stdio and are compatible with MCP Coordinators.

🔒 Privacy & Portability

  • 100% offline operation, suitable for air-gapped or research environments.
  • Reproducible local stack (GraphDB + Chroma + Ollama + FastMCP).
  • Runs on Windows 11, macOS, and Linux.

🚀 Integration Ready

  • Plug-and-play with MCP Coordinators or Streamlit apps for end-user Q&A.
  • Can interoperate with other MCP servers, such as:
    • Brave API MCP (web search)
    • MongoDB MCP (strategy data)
    • Telegram MCP (messaging)
    • Gmail MCP (email retrieval)
  • Returns clean JSON outputs for easy chaining into agentic workflows.

---

3️⃣ 🏗️ Architecture

The GraphRAG MCP architecture combines Knowledge Graph reasoning, vector-based retrieval, and local LLM synthesis, all exposed through the Model Context Protocol (MCP) so any compatible client can call them. It’s designed for _clarity_, _privacy_, and _modular scalability_.

---

🧭 High-Level Overview

| Layer | Technology | Purpose | Example Components |
|:------|:-----------|:--------|:-------------------|
| 🗂 Ingestion Layer | Python + LangChain | Reads PDFs, splits them into semantic chunks, labels them with LLMs | pdf_reader.py, semantic_splitter.py, llm_chunk_tagger.py |
| 🧩 Knowledge Graph Layer (KG) | GraphDB (Ontotext) + RDFLib | Stores canonical entities (tokens, protocols, organizations) | graphdb_sink.py, namespaces.py, SHACL shapes |
| 💾 Vector Retrieval Layer (RAG) | ChromaDB + Ollama embeddings | Stores text chunks, metadata, and embeddings for semantic retrieval | chroma_store.py, .chroma/ |
| ⚙️ MCP Layer | FastMCP 2.x | Exposes standardized MCP tools (rag.* and kg.*) | rag_server.py, kg_server.py |
| 🧠 LLM Synthesis Layer | Ollama LLMs (llama3.1, qwen2.5) | Answers questions with retrieved context + KG enrichment | rag.qa, llm_chunk_tagger |
| 💬 User Interface Layer | MCP Coordinator / Streamlit | Connects multiple MCPs for conversational Q&A | Coordinator UI or custom Streamlit dashboard |

---

🔹 Data Flow Diagram

    ┌────────────────────────────────────────┐
    │              Whitepapers               │
    │ (PDFs, research papers, documentation) │
    └───────────────────┬────────────────────┘
                        │
                        ▼
    ┌────────────────────────────────────────┐
    │          Ingestion & Labeling          │
    │  pdf_reader → semantic_splitter →      │
    │  llm_chunk_tagger → postprocess        │
    └───────────────────┬────────────────────┘
                        │
             ┌──────────┴───────────┐
             ▼                      ▼
    ┌─────────────────┐  ┌─────────────────────┐
    │   GraphDB KG    │  │     Chroma RAG      │
    │ Entities & IRIs │  │ Chunks + Embeddings │
    └────────┬────────┘  └──────────┬──────────┘
             ▼                      ▼
    ┌─────────────────┐  ┌─────────────────────┐
    │    kg_server    │  │     rag_server      │
    │    (FastMCP)    │  │      (FastMCP)      │
    └────────┬────────┘  └──────────┬──────────┘
             └──────────┬───────────┘
                        ▼
    ┌────────────────────────────────────────┐
    │      MCP Coordinator / Streamlit       │
    │       User-facing Q&A Interface        │
    └────────────────────────────────────────┘

---

🧠 How It Works (Step-by-Step)

| Step | Description | Input | Output |
|:----:|:------------|:------|:-------|
| 1️⃣ | PDF Parsing | Whitepaper PDF | Raw text pages |
| 2️⃣ | Semantic Splitting | Raw text | Meaningful chunks (by section/topic) |
| 3️⃣ | LLM Labeling | Chunk text | Entities, relations, and section labels |
| 4️⃣ | Postprocessing | Labeled chunks | Cleaned JSONL with canonical entity IRIs |
| 5️⃣ | Indexing | JSONL labels | Chroma embeddings + KG triples |
| 6️⃣ | Retrieval (rag.search) | Query text / entities | Relevant chunks |
| 7️⃣ | Enrichment (optional) | Retrieved entities | KG aliases, definitions |
| 8️⃣ | Answer Synthesis (rag.qa) | Question + context | Concise answer with citations |
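Steps 4 and 5 fan each postprocessed record out to both stores. The sketch below assumes a labeled JSONL schema with doc_id, chunk_id, page, text, and an entities list of {iri, type} objects; that schema, the fan_out name, and the type IRI used in the test are hypothetical. It emits N-Triples lines, a format GraphDB can import.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def fan_out(jsonl_line: str) -> tuple[dict, list[str]]:
    """Split one labeled chunk into a vector-store record and KG triples."""
    rec = json.loads(jsonl_line)
    entities = rec.get("entities", [])
    # Vector side: text to embed plus scalar provenance metadata.
    vector_record = {
        "id": rec["chunk_id"],
        "document": rec["text"],
        "metadata": {
            "doc_id": rec["doc_id"],
            "page": rec["page"],
            "entity_ids": "|".join(e["iri"] for e in entities),
        },
    }
    # KG side: one rdf:type triple per canonical entity, in N-Triples syntax.
    triples = [f"<{e['iri']}> <{RDF_TYPE}> <{e['type']}> ." for e in entities]
    return vector_record, triples
```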

---

🌐 Data Modalities

| Data Type | Storage | Example |
|:----------|:--------|:--------|
| 🧱 Entity | GraphDB | <https://kg.mcp.ai/id/token/bitcoin> → rdf:type crypto:Token |
| 📜 Chunk | Chroma | “Bitcoin is a peer-to-peer electronic cash system…” |
| 🧩 Embedding | Chroma / Ollama | 768-dim nomic-embed-text vector |
| 🧮 Provenance | Metadata | doc_id, chunk_id, page, entity_ids[] |
| 💬 Answer | MCP JSON | { "answer": "...", "citations": [...] } |
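The entity IRI in the table follows a https://kg.mcp.ai/id/<kind>/<slug> pattern. A sketch of minting such IRIs, assuming lowercase ASCII slugs joined with hyphens; the README only shows the Bitcoin example, so the slug rules and the entity_iri helper are assumptions.

```python
import re
import unicodedata

BASE = "https://kg.mcp.ai/id"


def entity_iri(kind: str, name: str) -> str:
    """Mint a canonical IRI such as https://kg.mcp.ai/id/token/bitcoin."""
    # Drop accents, lowercase, and collapse every non-alphanumeric run to '-'.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a slug from {name!r}")
    return f"{BASE}/{kind.lower()}/{slug}"
```

Deterministic slugs mean the same entity labeled in two different whitepapers resolves to one node in the KG.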

---

🧱 Core MCP Tools

| Server | Tool | Description |
|:-------|:-----|:------------|
| 🧩 RAG | rag.search | Semantic search over chunks |
| | rag.embed_and_index | Add new labeled chunks to the index |
| | rag.reindex | Rebuild from outputs directory |
| | rag.delete | Delete by IDs or filters |
| | rag.qa | Question answering with LLM synthesis |
| | rag.health | Diagnostics and store info |
| 🧠 KG | sparql_query / sparql_update | Execute SPARQL against GraphDB |
| | push_labels / validate_labels | Add or validate KG entries |
| | list_documents, get_chunk | Retrieve document metadata |
| | kg.health | Check GraphDB repository status |

---

4️⃣ ⚙️ Installation & Setup

Set up your local GraphRAG MCP environment in a few steps. After the initial downloads (Python packages and Ollama models), the stack runs fully offline with Ollama, GraphDB, and Chroma.

---

🧾 Prerequisites

| Requirement | Description | Example |
|:------------|:------------|:--------|
| 🐍 Python | Version 3.11+ recommended | python --version → Python 3.11.8 |
| 🧠 Ollama | Local LLM runtime (for inference + embeddings) | ollama pull llama3.1:latest |
| 🧩 GraphDB Desktop 11+ | Local Knowledge Graph database | runs at http://localhost:7200 |
| 💾 ChromaDB | Vector store for embeddings | auto-initialized under .chroma/ |
| 🧰 FastMCP | Python framework (2.x) for building Model Context Protocol (MCP) servers | installed via pip |

---

🧱 Folder Layout (simplified)

| Path | Purpose | Example Contents |
|:-----|:--------|:-----------------|
| src/ | Core codebase | pipeline.py, mcp/, kg/, rag/ |
| outputs/run_simple/ | Generated outputs | labeled chunks, reports, embeddings |
| .chroma/ | Chroma persistent vector store | chroma.sqlite3, index/ |
| .env | Environment configuration | Ollama, GraphDB, Chroma settings |
| tests/ | Offline unit tests | test_rag_qa.py, test_kg_server.py |
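The .env file holds the Ollama, GraphDB, and Chroma settings. The sketch below shows one plausible precedence (built-in defaults, then .env, then variables already set in the process, similar to python-dotenv's default of not overriding existing variables). Every variable name here is hypothetical; the defaults come from Ollama's standard port 11434, the GraphDB URL above, and the mcp_kg repository mentioned later in this README.

```python
import os
from pathlib import Path

# Hypothetical variable names; check the project's own .env for the real ones.
DEFAULTS = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "GRAPHDB_URL": "http://localhost:7200",
    "GRAPHDB_REPOSITORY": "mcp_kg",
    "CHROMA_DIR": ".chroma",
}


def load_settings(env_file: str = ".env") -> dict[str, str]:
    """Merge defaults, then KEY=VALUE lines from env_file, then process env vars."""
    settings = dict(DEFAULTS)
    path = Path(env_file)
    if path.exists():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip('"').strip("'")
    # Variables already present in the environment take precedence over the file.
    for key in settings:
        if key in os.environ:
            settings[key] = os.environ[key]
    return settings
```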

---

🧰 Step-by-Step Setup

🪄 1️⃣ Clone & Create Virtual Environment

git clone https://github.com/Swissbit92/GraphDB_Desktop.git
cd GraphDB_Desktop; python -m venv .venv

⚡ 2️⃣ Activate Environment

| OS | Command |
|:---|:--------|
| 🪟 Windows (PowerShell) | .venv\Scripts\activate |
| 🐧 Linux / macOS | source .venv/bin/activate |

📦 3️⃣ Install Dependencies

pip install -r requirements.txt

⚙️ 4️⃣ Verify Installation

python -m src.mcp.rag_server --list-tools
python -m src.mcp.kg_server --list-tools

✅ You should see tools like rag.qa, rag.search, and kg.health.

---

🧠 Optional: Preload Ollama Models

| Model | Purpose | Pull Command |
|:------|:--------|:-------------|
| 🦙 llama3.1:latest | Default reasoning + summarization model | ollama pull llama3.1:latest |
| 🧩 nomic-embed-text | Embedding model for RAG vectorization | ollama pull nomic-embed-text |
| 🤖 qwen2.5:14b-instruct | Larger model for complex QA tasks | ollama pull qwen2.5:14b-instruct |
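To see what the embedding model produces, you can call Ollama's REST API directly. This sketch assumes a recent Ollama release that serves POST /api/embed (older releases only expose /api/embeddings, which takes a different request body); the project itself may go through LangChain instead, and build_embed_request and embed are hypothetical helpers.

```python
import json
import urllib.request


def build_embed_request(texts: list[str], model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed endpoint for a batch of texts."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], **kwargs) -> list[list[float]]:
    """Send the request and return one vector per input text (needs Ollama running)."""
    with urllib.request.urlopen(build_embed_request(texts, **kwargs), timeout=60) as resp:
        return json.loads(resp.read())["embeddings"]
```

With nomic-embed-text pulled, each returned vector should have 768 dimensions, matching the Data Modalities table.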

---

🔍 Quick Sanity Check

Run a quick health diagnostic to ensure everything is configured correctly:

pytest -q
python -m src.mcp.rag_server --run-tool rag.health
python -m src.mcp.kg_server --run-tool kg.health

If both return ✅ OK, you’re ready to run the pipeline and start querying your Knowledge Graph + RAG system!
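If a health tool fails, it can help to probe the backing services directly, outside MCP. This stdlib sketch assumes Ollama's GET /api/tags (lists local models) and GraphDB's GET /rest/repositories (lists repositories) on their default ports; is_up and SERVICES are hypothetical names.

```python
import urllib.request

# Assumed endpoints on default local ports.
SERVICES = {
    "ollama": "http://localhost:11434/api/tags",
    "graphdb": "http://localhost:7200/rest/repositories",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError (4xx/5xx), refused connections, and timeouts
        # are all OSError subclasses; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:8} {'OK' if is_up(url) else 'DOWN'}  {url}")
```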

---

5️⃣ 🧪 How to Use & Test

📥 Ingest Whitepapers & Build the Index

# Place your PDFs under .\whitepapers\ then run:
python -m src.pipeline --input ".\whitepapers\*.pdf"

✅ Outputs:

  • Labeled JSONL → outputs\run_simple\labels\
  • Chroma index → .chroma\
  • (If enabled) Entities pushed to GraphDB repository mcp_kg

---

🖧 Start the MCP Servers (RAG + KG)

# Terminal A
python -m src.mcp.rag_server
# Terminal B
python -m src.mcp.kg_server

💡 Tip: In another PowerShell window, confirm the tools are available:

python -m src.mcp.rag_server --list-tools
python -m src.mcp.kg_server --list-tools

---

🔎 Quick Retrieval Check (RAG)

# Example: semantic search for "peer-to-peer electronic cash"
python -m src.mcp.rag_server --run-tool rag.search --input '{ "text": "peer-to-peer electronic cash", "k": 3 }'

You should see matching chunks with doc_id, chunk_id, and distances.
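Hand-quoting JSON on the command line is fragile; older Windows PowerShell versions, for example, can strip embedded double quotes when passing arguments to native programs. A sketch that builds the same --run-tool call as an argument list, so no shell quoting is involved; run_tool_cmd is a hypothetical helper, and the flags are the ones shown in this README.

```python
import json
import subprocess
import sys


def run_tool_cmd(server: str, tool: str, payload: dict) -> list[str]:
    """Build an argv list for the README's --run-tool CLI.

    Passing a list to subprocess (no shell) means the JSON needs no escaping.
    """
    return [sys.executable, "-m", f"src.mcp.{server}", "--run-tool", tool,
            "--input", json.dumps(payload)]


if __name__ == "__main__":
    cmd = run_tool_cmd("rag_server", "rag.search",
                       {"text": "peer-to-peer electronic cash", "k": 3})
    # Run from the repository root so the src package is importable.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    print(result.stdout or result.stderr)
```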

---

❓ Ask Questions with Citations (rag.qa)

# Fully offline (deterministic mock answer)
python -m src.mcp.rag_server --run-tool rag.qa --input '{ "question": "What problem does Bitcoin aim to solve?", "k": 5, "kg_enrich": true, "use_mock_llm": true }'

➡️ Returns:

  • answer: concise response (mock or LLM)
  • citations: [ {doc_id, chunk_id, entity_ids, text} ]
  • took_ms, model_used

Switch to real LLM synthesis by omitting use_mock_llm (requires Ollama running).
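A client consuming rag.qa might render the payload as text with numbered sources. This sketch relies only on the fields listed above (answer, citations with doc_id/chunk_id/text, took_ms, model_used); render_answer and its output format are hypothetical.

```python
def render_answer(result: dict, snippet_chars: int = 80) -> str:
    """Render a rag.qa-style result as text with numbered source references."""
    lines = [result.get("answer", "").strip(), ""]
    for n, cite in enumerate(result.get("citations", []), start=1):
        # Collapse whitespace and truncate long chunk text for display.
        snippet = " ".join(cite.get("text", "").split())
        if len(snippet) > snippet_chars:
            snippet = snippet[: snippet_chars - 3] + "..."
        lines.append(f"[{n}] {cite['doc_id']}#{cite['chunk_id']}: {snippet}")
    if "model_used" in result:
        lines.append(f"(model: {result['model_used']}, {result.get('took_ms', '?')} ms)")
    return "\n".join(lines)
```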

---

🧠 Optional: Entity-Filtered QA

python -m src.mcp.rag_server --run-tool rag.qa --input '{ "question": "How does proof-of-work secure the network?", "entity_ids": ["https://kg.mcp.ai/id/token/bitcoin"], "k": 5, "kg_enrich": true, "use_mock_llm": true }'

This restricts retrieval to chunks tagged with the specified KG entity(ies).
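Conceptually, entity filtering is a metadata pre-filter applied before similarity ranking. The sketch below shows that idea over a hypothetical in-memory list of chunks (dicts with id, vector, and entity_ids); it uses plain cosine similarity and is not Chroma's API or the project's retrieval code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def entity_filtered_search(query_vec, chunks, entity_ids=None, k=5):
    """Return the IDs of the k most similar chunks.

    If entity_ids is given, only chunks tagged with at least one of those
    entities are considered (filter first, then rank).
    """
    wanted = set(entity_ids or [])
    pool = [c for c in chunks if not wanted or wanted & set(c["entity_ids"])]
    ranked = sorted(pool, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:k]]
```

Filtering before ranking guarantees that all k results mention the requested entity, at the cost of possibly returning fewer than k chunks when few are tagged.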

---

🧪 Run the Test Suite

pytest -q

Key tests (all offline):

  • tests\test_rag_qa.py: verifies retrieval normalization and mock LLM mode
  • tests\test_kg_server.py: checks KG connectivity (skipped if GraphDB is not running)

---

🩺 Health Checks

python -m src.mcp.rag_server --run-tool rag.health
python -m src.mcp.kg_server --run-tool kg.health

Expect collection info, document counts, and OK status.

---

🧩 MCP Coordinator / UI Hookup (Optional)

Register both servers in your mcp.json so the client can launch them over stdio:

{
  "mcpServers": {
    "rag": { "command": "python", "args": ["-m", "src.mcp.rag_server"] },
    "kg":  { "command": "python", "args": ["-m", "src.mcp.kg_server"] }
  }
}

Then connect via your MCP Coordinator or Streamlit app to interactively call rag.qa and kg.* tools.
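Under the hood, a client talks to these stdio servers with JSON-RPC 2.0 messages, one per line. This sketch builds a tools/call request; a real client must first complete the initialize handshake, which coordinators handle for you. Whether the tool is registered under the exact name rag.qa is an assumption based on this README, and tools_call_message is a hypothetical helper.

```python
import json


def tools_call_message(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-delimited JSON-RPC line."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport forbids embedded newlines; json.dumps without indent
    # escapes newlines inside strings, so a single trailing "\n" frames the message.
    return json.dumps(msg) + "\n"


if __name__ == "__main__":
    print(tools_call_message(2, "rag.qa", {
        "question": "What problem does Bitcoin aim to solve?",
        "k": 5,
        "use_mock_llm": True,
    }), end="")
```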

---

🙏 Closing Words

GraphRAG MCP is part of the broader Eeva AI ecosystem, an open, modular framework for intelligent crypto research and strategy generation. This project wouldn’t exist without the incredible open-source community that continues to push the boundaries of local AI and knowledge engineering.

If you find this useful:

  • ⭐ Star the repository to support ongoing development
  • 🧩 Contribute improvements or new MCP modules
  • 🧠 Explore integrations with other MCPs (Brave API, MongoDB, Telegram, etc.)
  • 💬 Share feedback: every suggestion helps make the system smarter, faster, and more reliable

---

_“Knowledge is only powerful when it’s connected.”_ (__Eeva AI Research__)

Thank you for being part of the open-source journey. 🚀
