MCP PDF Server

BhagyeshRashinkar/mcp-pdf-server

Summary

Enables AI assistants to query PDF documents by ingesting them into a vector database and generating answers grounded in the actual documents.

📚 MCP PDF Server

An MCP (Model Context Protocol) server that lets AI assistants query your PDF documents. Drop your PDFs, ingest them into a vector database, and ask questions. Answers are grounded in your actual documents.

---

✨ Features

  • 🔌 MCP-Compatible: Works with any MCP client (GitHub Copilot, Antigravity, etc.)
  • 📄 Auto PDF Discovery: Automatically finds, extracts, chunks, and embeds all PDFs in your folder
  • 🔍 Vector Search: Retrieves the most relevant passages before generating answers
  • 🐳 Docker-Ready: Runs as a containerized server with one command
  • 🗄️ Qdrant: Fast, open-source vector database for similarity search

---

πŸ—οΈ How It Works

┌──────────────┐     MCP (stdio)     ┌────────────────────┐     HTTP     ┌──────────┐
│ AI Assistant │◄───────────────────►│   MCP PDF Server   │◄────────────►│  Qdrant  │
│              │                     │                    │              │ Vector DB│
└──────────────┘                     │ 1. Embed question  │              └──────────┘
                                     │ 2. Search vectors  │
                                     │ 3. Generate answer │   LLM API
                                     │                    │◄────────────►
                                     └────────────────────┘   (Embeddings
                                                               + Generation)

  1. You ask a question via your AI assistant.
  2. The server embeds the question using your choice of embedding model.
  3. It searches Qdrant for the top 5 most relevant text chunks from your PDFs.
  4. It generates an answer using an LLM, grounded in the retrieved context.
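
Steps 3 and 4 can be sketched in TypeScript as follows. This is a minimal illustration, not the server's actual code: it ranks chunks in memory by cosine similarity as a stand-in for the Qdrant search, and it assumes cosine is the similarity measure in use. The names `Chunk`, `cosine`, `topK`, and `buildPrompt`, as well as the prompt wording, are hypothetical.

```typescript
// Hypothetical shape: the server's real types are not shown in this README.
interface Chunk {
  text: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Step 3: rank stored chunks against the question vector and keep the best k.
// In the real server this ranking is delegated to Qdrant.
function topK(question: number[], chunks: Chunk[], k = 5): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(question, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Step 4 (input side): build a prompt that grounds the LLM in the retrieved text.
function buildPrompt(question: string, context: Chunk[]): string {
  const passages = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${passages}\n\nQuestion: ${question}`;
}
```

The string returned by `buildPrompt` would then be sent to the generation model. Numbering the passages is one way to let the model point back at the chunk it relied on.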

---

🚀 Quick Start

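Prerequisites

The steps below use Git, Docker with Docker Compose, Node.js with npm, and an API key for your LLM provider.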

1. Clone & Configure

git clone https://github.com/your-username/mcp-pdf-server.git
cd mcp-pdf-server

cp .env.example .env

Edit .env and set your API key:

API_KEY=nvapi-your_key_here

2. Start Qdrant

docker-compose up -d

3. Add Your PDFs & Ingest

Place your PDF documents in the pdfs/ folder, then:

npm install        # first time only
npm run ingest

All PDFs in the folder are automatically discovered and ingested.

4. Build the Server Image

docker build -t mcp-pdf-server .

5. Connect to Your AI Assistant

Add to your AI assistant's MCP config (e.g., mcp_config.json):

{
  "mcpServers": {
    "pdf-docs": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "--network",
        "mcp-network",
        "-e",
        "API_KEY",
        "-e",
        "QDRANT_URL=http://mcp-qdrant:6333",
        "-e",
        "COLLECTION_NAME=documents",
        "-e",
        "EMBED_MODEL=nvidia/nv-embedqa-e5-v5",
        "-e",
        "GEN_MODEL=qwen/qwen2.5-coder-32b-instruct",
        "mcp-pdf-server"
      ],
      "env": {
        "API_KEY": "your_nvapi_key_here"
      }
    }
  }
}

Done! Ask your AI assistant any question about your documents.

---

🔧 Available Tools

| Tool | Description |
| --- | --- |
| ask_documents | Ask any question. The server retrieves relevant context from your ingested PDFs and generates an answer. |

---

βš™οΈ Environment Variables

| Variable | Description | Default | | ------------------- | ----------------------------- | --------------------------------- | | API_KEY | LLM API key | _(required)_ | | EMBED_MODEL | Embedding model | nvidia/nv-embedqa-e5-v5 | | GEN_MODEL | Generation model | qwen/qwen2.5-coder-32b-instruct | | COLLECTION_NAME | Qdrant collection name | documents | | QDRANT_URL | Qdrant connection URL | http://localhost:6333 | | EMBED_BATCH_SIZE | Chunks per embedding batch | 15 | | EMBED_MAX_RETRIES | Max retries on API failure | 3 | | EMBED_COOLOFF_MS | Cooldown between batches (ms) | 500 |

Note: The .env file is used for local ingestion. The mcp_config.json passes env vars via Docker -e flags for the server.
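
As an illustration of how these variables and defaults might be read, here is a minimal sketch of a config loader. The `ServerConfig` shape and the `intFromEnv` and `loadConfig` helpers are hypothetical and are not taken from the repository. Only the variable names and default values come from the table above.

```typescript
// Hypothetical config shape mirroring the documented environment variables.
interface ServerConfig {
  apiKey: string;
  embedModel: string;
  genModel: string;
  collectionName: string;
  qdrantUrl: string;
  embedBatchSize: number;
  embedMaxRetries: number;
  embedCooloffMs: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer variable, falling back to the default when unset.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Build the config from an env-like object, applying the documented defaults.
function loadConfig(env: Env): ServerConfig {
  const apiKey = env["API_KEY"];
  if (!apiKey) throw new Error("API_KEY is required");
  return {
    apiKey,
    embedModel: env["EMBED_MODEL"] ?? "nvidia/nv-embedqa-e5-v5",
    genModel: env["GEN_MODEL"] ?? "qwen/qwen2.5-coder-32b-instruct",
    collectionName: env["COLLECTION_NAME"] ?? "documents",
    qdrantUrl: env["QDRANT_URL"] ?? "http://localhost:6333",
    embedBatchSize: intFromEnv(env, "EMBED_BATCH_SIZE", 15),
    embedMaxRetries: intFromEnv(env, "EMBED_MAX_RETRIES", 3),
    embedCooloffMs: intFromEnv(env, "EMBED_COOLOFF_MS", 500),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.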

---

πŸ“ Project Structure

mcp-pdf-server/
├── pdfs/                     # Place your PDF documents here
├── src/
│   ├── server.ts             # MCP server entry point
│   ├── llm/
│   │   └── provider.ts       # LLM API client (embed + generate)
│   ├── vector/
│   │   └── qdrant.ts         # Qdrant client config
│   └── ingest/
│       ├── main.ts           # Ingestion orchestrator (auto-discovers PDFs)
│       ├── extract.ts        # PDF text extraction
│       ├── chunk.ts          # Text chunking
│       └── embed.ts          # Batch embedding & Qdrant insertion
├── docker-compose.yml        # Qdrant service
├── Dockerfile                # Server image
├── .env.example              # Env var template (safe to commit)
├── .gitignore                # Keeps secrets & binaries out of git
└── package.json
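
The README does not describe how chunk.ts splits text. As a rough idea of what a text chunking step can look like, here is a sketch of a fixed-size chunker with overlap. The `chunkText` function, the character-based windows, and the default `size` and `overlap` values are all assumptions made for illustration. The repository may well split by tokens, sentences, or pages instead.

```typescript
// Split text into overlapping windows of at most `size` characters.
// `size` and `overlap` defaults are illustrative, not the project's settings.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size < 1 || overlap < 0 || overlap >= size) {
    throw new Error("require size >= 1 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end, to avoid a redundant tail chunk.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.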

---

πŸ› οΈ Development

For local development with hot-reloading:

npm install
docker-compose up -d    # Start Qdrant
npm run dev             # Server with hot-reload

To use the local dev server with your AI assistant, change mcp_config.json to:

{
  "mcpServers": {
    "pdf-docs": {
      "command": "npx",
      "args": ["tsx", "src/server.ts"],
      "cwd": "/path/to/mcp-pdf-server",
      "env": {
        "API_KEY": "your_nvapi_key_here",
        "QDRANT_URL": "http://localhost:6333",
        "COLLECTION_NAME": "documents",
        "EMBED_MODEL": "nvidia/nv-embedqa-e5-v5",
        "GEN_MODEL": "qwen/qwen2.5-coder-32b-instruct"
      }
    }
  }
}

---

πŸ“ Use Cases

This server works with any PDF knowledge base:

  • 📖 Technical books: Architecture, algorithms, system design
  • 📋 Company docs: Wikis, runbooks, policies
  • 📄 Research papers: Academic papers, whitepapers
  • 📑 Legal documents: Contracts, compliance
  • 🎓 Course material: Textbooks, lecture notes

---

πŸ› Troubleshooting

| Problem | Solution | | ------------------------- | -------------------------------------------------------------------- | | Server can't reach Qdrant | docker ps β€” Ensure mcp-qdrant is running on mcp-network | | Embeddings mismatch | Changed EMBED_MODEL? Delete qdrant_storage/ and re-ingest | | Rebuild server image | docker build -t mcp-pdf-server . after code changes | | Reset all data | Delete ./qdrant_storage/ and re-run npm run ingest | | Rate limiting | Increase EMBED_COOLOFF_MS or decrease EMBED_BATCH_SIZE in .env | | No PDFs found | Ensure .pdf files are placed in the pdfs/ directory |
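
To show why the rate-limiting advice helps, here is a sketch of how EMBED_BATCH_SIZE, EMBED_MAX_RETRIES, and EMBED_COOLOFF_MS could interact during ingestion. It is not the repository's embed.ts. The `toBatches` and `embedAll` names, the caller-supplied `embedBatch` function, and the linearly growing retry delay are assumptions made for illustration.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// `embedBatch` is a hypothetical function that sends one batch to the
// embeddings API and returns one vector per input chunk.
async function embedAll(
  chunks: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  opts = { batchSize: 15, maxRetries: 3, cooloffMs: 500 },
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(chunks, opts.batchSize)) {
    for (let attempt = 0; ; attempt++) {
      try {
        vectors.push(...(await embedBatch(batch)));
        break;
      } catch (err) {
        // Give up after the initial attempt plus `maxRetries` retries.
        if (attempt >= opts.maxRetries) throw err;
        // Assumed policy: wait a little longer before each retry.
        await sleep(opts.cooloffMs * (attempt + 1));
      }
    }
    // Pause between batches to stay under the provider's rate limit.
    await sleep(opts.cooloffMs);
  }
  return vectors;
}
```

Under this sketch, a smaller batch size means fewer chunks per request and a longer cooldown means fewer requests per second, which is why either change can relieve rate limiting.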

---

📄 License

ISC
