Paper Intelligence

![Python](https://python.org) ![MCP](https://modelcontextprotocol.io) ![License](LICENSE) ![uv](https://github.com/astral-sh/uv) ![ChromaDB](https://www.trychroma.com/) ![LlamaIndex](https://www.llamaindex.ai/)

A local MCP server for intelligent paper/PDF management. Convert PDFs to markdown, then search them with hybrid grep + semantic search. Designed for token efficiency: search first, read only what you need.

🚀 Quick Start

1. Install UV (one-time setup)

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Add to Your MCP Client

Claude Code CLI: ``bash claude mcp add paper-intelligence -- uvx paper-intelligence@latest ``

VS Code: ``bash code --add-mcp '{"name":"paper-intelligence","command":"uvx","args":["paper-intelligence@latest"]}' ``

That's it! uvx handles everything automatically. Using @latest ensures you always get the newest version.

🔌 MCP Client Integration

<details> <summary>Claude Desktop</summary>

Add to your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence@latest"]
    }
  }
}

</details>

<details> <summary>Cursor</summary>

Go to Settings → MCP → Add new MCP Server
Select command type
Enter: uvx paper-intelligence@latest

Or add to ~/.cursor/mcp.json: ``json { "mcpServers": { "paper-intelligence": { "command": "uvx", "args": ["paper-intelligence@latest"] } } } ``

</details>

<details> <summary>Windsurf / Other MCP Clients</summary>

Any MCP-compatible client can use paper-intelligence:

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence@latest"]
    }
  }
}

</details>

✨ Features

PDF to Markdown — High-accuracy conversion using Marker
Hybrid Search — Combined grep (exact/regex) + semantic RAG search
Token Efficient — Search papers instead of reading entire documents
GPU Acceleration — MPS (Apple Silicon) and CUDA support
Self-Contained — Each paper gets its own directory with all data
Header Context — Search results show document structure (e.g., "Methods > Data Collection")

📖 MCP Tools

`process_paper`

<details> <summary>Full pipeline: Convert PDF → Index headers → Create embeddings</summary>

Parameters:

pdf_path (string): Path to PDF file
use_llm (boolean, optional): Enhanced accuracy mode (default: false)
chunk_size (integer, optional): Text chunk size for embedding (default: 512)
chunk_overlap (integer, optional): Overlap between chunks (default: 50)

Example: `` Process the paper at ~/Downloads/attention-is-all-you-need.pdf ``

Output Structure: `` attention-is-all-you-need/ ├── attention-is-all-you-need.md # Converted markdown ├── metadata.json # Processing version info ├── index.json # Header hierarchy ├── chroma/ # Embeddings database └── images/ # Extracted figures ``

</details>

`search`

<details> <summary>Unified search with grep and/or semantic RAG</summary>

Parameters:

query (string): Search query (text, regex, or semantic)
paper_dirs (array): List of paper directories to search
mode (string, optional): "grep", "rag", or "hybrid" (default: hybrid)
top_k (integer, optional): Number of results (default: 5)
regex (boolean, optional): Treat query as regex (default: false)

Example Queries: ```

Semantic search across papers

Search for "attention mechanism implementation" in my processed papers

Exact text search

Search for "transformer" using grep mode

Regex search

Search for "BERT|GPT|T5" with regex enabled ```

Returns: Results with line numbers, surrounding context, and header location.

</details>

`convert_pdf`

<details> <summary>Convert PDF to Markdown (without embeddings)</summary>

Parameters:

pdf_path (string): Path to PDF file
output_dir (string, optional): Custom output directory
use_llm (boolean, optional): Enhanced accuracy mode

Returns: markdown_path, images_dir, image_count

</details>

`get_paper_info`

<details> <summary>Check processing status of a paper</summary>

Parameters:

paper_dir (string): Path to paper directory

Example Response: ``json { "has_markdown": true, "has_index": true, "has_embeddings": true, "has_images": true, "image_count": 12, "version": "0.2.0", "processed_at": "2025-01-15T10:30:00Z" } ``

</details>

`index_markdown` / `embed_document`

<details> <summary>Individual pipeline steps (for advanced use)</summary>

index_markdown — Extract header hierarchy into searchable JSON ``python index_markdown(markdown_path="~/Downloads/paper/paper.md") ``

embed_document — Create embeddings for semantic search ``python embed_document( markdown_path="~/Downloads/paper/paper.md", chunk_size=512, chunk_overlap=50 ) ``

</details>

📊 Example Output

Search Result

{
  "source": "attention-is-all-you-need.md",
  "line_number": 142,
  "header_path": "Model Architecture > Attention",
  "content": "An attention function can be described as mapping a query and a set of key-value pairs to an output...",
  "score": 0.89
}

🎯 Typical Workflow

Process a paper:

Process the PDF at ~/Downloads/transformer-paper.pdf

Search across papers:

Search for "positional encoding" in my papers

Read specific sections:

Show me the Methods section from the transformer paper

The agent reads search results (a few hundred tokens) instead of entire papers (tens of thousands of tokens).

🛠️ Installation Options

<details> <summary>Install from PyPI</summary>

# Install with pip
pip install paper-intelligence

# Or run directly with uvx (no install needed)
uvx paper-intelligence@latest

</details>

<details> <summary>Install from GitHub</summary>

pip install "paper-intelligence @ git+https://github.com/Strand-AI/paper-intelligence.git"

</details>

<details> <summary>Local Development</summary>

git clone https://github.com/Strand-AI/paper-intelligence.git
cd paper-intelligence

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Run the server
python -m paper_intelligence.server

Development MCP config: ``json { "mcpServers": { "paper-intelligence": { "command": "python", "args": ["-m", "paper_intelligence.server"], "cwd": "/path/to/paper-intelligence" } } } ``

Run tests: ```bash

Unit tests (fast)

pytest tests/test_markdown_parser.py

Integration tests (slow, requires ML models)

pytest tests/test_integration.py -v ```

</details>

🔧 Debugging

Use the MCP Inspector to debug the server:

npx @modelcontextprotocol/inspector uvx paper-intelligence@latest

🆘 Troubleshooting

<details> <summary>Server not starting?</summary>

Ensure Python 3.11+ is installed
Try uvx paper-intelligence@latest directly to see error messages
Check that all dependencies installed correctly

</details>

<details> <summary>Windows encoding issues?</summary>

Add to your MCP config: ``json "env": { "PYTHONIOENCODING": "utf-8" } ``

</details>

<details> <summary>Claude Desktop not detecting changes?</summary>

Claude Desktop only reads configuration on startup. Fully restart the app after config changes.

</details>

🏗️ Technical Stack

| Component | Technology | |-----------|------------| | MCP Server | Official Python SDK with FastMCP | | PDF Conversion | marker-pdf | | Embeddings | LlamaIndex + HuggingFace (BAAI/bge-small-en-v1.5) | | Vector Store | ChromaDB (persistent, local per-paper) | | GPU Support | PyTorch with MPS (Apple) or CUDA |

🙏 Acknowledgments

Marker for excellent PDF conversion
LlamaIndex for the RAG framework
ChromaDB for the vector database
FastMCP for the MCP server framework

📄 License

MIT — see LICENSE for details.

paper-intelligence

Paper Intelligence

🚀 Quick Start

1. Install UV (one-time setup)

2. Add to Your MCP Client

🔌 MCP Client Integration

✨ Features

📖 MCP Tools

`process_paper`

`search`

Semantic search across papers

Exact text search

Regex search

`convert_pdf`

`get_paper_info`

`index_markdown` / `embed_document`

📊 Example Output

Search Result

🎯 Typical Workflow

🛠️ Installation Options

Unit tests (fast)

Integration tests (slow, requires ML models)

🔧 Debugging

🆘 Troubleshooting

🏗️ Technical Stack

🙏 Acknowledgments

📄 License

Related MCP servers

MCP servers by category

paper-intelligence

Paper Intelligence

🚀 Quick Start

1. Install UV (one-time setup)

2. Add to Your MCP Client

🔌 MCP Client Integration

✨ Features

📖 MCP Tools

process_paper

search

Semantic search across papers

Exact text search

Regex search

convert_pdf

get_paper_info

index_markdown / embed_document

📊 Example Output

Search Result

🎯 Typical Workflow

🛠️ Installation Options

Unit tests (fast)

Integration tests (slow, requires ML models)

🔧 Debugging

🆘 Troubleshooting

🏗️ Technical Stack

🙏 Acknowledgments

📄 License

Related MCP servers

MCP servers by category

`process_paper`

`search`

`convert_pdf`

`get_paper_info`

`index_markdown` / `embed_document`