pdf-reader-mcp
An MCP (Model Context Protocol) server for reading and analyzing PDF files. It gives MCP-compatible clients access to PDF text, rendered page images, tables, links, annotations, outlines, metadata, and basic text statistics.
<!-- mcp-name: io.github.Xvvln/pdf-reader-mcp -->
Package name
- GitHub repository: pdf-reader-mcp
- MCP Registry name: io.github.Xvvln/pdf-reader-mcp
- PyPI package: pdf-insight-mcp
- CLI commands: pdf-reader-mcp and pdf-insight-mcp
pdf-reader-mcp is the project name. The PyPI package is published as pdf-insight-mcp because the pdf-reader-mcp package name is not available on PyPI.
Features
| Tool | What it does |
| --- | --- |
| get_pdf_info | Read document metadata, page count, file size, and encryption status. |
| read_pdf_as_text | Extract text from selected pages with page and character limits. |
| read_pdf_as_images | Render selected pages as base64-encoded images. |
| get_pdf_outline | Read bookmarks and outline entries. |
| search_pdf_text | Search text and return per-match page context. |
| extract_pdf_tables | Extract structured tables when PyMuPDF can detect them. |
| extract_pdf_images | Extract embedded PDF images. |
| get_pdf_page_info | Inspect one page's size, text, images, links, and rotation. |
| extract_pdf_links | Extract external URLs and internal page jumps. |
| get_pdf_annotations | Read comments, highlights, and annotation metadata. |
| get_pdf_text_stats | Compute text, line, paragraph, and scan-likelihood stats. |
| compare_pdf_pages | Compare text similarity between two pages. |
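The table does not say how compare_pdf_pages scores similarity. As a rough illustration of what a text-similarity score between two pages can look like, here is a minimal sketch using Python's standard difflib. The function names and the whitespace normalization are assumptions made for this example, not the server's implementation.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not dominate the score."""
    return " ".join(text.split())


def page_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] for two pages' extracted text.

    Hypothetical helper: the metric used by compare_pdf_pages may differ.
    """
    return SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
```

Two pages that differ only in spacing score 1.0 under this sketch, and pages with no characters in common score 0.0.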
Quick start
Install uv if you do not already have it:
curl -LsSf https://astral.sh/uv/install.sh | sh
Run the server directly from PyPI:
uvx pdf-insight-mcp
Or install it first:
python -m pip install pdf-insight-mcp
pdf-reader-mcp
MCP client configuration
Use the published PyPI package:
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": ["pdf-insight-mcp"]
    }
  }
}
Use a local checkout for development:
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}
Replace /absolute/path/to/pdf-reader-mcp with the absolute path to this repository on your machine.
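If you switch between the published package and a local checkout often, the two configurations above can be generated rather than edited by hand. The following is a small sketch; build_client_config is a hypothetical helper written for this README, and where your MCP client stores its configuration file is not covered here.

```python
import json
import os
from typing import Optional


def build_client_config(checkout_dir: Optional[str] = None,
                        server_name: str = "pdf-reader") -> dict:
    """Return an mcpServers configuration dict.

    Without checkout_dir, the published PyPI package is run through uvx.
    With checkout_dir, the local checkout is run through uv.
    """
    if checkout_dir is None:
        entry = {"command": "uvx", "args": ["pdf-insight-mcp"]}
    else:
        if not os.path.isabs(checkout_dir):
            raise ValueError("checkout_dir must be an absolute path")
        entry = {
            "command": "uv",
            "args": ["--directory", checkout_dir, "run", "pdf-reader-mcp"],
        }
    return {"mcpServers": {server_name: entry}}


# Print the configuration for the published package.
print(json.dumps(build_client_config(), indent=2))
```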
Common usage
Ask your MCP client to call tools with an absolute PDF path. Example requests:
Read /Users/me/Documents/report.pdf as text.
Search /Users/me/Documents/report.pdf for "baseline characteristics".
Render pages 1-3 of /Users/me/Documents/report.pdf as images.
Extract links and annotations from /Users/me/Documents/review.pdf.
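To call the tools from a script instead of a chat client, the MCP Python SDK can start the server over stdio. This README does not document the tools' argument names, so the sketch below only lists each tool together with its input schema rather than guessing parameters. It assumes the mcp package is installed and uv is on your PATH; list_pdf_tools and main are names made up for this example.

```python
import asyncio

# Launch command taken from the published-package client configuration.
SERVER_COMMAND = "uvx"
SERVER_ARGS = ["pdf-insight-mcp"]


async def list_pdf_tools() -> dict:
    """Start the server over stdio and return {tool name: input schema}."""
    # Imported lazily so this module still loads where the MCP SDK is absent.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=SERVER_COMMAND, args=SERVER_ARGS)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_tools()
            return {tool.name: tool.inputSchema for tool in listing.tools}


def main() -> None:
    """Print each tool name with the argument names its schema declares."""
    for name, schema in asyncio.run(list_pdf_tools()).items():
        print(name, sorted(schema.get("properties", {})))
```

Calling main() prints the argument names each tool accepts. Once you know them, session.call_tool(name, arguments={...}) invokes a tool with those arguments.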
For large PDFs, prefer small page ranges first. For scanned or layout-sensitive PDFs, use read_pdf_as_images with a small pages range and moderate dpi.
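One way to follow the small-ranges advice is to split a document into fixed-size batches and request them one at a time. This is a sketch only: page_batches is a hypothetical helper, and it assumes 1-based, inclusive page numbers as in the "pages 1-3" request above. The page count can come from get_pdf_info.

```python
from typing import Iterator, Tuple


def page_batches(page_count: int, batch_size: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first, last) page ranges covering the document."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size must be >= 1")
    for first in range(1, page_count + 1, batch_size):
        # The final batch is clipped to the last page of the document.
        yield first, min(first + batch_size - 1, page_count)
```

For a 7-page PDF with batch_size=3 this yields (1, 3), (4, 6), and (7, 7).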
Limits and behavior
- read_pdf_as_text defaults to at most 50 pages and 200000 returned characters.
- read_pdf_as_images rejects requests above 20 pages.
- read_pdf_as_images defaults to an overall image payload cap of about 20 MB.
- extract_pdf_images returns at most 20 embedded images but reports the actual detected total.
- Encrypted PDFs are rejected unless they are already accessible without a password.
- Scanned PDFs may have little or no extractable text. Use image rendering or OCR outside this server when needed.
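A client can mirror the documented limits before sending a request, which avoids a round trip that the server would reject or truncate. The sketch below is not the server's code. The constants come from the list above, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Documented defaults and caps from the list above.
TEXT_MAX_PAGES = 50
TEXT_MAX_CHARS = 200_000
IMAGE_MAX_PAGES = 20


def check_image_pages(pages: List[int]) -> None:
    """Raise if an image request names more pages than the server accepts."""
    if len(pages) > IMAGE_MAX_PAGES:
        raise ValueError(
            f"{len(pages)} pages requested; read_pdf_as_images accepts at most {IMAGE_MAX_PAGES}"
        )


def clamp_text_pages(pages: List[int]) -> List[int]:
    """Keep only as many pages as the default text limit returns."""
    return pages[:TEXT_MAX_PAGES]


def truncate_text(text: str, max_chars: int = TEXT_MAX_CHARS) -> Tuple[str, bool]:
    """Return (text cut to max_chars, whether anything was cut)."""
    return text[:max_chars], len(text) > max_chars
```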
Development
Install dependencies:
uv sync --extra dev
Run tests:
uv run pytest -q
Build the package:
uv build
uvx twine check dist/*
Run the local server:
uv run pdf-reader-mcp
Release
Releases are published through GitHub Actions.
Before the first release, configure PyPI Trusted Publishing with:
PyPI project name: pdf-insight-mcp
Owner: Xvvln
Repository name: pdf-reader-mcp
Workflow filename: publish.yml
Environment name: leave empty
Then release by bumping versions in pyproject.toml and server.json, committing the change, and pushing a version tag:
git tag vX.Y.Z
git push origin main --tags
The Publish workflow runs tests, builds the Python package, publishes to PyPI, authenticates to the MCP Registry with GitHub OIDC, and publishes server.json.
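Because a release touches two version fields plus a tag, a quick consistency check before tagging can catch a missed bump. The following is a minimal sketch, not part of the Publish workflow. It assumes pyproject.toml declares the version as a single-line version = "X.Y.Z" entry in its [project] table and that server.json carries a top-level "version" field; adjust it if your files differ. All function names are hypothetical.

```python
import json
import re


def pyproject_version(pyproject_text: str) -> str:
    """Return the version string from the [project] table.

    A small line scanner, not a full TOML parser.
    """
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether the current table is [project].
            in_project = stripped == "[project]"
        elif in_project:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    raise ValueError("no version found in [project]")


def server_json_version(server_json_text: str) -> str:
    """Return the top-level "version" field (assumed location)."""
    return json.loads(server_json_text)["version"]


def versions_match(pyproject_text: str, server_json_text: str, tag: str) -> bool:
    """True when both files and the tag (with its leading 'v' removed) agree."""
    expected = tag[1:] if tag.startswith("v") else tag
    return (
        pyproject_version(pyproject_text)
        == server_json_version(server_json_text)
        == expected
    )
```

Read both files with pathlib.Path(...).read_text() and pass the tag you are about to create, for example versions_match(pyproject_text, server_json_text, "vX.Y.Z").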
Tech stack
- Python 3.10+
- MCP Python SDK
- PyMuPDF
- uv
- pytest
License
MIT






