PDF Reader

Xvvln/pdf-reader-mcp

Summary

MCP server for extracting text, images, tables, links, annotations, and metadata from PDF files.

pdf-reader-mcp

An MCP server for reading and analyzing PDF files. It gives clients that support MCP (Model Context Protocol) access to PDF text, rendered page images, tables, links, annotations, outlines, metadata, and basic text statistics.

<!-- mcp-name: io.github.Xvvln/pdf-reader-mcp -->

Package name

  • GitHub repository: pdf-reader-mcp
  • MCP Registry name: io.github.Xvvln/pdf-reader-mcp
  • PyPI package: pdf-insight-mcp
  • CLI commands: pdf-reader-mcp and pdf-insight-mcp

pdf-reader-mcp is the project name. The PyPI package is published as pdf-insight-mcp because the pdf-reader-mcp package name is not available on PyPI.

Features

| Tool | What it does |
| --- | --- |
| get_pdf_info | Read document metadata, page count, file size, and encryption status. |
| read_pdf_as_text | Extract text from selected pages with page and character limits. |
| read_pdf_as_images | Render selected pages as base64-encoded images. |
| get_pdf_outline | Read bookmarks and outline entries. |
| search_pdf_text | Search text and return per-match page context. |
| extract_pdf_tables | Extract structured tables when PyMuPDF can detect them. |
| extract_pdf_images | Extract embedded PDF images. |
| get_pdf_page_info | Inspect one page's size, text, images, links, and rotation. |
| extract_pdf_links | Extract external URLs and internal page jumps. |
| get_pdf_annotations | Read comments, highlights, and annotation metadata. |
| get_pdf_text_stats | Compute text, line, paragraph, and scan-likelihood stats. |
| compare_pdf_pages | Compare text similarity between two pages. |

Quick start

Install uv if you do not already have it:

curl -LsSf https://astral.sh/uv/install.sh | sh

Run the server directly from PyPI:

uvx pdf-insight-mcp

Or install it first:

python -m pip install pdf-insight-mcp
pdf-reader-mcp

MCP client configuration

Use the published PyPI package:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": ["pdf-insight-mcp"]
    }
  }
}

Use a local checkout for development:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}

Replace /absolute/path/to/pdf-reader-mcp with the absolute path to this repository on your machine.
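
If you generate client configuration from a script, the two variants above can be produced by one small helper. This is a minimal sketch: the function name `build_client_config` is hypothetical and not part of this project, and the output only mirrors the two JSON snippets shown above.

```python
import json
import os


def build_client_config(local_checkout=None):
    """Return an mcpServers config dict (hypothetical helper, not part of this project).

    With no argument, the config runs the published PyPI package through uvx.
    With a path, it runs the server from a local checkout through uv.
    """
    if local_checkout is None:
        server = {"command": "uvx", "args": ["pdf-insight-mcp"]}
    else:
        # The README asks for an absolute path, so reject relative ones early.
        if not os.path.isabs(local_checkout):
            raise ValueError("local_checkout must be an absolute path")
        server = {
            "command": "uv",
            "args": ["--directory", local_checkout, "run", "pdf-reader-mcp"],
        }
    return {"mcpServers": {"pdf-reader": server}}


if __name__ == "__main__":
    print(json.dumps(build_client_config(), indent=2))
```

Where the resulting JSON has to be saved depends on your MCP client, so the sketch only prints it.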

Common usage

Ask your MCP client to call tools with an absolute PDF path. Example requests:

Read /Users/me/Documents/report.pdf as text.
Search /Users/me/Documents/report.pdf for "baseline characteristics".
Render pages 1-3 of /Users/me/Documents/report.pdf as images.
Extract links and annotations from /Users/me/Documents/review.pdf.

For large PDFs, prefer small page ranges first. For scanned or layout-sensitive PDFs, use read_pdf_as_images with a small pages range and moderate dpi.
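
For requests like the ones above, the MCP client sends the server a JSON-RPC `tools/call` message. The sketch below builds such a message for `search_pdf_text`. The envelope follows the MCP specification, but the argument names `file_path` and `query` are assumptions made for illustration; check the tool's input schema (for example through `tools/list`) for the real parameter names.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request(
        1,
        "search_pdf_text",
        {
            # Assumed parameter names; the server's schema is authoritative.
            "file_path": "/Users/me/Documents/report.pdf",
            "query": "baseline characteristics",
        },
    )
    print(json.dumps(request, indent=2))
```

In normal use the client builds and sends this message for you, so a sketch like this is mainly useful for debugging or for reading protocol logs.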

Limits and behavior

  • read_pdf_as_text defaults to at most 50 pages and 200000 returned characters.
  • read_pdf_as_images rejects requests above 20 pages.
  • read_pdf_as_images defaults to an overall image payload cap of about 20 MB.
  • extract_pdf_images returns at most 20 embedded images but reports the actual detected total.
  • Encrypted PDFs are rejected unless they are already accessible without a password.
  • Scanned PDFs may have little or no extractable text. Use image rendering or OCR outside this server when needed.
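
Because of the per-request page limits listed above, a long document has to be processed in several calls. The following sketch splits an inclusive page range into batches. The helper name `page_batches` is hypothetical, and the default batch size of 20 simply matches the `read_pdf_as_images` limit stated above.

```python
def page_batches(first_page, last_page, batch_size=20):
    """Yield inclusive (start, end) page ranges of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if first_page > last_page:
        raise ValueError("first_page must not exceed last_page")
    start = first_page
    while start <= last_page:
        # Clamp the end of the batch to the last requested page.
        end = min(start + batch_size - 1, last_page)
        yield (start, end)
        start = end + 1


if __name__ == "__main__":
    for start, end in page_batches(1, 45):
        print(f"pages {start}-{end}")
```

Each yielded range can then be passed to a separate tool call. A smaller batch size may still be needed when rendered pages approach the image payload cap.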

Development

Install dependencies:

uv sync --extra dev

Run tests:

uv run pytest -q

Build the package:

uv build
uvx twine check dist/*

Run the local server:

uv run pdf-reader-mcp

Release

Releases are published through GitHub Actions.

Before the first release, configure PyPI Trusted Publishing with:

PyPI project name: pdf-insight-mcp
Owner: Xvvln
Repository name: pdf-reader-mcp
Workflow filename: publish.yml
Environment name: leave empty

Then release by bumping versions in pyproject.toml and server.json, committing the change, and pushing a version tag:

git tag vX.Y.Z
git push origin main --tags

The Publish workflow runs tests, builds the Python package, publishes to PyPI, authenticates to the MCP Registry with GitHub OIDC, and publishes server.json.

Tech stack

  • Python 3.10+
  • MCP Python SDK
  • PyMuPDF
  • uv
  • pytest

License

MIT
