PDF MCP

MCP server for PDF processing and analysis using PyPDFium2.

Features

extract_text: Extract text content from PDF files with page range support
extract_metadata: Extract PDF metadata including title, author, and page count
search_text: Search for specific text within PDF files with context
get_page_count: Get the total number of pages in a PDF file
extract_pages: Extract specific pages from a PDF and save as a new PDF
split_pdf: Split a PDF into multiple page-based PDFs with base64 encoding
merge_pdfs: Merge multiple PDF files into a single PDF
pdf_to_images: Convert PDF pages to PNG images with configurable DPI
get_form_fields: Extract all form fields from a PDF including names, types, and values
fill_form: Fill form fields in a PDF with provided values and save to output path

Installation

From Git Repository

# Clone the repository
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .

With uv (recommended)

# Clone and enter directory
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp

# Install with uv
uv pip install -e .

Integration

OpenCode

Add to your ~/.config/opencode/opencode.json (OpenCode registers MCP servers under the mcp key):

{
  "mcp": {
    "pdf-mcp": {
      "type": "local",
      "command": [
        "/path/to/pdf-mcp/venv/bin/python",
        "-m",
        "pdf_mcp"
      ],
      "enabled": true
    }
  }
}

Claude Desktop

Add to your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "pdf-mcp": {
      "command": "/path/to/pdf-mcp/venv/bin/python",
      "args": ["-m", "pdf_mcp"]
    }
  }
}

Generic MCP Client

For any MCP-compatible client:

# Start the server directly
/path/to/venv/bin/python -m pdf_mcp

The server communicates via stdio using the MCP protocol.
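
For clients that talk to the server by hand, the sketch below shows how a tool call might be framed. It assumes the standard MCP conventions (JSON-RPC 2.0, the tools/call method, one JSON message per line on stdio); the helper name tools_call_message is hypothetical, and a real session must complete the MCP initialize handshake before sending tool calls.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_ids = count(1)


def tools_call_message(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited: json.dumps escapes any
    # newlines inside strings, so the only raw newline is the terminator.
    return json.dumps(request) + "\n"


line = tools_call_message("get_page_count", {"file_path": "/path/to/document.pdf"})
```

The resulting line would be written to the server's stdin, and the response read from its stdout.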

Tools

extract_text

Extract text content from a PDF file. Supports PDFs with searchable text and can extract text from specific pages or ranges.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to extract text from |
| pages | string | No | "all" | Page range to extract (e.g., '1-5', '3,7,9', 'all') |

{
  "file_path": "/path/to/document.pdf",
  "pages": "1-5"
}
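
To illustrate the pages syntax, here is a minimal sketch of how such a spec could be expanded into page numbers. It is not the server's actual parser: the function name parse_pages is hypothetical, and it assumes pages are 1-based and that out-of-range values are rejected.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand '1-5', '3,7,9', '1,3-5' or 'all' into 1-based page numbers."""
    if spec.strip().lower() == "all":
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '3-5' covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.extend(range(start, end + 1))
    return pages
```

Checking a spec like this on the client side before calling the tool gives earlier, clearer errors for typos such as '5-1'.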

extract_metadata

Extract metadata from a PDF file including title, author, subject, keywords, creator, producer, creation date, modification date, and page count.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to extract metadata from |

{
  "file_path": "/path/to/document.pdf"
}

search_text

Search for specific text within a PDF file. Returns page numbers and context around the found text. Useful for finding specific content in large documents.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to search within |
| query | string | Yes | - | Text to search for in the PDF |
| case_sensitive | boolean | No | false | Whether to perform case-sensitive search |
| context_words | integer | No | 10 | Number of words to include before and after each match |

{
  "file_path": "/path/to/document.pdf",
  "query": "important term",
  "case_sensitive": false,
  "context_words": 5
}
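
The effect of case_sensitive and context_words can be pictured with the sketch below. It is an illustration only, not the server's implementation: find_with_context is a hypothetical name, and it matches whole whitespace-separated words, whereas the real tool may match substrings.

```python
def find_with_context(
    text: str,
    query: str,
    case_sensitive: bool = False,
    context_words: int = 10,
) -> list[str]:
    """Return one snippet per match, padded with surrounding words."""
    words = text.split()
    query_words = query.split()
    if not query_words:
        return []

    # Fold case on both sides unless a case-sensitive search is requested.
    def normalize(word: str) -> str:
        return word if case_sensitive else word.lower()

    haystack = [normalize(word) for word in words]
    needle = [normalize(word) for word in query_words]
    size = len(needle)

    snippets = []
    for index in range(len(haystack) - size + 1):
        if haystack[index:index + size] == needle:
            start = max(0, index - context_words)
            end = min(len(words), index + size + context_words)
            snippets.append(" ".join(words[start:end]))
    return snippets
```

A larger context_words value gives longer snippets, which helps when matches need to be read in context but increases the size of the response.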

get_page_count

Get the total number of pages in a PDF file. Returns a simple integer count.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to count pages for |

{
  "file_path": "/path/to/document.pdf"
}

extract_pages

Extract specific pages from a PDF file and save as a new PDF. Supports page ranges and individual page selection.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the source PDF file |
| pages | string | Yes | - | Pages to extract (e.g., '1-5', '3,7,9', '1,3-5') |
| output_path | string | Yes | - | Path where the extracted pages will be saved as a new PDF |

{
  "file_path": "/path/to/source.pdf",
  "pages": "1,3,5-7",
  "output_path": "/path/to/output.pdf"
}

split_pdf

Split a PDF file into separate PDF files based on page ranges. Returns a JSON object containing one base64-encoded PDF per selected page. Supports single pages, page ranges, and all pages.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to split |
| page_range | string | Yes | - | Page range to split - 'all', single page (e.g., '1'), or range (e.g., '1-3', '2-5') |

{
  "file_path": "/path/to/document.pdf",
  "page_range": "1-3"
}
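
Because the pages come back base64-encoded, a client has to decode them before writing them to disk. The sketch below shows one way to do that for a single encoded string. The exact shape of the response JSON is not documented here, so the helper decode_pdf (a hypothetical name) deliberately works on one base64 string rather than assuming any field names.

```python
import base64


def decode_pdf(data_b64: str) -> bytes:
    """Decode one base64 string and check that it looks like a PDF."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    pdf_bytes = base64.b64decode(data_b64, validate=True)
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with a PDF header")
    return pdf_bytes
```

The returned bytes can then be written with open(path, "wb").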

merge_pdfs

Merge multiple PDF files into a single PDF. Files are merged in the order provided.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_paths | array | Yes | - | List of PDF file paths to merge |
| output_path | string | Yes | - | Path where the merged PDF will be saved |

{
  "file_paths": ["/path/to/doc1.pdf", "/path/to/doc2.pdf", "/path/to/doc3.pdf"],
  "output_path": "/path/to/merged.pdf"
}

pdf_to_images

Convert PDF pages to PNG images. Returns a JSON object containing one base64-encoded PNG image per page. Supports custom DPI settings for resolution control.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to convert to images |
| dpi | integer | No | 150 | Image resolution in dots per inch |
| format | string | No | "png" | Image format (PNG only) |

{
  "file_path": "/path/to/document.pdf",
  "dpi": 300,
  "format": "png"
}
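
To get a feel for what a dpi value means in pixels, recall that PDF page sizes are measured in points, with 72 points to the inch by default. The sketch below estimates the rendered image size from that relationship. The function name rendered_size is hypothetical, and the renderer may round slightly differently.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Estimate the pixel dimensions of a page rendered at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points (8.5 x 11 inches).
letter_default = rendered_size(612, 792)        # (1275, 1650)
letter_print = rendered_size(612, 792, dpi=300)  # (2550, 3300)
```

Doubling the DPI quadruples the pixel count, so high values on long documents produce large base64 payloads.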

get_form_fields

Extract all form fields from a PDF document including field names, types, current values, and available choices for dropdown fields.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the PDF file to extract form fields from |

{
  "file_path": "/path/to/form.pdf"
}

Returns a JSON object with field information:

{
  "fields": [
    {
      "name": "first_name",
      "type": "text",
      "value": "",
      "page": 1,
      "rect": {"x0": 50, "y0": 72, "x1": 150, "y1": 92}
    },
    {
      "name": "country",
      "type": "combobox",
      "value": "",
      "page": 1,
      "rect": {...},
      "choices": ["USA", "Canada", "UK"]
    },
    {
      "name": "accept_terms",
      "type": "checkbox",
      "value": "",
      "page": 1,
      "rect": {...},
      "on_state": "Yes"
    }
  ],
  "total_fields": 3
}
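
One use for this output is to check values on the client side before calling fill_form. The following sketch assumes the response shape shown above; build_fill_form_arguments is a hypothetical helper, not part of the server.

```python
def build_fill_form_arguments(
    form_info: dict,
    values: dict,
    file_path: str,
    output_path: str,
) -> dict:
    """Validate values against get_form_fields output and build fill_form arguments."""
    known = {field["name"]: field for field in form_info["fields"]}
    for name, value in values.items():
        if name not in known:
            raise KeyError(f"unknown form field: {name}")
        # Only dropdown-style fields report a list of choices.
        choices = known[name].get("choices")
        if choices is not None and value not in choices:
            raise ValueError(f"{value!r} is not a valid choice for {name}")
    return {"file_path": file_path, "fields": dict(values), "output_path": output_path}
```

The returned dictionary has the same shape as the fill_form arguments documented in the next section.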

fill_form

Fill form fields in a PDF document with provided values and save to output path. Supports text fields, checkboxes, radio buttons, and dropdowns.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file_path | string | Yes | - | Path to the source PDF file |
| fields | object | Yes | - | Dictionary of field names and their values to fill |
| output_path | string | Yes | - | Path where the filled PDF will be saved |

{
  "file_path": "/path/to/form.pdf",
  "fields": {
    "first_name": "John",
    "last_name": "Doe",
    "country": "USA",
    "accept_terms": true
  },
  "output_path": "/path/to/filled_form.pdf"
}

Checkbox values may be given as true/false, "yes"/"no", or "1"/"0". For radio buttons, use the on_state value reported for the field; call get_form_fields first to look it up.
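
The accepted checkbox spellings can be summarized as a small normalization step, sketched below. This is an illustration of the rule, not the server's code: checkbox_to_bool is a hypothetical name, and treating the strings case-insensitively is an assumption.

```python
_CHECKED = {"yes", "1"}
_UNCHECKED = {"no", "0"}


def checkbox_to_bool(value: object) -> bool:
    """Map true/false, "yes"/"no" and "1"/"0" onto a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()  # assumption: case and padding are ignored
        if text in _CHECKED:
            return True
        if text in _UNCHECKED:
            return False
    raise ValueError(f"unsupported checkbox value: {value!r}")
```

Any other value, such as the integer 1 or the string "on", is rejected here rather than guessed at.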

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PDF_MCP_DEBUG | false | Enable debug logging |

# Example
export PDF_MCP_DEBUG=true
python -m pdf_mcp
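
As a rough picture of how a flag like this is typically consumed, the sketch below reads the variable and configures logging. It is not taken from config.py: the function names are hypothetical, and accepting "1" and "yes" in addition to "true" is an assumption. Logging goes to stderr because a stdio MCP server uses stdout for protocol messages.

```python
import logging
import os
import sys
from collections.abc import Mapping


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when PDF_MCP_DEBUG is set to a truthy value."""
    raw = environ.get("PDF_MCP_DEBUG", "false")
    return raw.strip().lower() in {"1", "true", "yes"}  # assumed spellings


def configure_logging(environ: Mapping[str, str] = os.environ) -> None:
    """Send log output to stderr so stdout stays reserved for MCP messages."""
    level = logging.DEBUG if debug_enabled(environ) else logging.INFO
    logging.basicConfig(level=level, stream=sys.stderr)
```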

Development

Running Tests

source venv/bin/activate
pytest

# With coverage
pytest --cov=src --cov-report=html

Project Structure

pdf-mcp/
├── src/pdf_mcp/
│   ├── __init__.py
│   ├── __main__.py
│   ├── server.py
│   ├── config.py
│   └── tools/
│       ├── __init__.py
│       ├── extract_text.py
│       ├── extract_metadata.py
│       ├── search_text.py
│       ├── get_page_count.py
│       ├── extract_pages.py
│       ├── split_pdf.py
│       ├── merge_pdfs.py
│       ├── pdf_to_images.py
│       ├── get_form_fields.py
│       └── fill_form.py
├── tests/
├── pyproject.toml
└── README.md

Troubleshooting

Installation Issues

If you encounter installation errors, ensure you have Python 3.10 or later:

python --version

File Not Found Errors

Make sure the PDF file paths are correct and the files exist:

ls -l /path/to/your/document.pdf

Encrypted PDFs

The tools raise a RuntimeError when asked to process an encrypted PDF. Remove password protection from a PDF before passing it to the server.

Memory Issues with Large PDFs

For very large PDF files, consider processing them in smaller chunks using the extract_pages or split_pdf tools.
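
One way to plan the chunks is to take the total from get_page_count and generate page-range strings of a fixed size, as in this sketch. The function name page_chunks and the default chunk size of 50 are arbitrary choices for illustration.

```python
def page_chunks(page_count: int, chunk_size: int = 50) -> list[str]:
    """Split 1..page_count into range strings such as '1-50', '51-100'."""
    if page_count < 0 or chunk_size < 1:
        raise ValueError("page_count must be >= 0 and chunk_size >= 1")

    chunks = []
    for start in range(1, page_count + 1, chunk_size):
        end = min(start + chunk_size - 1, page_count)
        # A final chunk of one page is written as a single number.
        chunks.append(str(start) if start == end else f"{start}-{end}")
    return chunks
```

Each string can be passed as the pages argument of extract_pages or the page_range argument of split_pdf, one call per chunk.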

Permission Errors (Linux)

If you encounter permission errors, ensure the PDF files are readable:

chmod +r /path/to/your/document.pdf

Security Considerations

File Access: The server only processes files that exist and are readable by the running process
Path Validation: All file paths are validated before processing
No Network Access: The server does not make any network requests
Temporary Files: Temporary files are properly cleaned up after processing
Error Handling: Sensitive information is not exposed in error messages
Encrypted PDFs: Password-protected PDFs are rejected with appropriate error messages
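
For readers wondering what path validation can look like in practice, here is a minimal sketch. It is not the server's actual check: validate_pdf_path is a hypothetical name, and requiring the %PDF- header at byte 0 is a deliberately strict assumption. The error message includes only the file name, not the full path.

```python
from pathlib import Path


def validate_pdf_path(file_path: str) -> Path:
    """Resolve a path and confirm it points at a readable PDF file."""
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"not a file: {path.name}")
    # Opening the file also surfaces permission problems early.
    with path.open("rb") as handle:
        if handle.read(5) != b"%PDF-":
            raise ValueError(f"not a PDF file: {path.name}")
    return path
```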

Example Usage Scenarios

Scenario 1: Extract Text from Specific Pages

{
  "name": "extract_text",
  "arguments": {
    "file_path": "/documents/report.pdf",
    "pages": "1-3,7,9"
  }
}

Scenario 2: Search and Extract Context

{
  "name": "search_text",
  "arguments": {
    "file_path": "/documents/contract.pdf",
    "query": "liability clause",
    "case_sensitive": true,
    "context_words": 15
  }
}

Scenario 3: Merge Multiple Reports

{
  "name": "merge_pdfs",
  "arguments": {
    "file_paths": [
      "/reports/q1.pdf",
      "/reports/q2.pdf", 
      "/reports/q3.pdf",
      "/reports/q4.pdf"
    ],
    "output_path": "/reports/annual.pdf"
  }
}

Scenario 4: Convert PDF to Images

{
  "name": "pdf_to_images",
  "arguments": {
    "file_path": "/documents/presentation.pdf",
    "dpi": 300
  }
}

License

MIT
