Screen Capture MCP
An MCP (Model Context Protocol) server that gives AI agents eyes on your screen. Define a capture region with a visual overlay, then let your agent screenshot it on demand.
Built for Claude Code but works with any MCP-compatible client.
---
How It Works
Two components work together:
- Overlay (overlay.py): A transparent, draggable frame you position over any window. It saves the coordinates to a temp file.
- MCP Server (server.js): Reads those coordinates and uses macOS screencapture to grab the region, returning a base64 PNG to the agent.
┌──────────────────────────┐
│ Your App Window │
│ ┌────────────────────┐ │
│ │ ▓▓ Overlay Frame ▓▓│ │ overlay.py
│ │ ▓ ▓│ │ saves coordinates
│ │ ▓ (capture area) ▓│──────▶ /tmp/claude-viewport.json
│ │ ▓ ▓│ │
│ │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│ │
│ └────────────────────┘ │
│ │ server.js
└──────────────────────────┘ reads coords + screencapture
│
▼
base64 PNG → Agent
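The flow in the diagram can be sketched in Python. This is not the code in server.js (which is Node.js). It is a minimal illustration that assumes the coordinate file holds top-level x, y, width, and height keys, which this README does not confirm. The helper names are hypothetical; -R, -x, and -t are standard macOS screencapture options.

```python
import base64
import json
import subprocess
import tempfile
from pathlib import Path

COORDS_FILE = Path("/tmp/claude-viewport.json")  # path named in this README


def load_region(path=COORDS_FILE):
    """Read the region saved by the overlay.

    Assumption: the file has top-level x, y, width, height keys.
    """
    data = json.loads(Path(path).read_text())
    return tuple(int(data[key]) for key in ("x", "y", "width", "height"))


def build_command(region, out_path):
    """Build the screencapture invocation for a screen rectangle."""
    x, y, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # -x: no shutter sound, -t png: output format, -R: capture rectangle
    return ["screencapture", "-x", "-t", "png",
            "-R", f"{x},{y},{width},{height}", str(out_path)]


def capture_base64(region):
    """Capture the region (macOS only) and return the PNG as base64 text."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "capture.png"
        subprocess.run(build_command(region, out_path), check=True)
        return base64.b64encode(out_path.read_bytes()).decode("ascii")
```

On macOS, capture_base64(load_region()) would return the encoded screenshot once the overlay has written its file. Building the command separately from running it keeps the argument logic testable on any platform.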
---
Tools
The MCP server exposes three tools:
| Tool | Description |
|---|---|
| capture_viewport | Capture the locked overlay region. Returns the image + metadata. |
| get_viewport_info | Read the current overlay coordinates without taking a screenshot. |
| capture_region | Capture any arbitrary screen region by x, y, width, height. No overlay needed. |
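To illustrate what "the image + metadata" might look like on the wire, here is a rough Python sketch of an MCP tool result. The image and text content types follow the MCP tool-result format. The metadata fields and the function name make_capture_result are assumptions for illustration, not taken from server.js.

```python
import base64
import json


def make_capture_result(png_bytes, region):
    """Wrap PNG bytes and region metadata as MCP tool-result content."""
    x, y, width, height = region
    # Hypothetical metadata fields; the real server may report different ones.
    metadata = {"x": x, "y": y, "width": width, "height": height}
    return {
        "content": [
            {
                "type": "image",
                "data": base64.b64encode(png_bytes).decode("ascii"),
                "mimeType": "image/png",
            },
            {"type": "text", "text": json.dumps(metadata)},
        ]
    }
```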
---
Setup
Prerequisites
- macOS (uses native screencapture)
- Node.js 18+
- Python 3.8+ with Tkinter
Install
git clone https://github.com/Sherif-Aboulnasr/Screen-Capture-MCP.git
cd Screen-Capture-MCP
npm install
Configure your MCP client
Add to your MCP settings (e.g. ~/.claude/settings.json for Claude Code):
{
  "mcpServers": {
    "screen-capture": {
      "command": "node",
      "args": ["/path/to/Screen-Capture-MCP/server.js"]
    }
  }
}
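If your settings file already lists other servers, the entry above has to be merged in rather than pasted over them. A small Python sketch of that merge follows. The function names are hypothetical, and it assumes the settings file is plain JSON with an mcpServers object as shown above.

```python
import json
from pathlib import Path


def add_server_entry(settings, server_js):
    """Return a copy of settings with the screen-capture server registered."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["screen-capture"] = {"command": "node", "args": [str(server_js)]}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(settings_path, server_js):
    """Merge the entry into a JSON settings file, creating it if missing."""
    path = Path(settings_path).expanduser()
    settings = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_server_entry(settings, server_js)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Existing entries under mcpServers are preserved; only the screen-capture key is added or overwritten.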
---
Usage
1. Start the overlay
python3 overlay.py
A lime-green border frame appears on screen.
2. Position it
- Drag the top bar to move the frame over your target window
- Drag the bottom-right handle to resize
- Click the lock icon to lock the region in place (border turns red)
3. Let your agent capture
Once locked, the agent can call capture_viewport at any time to see what's on screen.
Agent: "Let me check how that looks..."
→ calls capture_viewport
→ receives screenshot of the locked region
→ "The button is misaligned — let me fix the padding..."
Capture without the overlay
Use capture_region to grab any screen coordinates directly:
{
  "x": 100,
  "y": 200,
  "width": 800,
  "height": 600
}
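Your MCP client normally builds the request for you. For reference, the arguments above travel inside a JSON-RPC 2.0 tools/call message, which can be sketched in Python as follows. The helper name tools_call_request and the request id are illustrative.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(
    1, "capture_region", {"x": 100, "y": 200, "width": 800, "height": 600}
)
print(json.dumps(request, indent=2))
```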
---
Overlay controls
| Control | Action |
|---|---|
| Top bar drag | Move the frame |
| Bottom-right handle drag | Resize |
| Lock icon (top-left) | Toggle lock/unlock |
| X button (top-right) | Quit overlay |

| State | Border color |
|---|---|
| Unlocked (moveable) | Lime green |
| Locked (capture ready) | Red |
---
How it's built
- Overlay: Python Tkinter. Four separate edge windows form a 3px border and keep the center click-through. A separate resize handle window sits at the bottom-right corner.
- Server: Node.js using @modelcontextprotocol/sdk. Reads coordinates from /tmp/claude-viewport.json, runs screencapture -R, returns base64 PNG.
- Coordinate file: JSON at /tmp/claude-viewport.json with adjusted capture coordinates and raw frame coordinates.
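One plausible reading of "adjusted capture coordinates" is the raw frame rectangle inset by the 3px border, so the border itself stays out of the screenshot. The Python sketch below works under that assumption. The key names in the JSON payload and both function names are hypothetical and may not match what overlay.py writes.

```python
import json
import os
import tempfile
from pathlib import Path

BORDER_PX = 3  # border thickness stated in this README


def inset_region(frame, border=BORDER_PX):
    """Shrink the raw frame rectangle so the border is not captured."""
    x, y, width, height = frame
    inner_w, inner_h = width - 2 * border, height - 2 * border
    if inner_w <= 0 or inner_h <= 0:
        raise ValueError("frame is too small for the border")
    return (x + border, y + border, inner_w, inner_h)


def save_viewport(frame, path="/tmp/claude-viewport.json"):
    """Write adjusted and raw coordinates to the coordinate file.

    The key names ("x", "frame", ...) are hypothetical.
    """
    x, y, width, height = inset_region(frame)
    fx, fy, fw, fh = frame
    payload = {
        "x": x, "y": y, "width": width, "height": height,
        "frame": {"x": fx, "y": fy, "width": fw, "height": fh},
    }
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    # Rename into place so a reader never sees a half-written file.
    os.replace(tmp_name, target)
    return payload
```

Writing to a temporary file and renaming it is a design choice for this sketch: the server may read the file at any moment, including while the frame is being dragged.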
---
Use cases
- UI development — Agent captures before/after screenshots while iterating on styles
- Visual debugging — "Does this layout look right?" — agent can see for itself
- Design review — Position overlay on a Figma/design window, let the agent compare against code output
- Mobile development — Frame your iOS Simulator, let the agent see the app state
---
License
MIT






