Screen Capture MCP

Sherif-Aboulnasr/Agent-Screen-Capture-MCP


Summary

Lets AI agents capture screen regions, either the area framed by a visual overlay or arbitrary coordinates, and returns them as base64 PNG images.


An MCP (Model Context Protocol) server that gives AI agents eyes on your screen. Define a capture region with a visual overlay, then let your agent screenshot it on demand.

Built for Claude Code but works with any MCP-compatible client.

---

How It Works

Two components work together:

  1. Overlay (overlay.py) — A transparent, draggable frame you position over any window. It saves the coordinates to a temp file.
  2. MCP Server (server.js) — Reads those coordinates and uses macOS screencapture to grab the region, returning a base64 PNG to the agent.
┌──────────────────────────┐
│  Your App Window         │
│  ┌────────────────────┐  │
│  │ ▓▓ Overlay Frame ▓▓│  │      overlay.py
│  │ ▓                 ▓│  │   saves coordinates
│  │ ▓  (capture area) ▓│──────▶ /tmp/claude-viewport.json
│  │ ▓                 ▓│  │
│  │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│  │
│  └────────────────────┘  │
│                          │      server.js
└──────────────────────────┘   reads coords + screencapture
                                      │
                                      ▼
                               base64 PNG → Agent
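The two components communicate only through that shared JSON file. The following is a minimal Python sketch of both sides. It assumes the file holds flat integer keys x, y, width and height; the real schema in overlay.py and server.js may differ. `save_viewport` and `load_viewport` are hypothetical names. Writing to a temp file and then calling `os.replace` is one way to keep the reader from seeing a half-written file, not necessarily what overlay.py does.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical schema: the real overlay.py may store different or extra keys.
REQUIRED_KEYS = ("x", "y", "width", "height")


def save_viewport(region: dict, path: Path) -> None:
    """Overlay side: write the region atomically so a reader never sees half a file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(region, f)
    os.replace(tmp_name, path)  # atomic rename within the same filesystem


def load_viewport(path: Path) -> dict:
    """Server side: read the region back, failing clearly if the overlay never ran."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; start overlay.py first")
    data = json.loads(path.read_text())
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"viewport file is missing keys: {missing}")
    # Extra keys (e.g. a lock flag) are ignored; only the rectangle is returned.
    return {k: int(data[k]) for k in REQUIRED_KEYS}
```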

---

Tools

The MCP server exposes three tools:

| Tool | Description |
|---|---|
| capture_viewport | Capture the locked overlay region. Returns the image + metadata. |
| get_viewport_info | Read the current overlay coordinates without taking a screenshot. |
| capture_region | Capture any arbitrary screen region by x, y, width, height. No overlay needed. |
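The three tools differ in two ways: where the region comes from (the overlay's file or the caller's arguments), and whether a screenshot is taken. The sketch below models that split in Python rather than in the server's actual Node.js code. The handler names, the returned dictionaries, and the injected `load_viewport` and `capture` callables are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict

Region = Dict[str, int]
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def make_tool_handlers(
    load_viewport: Callable[[], Region],
    capture: Callable[[Region], str],
) -> Dict[str, Handler]:
    """Map each tool name to a handler; `capture` returns base64 PNG text."""

    def capture_viewport(_args: Dict[str, Any]) -> Dict[str, Any]:
        # Region comes from the overlay's saved coordinates.
        region = load_viewport()
        return {"image": capture(region), "region": region}

    def get_viewport_info(_args: Dict[str, Any]) -> Dict[str, Any]:
        # Same source, but no screenshot is taken.
        return {"region": load_viewport()}

    def capture_region(args: Dict[str, Any]) -> Dict[str, Any]:
        # Region comes straight from the caller; the overlay is not consulted.
        region = {k: int(args[k]) for k in ("x", "y", "width", "height")}
        return {"image": capture(region), "region": region}

    return {
        "capture_viewport": capture_viewport,
        "get_viewport_info": get_viewport_info,
        "capture_region": capture_region,
    }
```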

---

Setup

Prerequisites

  • macOS (uses native screencapture)
  • Node.js 18+
  • Python 3.8+ with Tkinter

Install

git clone https://github.com/Sherif-Aboulnasr/Screen-Capture-MCP.git
cd Screen-Capture-MCP
npm install

Configure your MCP client

Add to your MCP settings (e.g. ~/.claude/settings.json for Claude Code):

{
  "mcpServers": {
    "screen-capture": {
      "command": "node",
      "args": ["/path/to/Screen-Capture-MCP/server.js"]
    }
  }
}
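To script this step instead of editing by hand, a small hypothetical Python helper can merge the entry into an existing settings file without clobbering other keys. It assumes the file is plain JSON. It also insists on an absolute path to server.js, because a relative path would depend on whatever working directory the client launches from. Back up your settings file before trying it.

```python
import json
from pathlib import Path


def add_screen_capture_server(settings_path: Path, server_js: str) -> dict:
    """Hypothetical helper: add or overwrite the screen-capture entry, keeping other settings."""
    if not Path(server_js).is_absolute():
        raise ValueError("use an absolute path to server.js")
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    servers = settings.setdefault("mcpServers", {})
    servers["screen-capture"] = {"command": "node", "args": [server_js]}
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```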

---

Usage

1. Start the overlay

python3 overlay.py

A lime-green border frame appears on screen.

2. Position it

  • Drag the top bar to move the frame over your target window
  • Drag the bottom-right handle to resize
  • Click the lock icon to lock the region in place (border turns red)

3. Let your agent capture

Once the region is locked, the agent can call capture_viewport at any time to see what is inside it.

Agent: "Let me check how that looks..."
→ calls capture_viewport
→ receives screenshot of the locked region
→ "The button is misaligned — let me fix the padding..."
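Behind the scenes, the step that calls capture_viewport sends an MCP `tools/call` request over JSON-RPC 2.0, and the screenshot comes back as a content block. Your MCP client builds and parses these messages for you. This Python sketch only shows their shape. It assumes the server returns the PNG as an MCP image content block (`type` "image", base64 `data`, `mimeType` "image/png"). How the server packages its metadata is not covered here.

```python
import base64
import json
from typing import Any, Dict, List


def tools_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def extract_png(result: Dict[str, Any]) -> bytes:
    """Return the decoded bytes of the first image/png block in a tools/call result."""
    content: List[Dict[str, Any]] = result.get("content", [])
    for block in content:
        if block.get("type") == "image" and block.get("mimeType") == "image/png":
            return base64.b64decode(block["data"])
    raise ValueError("no PNG image in tool result")
```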

Capture without the overlay

Use capture_region to grab any screen coordinates directly:

{
  "x": 100,
  "y": 200,
  "width": 800,
  "height": 600
}
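For capture_region, the four arguments map directly onto `screencapture -R x,y,w,h`. The Python sketch below validates them and builds that command. `-x` (do not play the capture sound) and `-R` are real screencapture flags. The function names and the choice to reject non-positive sizes are assumptions about what a server could do, not a description of server.js.

```python
import base64
import subprocess
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def region_from_args(args: Dict[str, Any]) -> Dict[str, int]:
    """Coerce capture_region arguments to ints and reject empty rectangles."""
    region = {}
    for key in ("x", "y", "width", "height"):
        if key not in args:
            raise ValueError(f"missing argument: {key}")
        region[key] = int(args[key])
    if region["width"] <= 0 or region["height"] <= 0:
        raise ValueError("width and height must be positive")
    return region


def screencapture_argv(region: Dict[str, int], out_path: str) -> List[str]:
    """Build the macOS command: -x silences the sound, -R takes x,y,w,h."""
    rect = "{x},{y},{width},{height}".format(**region)
    return ["screencapture", "-x", "-R", rect, out_path]


def capture_png_base64(region: Dict[str, int]) -> str:
    """Run screencapture (macOS only) into a temp file and return base64 PNG text."""
    with tempfile.TemporaryDirectory() as tmp:
        out = str(Path(tmp) / "region.png")
        subprocess.run(screencapture_argv(region, out), check=True)
        return base64.b64encode(Path(out).read_bytes()).decode("ascii")
```

Only `capture_png_base64` needs macOS. The other two functions run anywhere, which makes them easy to test. On recent macOS versions, the process running the server may also need Screen Recording permission before screencapture can see other apps' windows.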

---

Overlay controls

| Control | Action |
|---|---|
| Top bar drag | Move the frame |
| Bottom-right handle drag | Resize |
| Lock icon (top-left) | Toggle lock/unlock |
| X button (top-right) | Quit overlay |

| State | Border color |
|---|---|
| Unlocked (moveable) | Lime green |
| Locked (capture ready) | Red |
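The lock toggle behaves like a two-state machine. When unlocked, the frame can be moved or resized and the border is lime green. When locked, the frame is fixed, the border is red, and the region is ready for capture_viewport. Below is a hypothetical Python model of the two tables above. The hex values chosen for "lime green" and "red" are assumptions.

```python
from dataclasses import dataclass

# Assumed colors: Tk's "lime green" is #32CD32; the overlay may use other values.
UNLOCKED_COLOR = "#32CD32"
LOCKED_COLOR = "#FF0000"


@dataclass
class OverlayState:
    locked: bool = False

    @property
    def border_color(self) -> str:
        return LOCKED_COLOR if self.locked else UNLOCKED_COLOR

    @property
    def can_move_or_resize(self) -> bool:
        # Dragging the top bar or the resize handle only applies while unlocked.
        return not self.locked

    @property
    def capture_ready(self) -> bool:
        # capture_viewport targets the locked region.
        return self.locked

    def toggle_lock(self) -> None:
        """What a click on the lock icon does."""
        self.locked = not self.locked
```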

---

How it's built

  • Overlay: Python Tkinter. Four separate edge windows draw a 3px border so the center stays click-through, and a separate resize-handle window sits at the bottom-right corner.
  • Server: Node.js using @modelcontextprotocol/sdk. Reads coordinates from /tmp/claude-viewport.json, runs screencapture -R, returns base64 PNG.
  • Coordinate file: JSON at /tmp/claude-viewport.json with adjusted capture coordinates and raw frame coordinates.
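The edge-window layout reduces to simple rectangle arithmetic. Consider a frame of width W and height H with a border of b px. It splits into top and bottom strips of height b, plus left and right strips of height H - 2b between them. Insetting the frame by b on every side gives a capture area that excludes the border itself. The Python sketch below uses hypothetical names. It is also an assumption that the "adjusted" coordinates in the JSON file are exactly this inset; the real overlay may also offset for its drag bar.

```python
from typing import Dict, NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int

    def tk_geometry(self) -> str:
        """Tkinter geometry string in the form WIDTHxHEIGHT+X+Y."""
        return f"{self.width}x{self.height}+{self.x}+{self.y}"


def edge_windows(frame: Rect, border: int = 3) -> Dict[str, Rect]:
    """Four thin windows that together draw the frame, leaving the center uncovered."""
    inner_height = frame.height - 2 * border
    return {
        "top": Rect(frame.x, frame.y, frame.width, border),
        "bottom": Rect(frame.x, frame.y + frame.height - border, frame.width, border),
        "left": Rect(frame.x, frame.y + border, border, inner_height),
        "right": Rect(frame.x + frame.width - border, frame.y + border, border, inner_height),
    }


def capture_rect(frame: Rect, border: int = 3) -> Rect:
    """Inset the raw frame by the border so screenshots exclude the overlay's edges."""
    return Rect(frame.x + border, frame.y + border,
                frame.width - 2 * border, frame.height - 2 * border)
```

Each `tk_geometry()` string can be passed to a Tkinter `Toplevel.geometry()` call to place the corresponding edge window.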

---

Use cases

  • UI development — Agent captures before/after screenshots while iterating on styles
  • Visual debugging — "Does this layout look right?" — agent can see for itself
  • Design review — Position overlay on a Figma/design window, let the agent compare against code output
  • Mobile development — Frame your iOS Simulator, let the agent see the app state

---

License

MIT
