Gives text-only coding agents the ability to 'see' images, videos, and screenshots by routing them to a vision model and returning structured text.
Getting started
Add vision-mcp to your MCP-capable client — Claude Code, Cursor, Codex, and others — by following the setup at the source, which documents the exact command, configuration, and any required API keys.






