Multi-provider media generation MCP server that generates images, videos, audio, and transcriptions from text prompts using OpenAI, xAI, Gemini, ElevenLabs, and BFL through a single unified interface.
Getting started
Add multimodal-mcp to your MCP-capable client — Claude Code, Cursor, Codex, and others — by following the setup at the source, which documents the exact command, configuration, and any required API keys.






