Narrator MCP

1J6K21/MinecraftNarratorMCP

Summary

Captures screenshots, analyzes changes with Gemini, generates humorous narrations with sound effects, and plays them as speech via ElevenLabs, optionally integrating with a Minecraft mod for in-game events.


A fun MCP server that takes screenshots, describes what changed, generates hilarious narration, and plays it back as audio!

Datathon2025

We built this project at Texas A&M's Datathon 2025, a hackathon focused on AI, ML, Computer Science, Data Science, and Statistics!

Results:

Agent: 2nd Place 🏆

MCP(this repo): Best use of ElevenLabs 🏆


Contributors:

Jonathan Kalsky '29 (CS):

www.linkedin.com/in/jonathan-kalsky

Aaron Yang '29 (CS):

https://www.linkedin.com/in/nianjin-yang/

Ethan Hince '29 (CS):

https://www.linkedin.com/in/ethan-hince-a831a5381/


Brief Demo:

https://youtu.be/umaCNd4jPfY?si=QRRDb36p3LI-7tyr

License

We do not permit reproducing, forking, or otherwise taking our idea, code, or intellectual property. For more information, email jonathan.kalsky@gmail.com


Features

Connects to our custom Minecraft mod, developed alongside this server, which posts gameplay data to an HTTP port

View: https://github.com/nianjindev/MinecraftMCPSender

There is a UI to view the data on the port. To host the UI, run:

python3 minecraft_reciever.py
python minecraft_reciever.py (Windows)

Other features:

  • Get Screenshot: Retrieves the last two screenshots from a directory
  • Describe: Uses Gemini to analyze screenshots and describe changes
  • Narrate: Generates funny, sarcastic narration about what you're doing
  • Sound Effects: Automatically adds comedic sound effects from MyInstants API
  • TTS: Converts narration to speech with ElevenLabs TTS
  • Auto-cleanup: Only keeps the last 5 screenshots
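The "Get Screenshot" and "Auto-cleanup" features can be pictured with a small sketch like the one below. It is not the project's actual code: the function names, the `*.png` pattern, and sorting by modification time are assumptions made for illustration. Only the "last two" and "last 5" numbers come from the list above.

```python
from pathlib import Path

MAX_SCREENSHOTS = 5  # retention limit stated in the feature list


def _by_age(directory: Path) -> list[Path]:
    """All PNG files in the directory, oldest first (assumed naming pattern)."""
    return sorted(directory.glob("*.png"), key=lambda p: p.stat().st_mtime)


def newest_screenshots(directory: Path, count: int = 2) -> list[Path]:
    """Return the `count` most recent screenshots, oldest first."""
    shots = _by_age(directory)
    return shots[-count:] if count > 0 else []


def prune_screenshots(directory: Path, keep: int = MAX_SCREENSHOTS) -> list[Path]:
    """Delete all but the `keep` newest screenshots and return the removed paths."""
    shots = _by_age(directory)
    stale = shots[:-keep] if keep > 0 else shots
    for path in stale:
        path.unlink()
    return stale
```

Calling `prune_screenshots(Path("./screenshots"))` after each capture would keep the directory at five files or fewer.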


Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Create a .env file with your API keys:
cp .env.example .env
# Edit .env and add your API keys
  3. Get API keys:
  • Gemini API key: https://aistudio.google.com/app/apikey (FREE!)
  • ElevenLabs API key: https://elevenlabs.io/app/settings/api-keys (10k chars/month free)
  4. In config/mcp/servers.json:
  • You may need to switch "Python3" to "Python" in the command field
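Before starting the client, it can help to confirm the keys are actually visible to Python. The sketch below is a hypothetical helper, not part of the repository. It assumes the variable names `GEMINI_API_KEY` and `ELEVENLABS_API_KEY` (the names used in the MCP server config in this README) and that the values from .env have already been loaded into the environment.

```python
import os

# Assumed variable names; check .env.example for the names the project really uses.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

If `missing_keys()` returns a non-empty list, the named keys still need to be added to .env.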

Usage

Run the Client

Minecraft Mod Integration (Preferred)

To include Minecraft gameplay events in the narration:

  1. Install the Minecraft Fabric mod (see MinecraftMCP.java)
  2. Run the screenshot client (it automatically starts the receiver):
python mincraft_client_only.py
  3. Launch Minecraft - the mod will send events automatically

The narrator will describe both what's happening on screen AND in-game events like:

  • Blocks placed/broken
  • Damage taken
  • Biome changes
  • Day/night cycle

Alternate Screenshot Context-based Client

The client takes screenshots every 5 seconds and generates narrated audio:

python screenshot_client.py

This will:

  1. Take a screenshot every 5 seconds
  2. After 2 screenshots, compare them
  3. Generate a funny narration about what changed
  4. Convert to speech and play it automatically
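The loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of screenshot_client.py: `capture` and `process` are hypothetical callables standing in for the screenshot and narration steps, and comparing every consecutive pair is one reading of "after 2 screenshots, compare them".

```python
import time
from typing import Callable, Optional

INTERVAL = 5  # seconds between screenshots, as described above


def run_client(
    capture: Callable[[], str],
    process: Callable[[str, str], None],
    interval: float = INTERVAL,
    max_shots: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Capture on a fixed interval and process each consecutive pair of shots."""
    previous: Optional[str] = None
    taken = 0
    while max_shots is None or taken < max_shots:
        current = capture()  # returns the path of the new screenshot
        taken += 1
        if previous is not None:
            process(previous, current)  # describe, narrate, speak
        previous = current
        if max_shots is None or taken < max_shots:
            sleep(interval)
```

Passing `max_shots` and a no-op `sleep` makes the loop easy to exercise without waiting; leaving `max_shots` as `None` runs until Ctrl+C.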

Note: The Minecraft receiver runs automatically in the background. No need to start it separately!

┌─────────────────────┐
│    Minecraft Mod    │  (Java)
│      (in game)      │
└──────────┬──────────┘
           │ HTTP POST
           ▼
┌─────────────────────┐
│ minecraft_receiver  │  (Flask HTTP server)
│      Port 8080      │  Saves to minecraft_data.json
└──────────┬──────────┘
           │ File write
           ▼
┌─────────────────────┐
│ minecraft_data.json │  (Shared file)
└──────────┬──────────┘
           │ File read
           ▼
┌─────────────────────┐
│  screenshot_client  │  Reads file, calls MCP tool
└──────────┬──────────┘
           │ MCP call
           ▼
┌─────────────────────┐
│    mcp_server.py    │  Processes data via get_minecraft_input tool
└─────────────────────┘
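The file handoff in the middle of this flow (receiver writes minecraft_data.json, client reads it) might look something like the sketch below. The function names and the event fields in the usage note are hypothetical, and the atomic write is a design suggestion rather than something the repository is known to do.

```python
import json
import os
import tempfile
from pathlib import Path

DATA_FILE = Path("minecraft_data.json")  # shared file named in the diagram


def save_event(event: dict, path: Path = DATA_FILE) -> None:
    """Receiver side: write the latest event to a temp file, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(event, handle)
    os.replace(tmp_name, path)  # atomic rename, so readers never see a partial file


def load_event(path: Path = DATA_FILE) -> dict:
    """Client side: read the latest event, or {} if nothing usable is there yet."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

For example, `save_event({"event": "block_broken", "block": "stone"})` followed by `load_event()` returns the same dictionary. Writing to a temporary file first avoids the client reading a half-written file while the receiver is mid-update.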

Use as MCP Server

You can also use this as an MCP server in Kiro or other MCP clients:

{
  "mcpServers": {
    "screenshot-narrator": {
      "command": "python",
      "args": ["mcp_server.py"],
      "env": {
        "GEMINI_API_KEY": "your_key",
        "ELEVENLABS_API_KEY": "your_key",
        "SCREENSHOT_DIR": "./screenshots"
      }
    }
  }
}

Available MCP Tools

  • get_screenshot: Get the last N screenshots
  • get_minecraft_input: Receive Minecraft gameplay events
  • describe: Analyze screenshots and/or Minecraft data
  • narrate: Generate funny narration from a description
  • describe_for_narration: Combined tool (faster) - analyze and narrate in one step
  • summarize_narrations: Combine multiple narrations into one sentence
  • get_sfx: Search for sound effects from MyInstants API
  • tts: Convert text to speech with ElevenLabs

How It Works

  1. Screenshots: Captures the screen with the platform's screenshot tool (screencapture on macOS; see Platform-specific details)
  2. Analysis: Gemini 2.5 Flash analyzes images and describes changes (fast & free!)
  3. Narration + SFX Keyword: Gemini generates sarcastic commentary AND extracts the best sound effect keyword in ONE API call (super fast!)
  4. Sound Effect Search: Searches MyInstants API with the AI-selected keyword and randomly picks from top results
  5. Speech: ElevenLabs TTS converts text to audio with high-quality voice
  6. Playback: Plays sound effect and narration audio in parallel for perfect timing
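Step 6, playing the sound effect and the narration at the same time, can be sketched with two threads. This is a minimal illustration, assuming each player is a blocking callable; `play_sfx` and `play_narration` are hypothetical stand-ins for whatever audio backend the platform uses.

```python
import threading
from typing import Callable


def play_in_parallel(
    play_sfx: Callable[[], None],
    play_narration: Callable[[], None],
) -> None:
    """Start both players together and return once both have finished."""
    threads = [
        threading.Thread(target=play_sfx),
        threading.Thread(target=play_narration),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # block until this clip is done
```

Joining both threads before returning keeps one narration from overlapping the next cycle's audio.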

Example Output

📸 Screenshot saved: screenshots/screenshot_20251108_143022.png
📸 Screenshot saved: screenshots/screenshot_20251108_143027.png

==================================================
🎬 Processing screenshots...
==================================================
📋 Getting screenshots...
🔍 Describing changes...
Description: The user has switched from their code editor to a web browser,
apparently giving up on debugging to search Stack Overflow instead.

🎭 Generating funny narration...
Narration: And here we observe the developer in their natural habitat,
abandoning all hope of solving the problem themselves and turning to the
ancient wisdom of strangers on the internet. Truly magnificent.

🎤 Converting to speech...
🔊 Playing audio: screenshots/narration_20251108_143030.mp3

Documentation

Notes

  • Cross-platform: Works on macOS, Windows, and Linux
  • Costs: Gemini has a free tier, and ElevenLabs includes 10k characters/month on its free tier
  • Gemini 2.5 Flash is super fast for vision tasks
  • ElevenLabs voices are incredibly realistic and expressive
  • MyInstants API provides free sound effects
  • Press Ctrl+C to stop the client

Platform-specific details:

  • macOS: Uses native screencapture and afplay commands
  • Windows: Uses PIL for screenshots and pygame for audio
  • Linux: Uses scrot/gnome-screenshot for screenshots, various audio players

License & Attribution

This project uses sound effects from:

  • MyInstants API by abdiputranar: https://github.com/abdipr/myinstants-api
  • MyInstants.com: https://www.myinstants.com

Sound effects are obtained via web scraping from MyInstants.com. This project:

  • Provides proper attribution to the MyInstants API and MyInstants.com
  • Is used for non-commercial, educational, and entertainment purposes only
  • Complies with the MyInstants API usage requirements
  • Does not abuse the API for personal commercial benefits

If you use this project, please maintain this attribution and follow the same guidelines.

Customization

  • Change INTERVAL in screenshot_client.py to adjust screenshot frequency
  • Modify the narration prompt in mcp_server.py for different comedy styles
  • Change TTS voice in mcp_server.py (ElevenLabs voices: Adam, Antoni, Arnold, Bella, Domi, Elli, Josh, Rachel, Sam, and more)
  • Customize SFX selection logic in the get_sfx_for_narration() function to match different keywords

Sound Effects

The system uses AI to automatically select the perfect sound effect for each narration:

  1. AI Keyword Extraction: Gemini analyzes the narration and extracts the single best keyword for a sound effect (e.g., "crash", "laugh", "explosion", "oof", "bruh", "scream", etc.)
  2. MyInstants Search: Searches the MyInstants API with that keyword
  3. Random Selection: Picks a random sound from the top 10 results for variety
  4. Parallel Playback: Plays sound effect and narration simultaneously for perfect comedic timing
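The "Random Selection" step can be sketched as below. The shape of each search result (a dict) is an assumption, and `pick_sound` is a hypothetical name, not the project's `get_sfx_for_narration()` function. Only the "top 10" cutoff comes from the list above.

```python
import random
from typing import Optional, Sequence

TOP_N = 10  # "top 10 results", per the steps above


def pick_sound(
    results: Sequence[dict],
    rng: Optional[random.Random] = None,
) -> Optional[dict]:
    """Pick one entry at random from the first TOP_N search results."""
    candidates = list(results[:TOP_N])
    if not candidates:
        return None  # the search returned nothing, so play narration only
    return (rng or random).choice(candidates)
```

Passing a seeded `random.Random` makes the choice reproducible, which is handy when debugging a particular narration.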

This gives unlimited variety - every narration gets a unique, contextually appropriate sound effect!

Example keywords extracted by AI: crash, laugh, explosion, scream, bell, drum, bruh, oof, yikes, gasp, applause, horn, punch, falling, and many more!

Credits

Sound effects are provided by:

  • MyInstants API: https://github.com/abdipr/myinstants-api (by abdiputranar)
  • MyInstants.com: https://www.myinstants.com (original sound library)

Sounds are obtained via web scraping from MyInstants.com. This project complies with the API's usage requirements by providing proper attribution and is used for non-commercial, educational purposes only.
