MCP Camoufox Scraper Server
A proof-of-concept MCP (Model Context Protocol) server that uses Camoufox to scrape web pages: it monitors network requests with JavaScript enabled and extracts HTML with JavaScript disabled.
Quick Start
Prerequisites
- Python 3.10 or higher
- Poetry (Python dependency manager)
- macOS, Linux, or Windows
1. Clone/Download the Project
git clone <your-repo> mcp-camoufox-scraper
cd mcp-camoufox-scraper
2. Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
3. Install Dependencies
poetry install
4. Verify Setup
poetry run python setup_verify.py
You should see all checks pass: `` Setup verification successful! ``
5. Run Full Test
poetry run python test_mcp_server.py
You should see: `` MCP Server POC is ready! The server can now be used with MCP clients to scrape websites with JS disabled. ``
Connecting to MCP Clients
Claude Desktop Integration
- Find your Claude Desktop config file:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
  - Linux: ~/.config/Claude/claude_desktop_config.json
- Add the server configuration:
{
  "mcpServers": {
    "camoufox-scraper": {
      "command": "poetry",
      "args": ["run", "python", "/full/path/to/your/mcp-camoufox-scraper/run_server.py"],
      "cwd": "/full/path/to/your/mcp-camoufox-scraper"
    }
  }
}
- Replace the path with your actual project directory:
# Get your full path
pwd
# Copy the output and use it in the config above
- Restart Claude Desktop completely (quit and reopen)
- Verify connection by asking Claude:
"What MCP servers do you have access to?"
You should see "camoufox-scraper" listed with the available tools.
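The steps above can also be scripted. The following is a minimal sketch, not part of the project: the function names (`claude_config_path`, `server_entry`, `merge_entry`) are hypothetical, and it only prints the merged configuration instead of writing the file. It assumes the config locations and the server entry shown above.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path(platform: str, home: str, appdata: str = "") -> str:
    """Return the config location listed above for a given sys.platform value."""
    if platform == "darwin":
        return f"{home}/Library/Application Support/Claude/claude_desktop_config.json"
    if platform.startswith("win"):
        return f"{appdata}\\Claude\\claude_desktop_config.json"
    return f"{home}/.config/Claude/claude_desktop_config.json"


def server_entry(project_dir: str) -> dict:
    """Build the "camoufox-scraper" entry; project_dir should be an absolute path."""
    return {
        "command": "poetry",
        "args": ["run", "python", str(Path(project_dir) / "run_server.py")],
        "cwd": project_dir,
    }


def merge_entry(config: dict, project_dir: str) -> dict:
    """Return a copy of config with the entry added, keeping any other servers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["camoufox-scraper"] = server_entry(project_dir)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Run from the project directory; prints where the config lives and what to put in it.
    print(claude_config_path(sys.platform, str(Path.home()), os.environ.get("APPDATA", "")))
    print(json.dumps(merge_entry({}, str(Path.cwd())), indent=2))
```

Merging into a copy keeps any servers already configured, which matters if Claude Desktop is connected to more than one MCP server.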
Other MCP Clients
For other MCP clients, use this server configuration:
- Command: poetry
- Arguments: ["run", "python", "/path/to/run_server.py"]
- Working Directory: /path/to/mcp-camoufox-scraper
- Communication: stdio
Features
- Dual JavaScript mode: JavaScript enabled for network monitoring, disabled for clean HTML extraction
- Network request monitoring: Capture all XHR/API calls and HTTP requests made during page load
- Clean HTML extraction: Get HTML content with JavaScript disabled to avoid dynamic modifications
- MCP protocol integration: Works with any MCP client (Claude Desktop, etc.)
Available Tools
1. navigate_to_url
Navigate to a URL with JavaScript enabled to capture network requests and dynamic content.
Parameters:
- url (required): The URL to navigate to
- wait_time (optional): Time to wait after page load, in seconds (default: 3)
Claude Example: "Please navigate to https://example.com and wait 5 seconds"
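Under the hood, an MCP client turns a prompt like this into a JSON-RPC 2.0 `tools/call` request. Here is a rough sketch of what that message might look like for navigate_to_url over the stdio transport, where each message is one line of JSON. The helper name `tool_call_message` is hypothetical. A real client must also complete the MCP `initialize` handshake before calling tools.

```python
import json
from typing import Any, Dict


def tool_call_message(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the line stays intact.
    return json.dumps(request) + "\n"


line = tool_call_message(
    1, "navigate_to_url", {"url": "https://example.com", "wait_time": 5}
)
print(line, end="")
```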
2. get_page_html
Extract clean HTML content by re-loading the page with JavaScript disabled.
Parameters: None
Claude Example: "Get the HTML content from the current page"
3. get_network_requests
Get all captured network requests from the last page navigation.
Parameters:
- filter_type (optional): Filter by request type: "xhr", "fetch", or "all" (default: "all")
Claude Example: "Show me all the network requests that were captured"
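To illustrate how filter_type could behave, here is a small sketch. The record layout is an assumption: each captured request is a dict with a hypothetical "resource_type" key, using the lowercase type names Playwright reports (such as "xhr", "fetch", "document"). The server's actual storage format may differ.

```python
from typing import Dict, List


def filter_requests(
    requests: List[Dict[str, str]], filter_type: str = "all"
) -> List[Dict[str, str]]:
    """Return the captured requests matching filter_type ("xhr", "fetch", or "all")."""
    if filter_type == "all":
        return list(requests)
    if filter_type not in ("xhr", "fetch"):
        raise ValueError(f"Unsupported filter_type: {filter_type!r}")
    return [r for r in requests if r.get("resource_type") == filter_type]


# Hypothetical sample data for illustration only.
captured = [
    {"url": "https://example.com/", "resource_type": "document"},
    {"url": "https://example.com/api/items", "resource_type": "xhr"},
    {"url": "https://example.com/api/user", "resource_type": "fetch"},
]
print([r["url"] for r in filter_requests(captured, "xhr")])
```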
4. close_browser
Close the browser and cleanup resources.
Parameters: None
Claude Example: "Close the browser to free up resources"
Usage Examples
Basic Web Scraping
You: Navigate to https://news.ycombinator.com
Claude: [Uses navigate_to_url tool]
You: Get the HTML content
Claude: [Uses get_page_html tool and analyzes the content]
You: What network requests were made?
Claude: [Uses get_network_requests tool and shows API calls]
API Discovery
You: Go to https://httpbin.org/headers and show me what requests it makes
Claude: [Navigates and shows network monitoring results]
Content Analysis
You: Navigate to https://example.com and get both the network requests and clean HTML
Claude: [Navigates with JS enabled to capture requests, then extracts HTML with JS disabled]
You: What's the difference between the two modes?
Claude: [Explains that navigation captures dynamic requests while HTML extraction gives clean content]
Testing & Verification
Run Full Test Suite
poetry run python test_mcp_server.py
Expected output (abridged):
=== Testing MCP Server Tools ===
Navigation successful: Example Domain
HTML extraction successful
Network requests retrieval successful
Complex site navigation successful
Browser closed successfully
MCP Server POC is ready!
Test Individual Components
# Test just the Camoufox API
poetry run python test_camoufox_api.py
# Start server manually (for debugging)
poetry run python run_server.py
Troubleshooting
Quick Diagnosis
# Run the setup verification script first
poetry run python setup_verify.py
This will check your Python version, dependencies, project structure, and generate the correct Claude Desktop configuration.
Server Won't Start
# Check if dependencies are installed
poetry show mcp camoufox
# Run the test to identify issues
poetry run python test_mcp_server.py
Claude Desktop Connection Issues
- Check config file location - Make sure you're editing the right file
- Use absolute paths - Relative paths won't work
- Restart Claude Desktop completely after config changes
- Check Claude's developer tools for error messages
- Verify Python path - Make sure the python command works in your terminal
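Some of the checks above can be automated. The sketch below is not part of the project, and `check_server_config` is a hypothetical helper. It assumes the config layout shown in the Claude Desktop Integration section, parses the config text, and reports a relative or missing script path and a missing working directory.

```python
import json
import os
from typing import List


def check_server_config(config_text: str, server_name: str = "camoufox-scraper") -> List[str]:
    """Return a list of human-readable problems found in the config (empty if none)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"Config is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["Config must be a JSON object"]

    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f'No "{server_name}" entry under "mcpServers"']

    problems = []
    args = entry.get("args") or []
    script = args[-1] if args else ""
    if not os.path.isabs(script):
        problems.append(f"Script path is not absolute: {script!r}")
    elif not os.path.isfile(script):
        problems.append(f"No such file: {script}")

    cwd = entry.get("cwd", "")
    if cwd and not os.path.isdir(cwd):
        problems.append(f"Working directory does not exist: {cwd}")
    return problems


if __name__ == "__main__":
    sample = '{"mcpServers": {"camoufox-scraper": {"command": "poetry", "args": ["run", "python", "run_server.py"]}}}'
    for problem in check_server_config(sample):
        print(problem)
```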
Browser Issues
- Camoufox download: First run may take time downloading browser binaries
- Permission errors: Make sure the script has execute permissions (chmod +x run_server.py)
- Port conflicts: Close other browser automation tools if running
Common Error Messages
- "No such file or directory" → Check the path in your config
- "Permission denied" → Run chmod +x run_server.py
- "Module not found" → Run poetry install
Project Structure
mcp-camoufox-scraper/
├── mcp_camoufox_scraper/
│   ├── __init__.py
│   └── server.py            # Main MCP server implementation
├── run_server.py            # CLI runner script
├── setup_verify.py          # Setup verification & config generator
├── test_mcp_server.py       # Comprehensive test suite
├── test_camoufox_api.py     # Camoufox API validation
├── pyproject.toml           # Project configuration & dependencies
├── poetry.lock              # Locked dependency versions
└── README.md                # This file
Limitations
- Dual-context approach: HTML extraction requires re-navigation with JS disabled
- Single page: Only one page can be active at a time
- No authentication: Currently no support for login/auth workflows
- Limited to HTTP/HTTPS: No support for other protocols
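Given the HTTP/HTTPS limitation, it can help to reject unsupported URLs before they reach the browser. This is a minimal sketch using only the standard library. The function name `is_supported_url` is hypothetical, and the project may validate URLs differently or not at all.

```python
from urllib.parse import urlparse


def is_supported_url(url: str) -> bool:
    """Accept only absolute http:// or https:// URLs that include a host."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://example.com", "ftp://example.com/file", "example.com"):
    print(candidate, is_supported_url(candidate))
```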
Security & Privacy
- No data collection: Server runs locally, no data sent to external services
- Controlled JavaScript execution: JS enabled only when needed for network monitoring
- Privacy-focused browser: Camoufox is designed for privacy
- Local execution: All scraping happens on your machine
Development & Extension
Modifying the Server
- Edit mcp_camoufox_scraper/server.py
- Test changes: poetry run python test_mcp_server.py
- Verify MCP compliance with your client
Adding New Tools
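# In server.py, add a branch to the call_tool handler in _register_tools()
# (Dict and Any come from: from typing import Any, Dict).
# The new tool also needs a matching entry in the server's list_tools handler
# so that clients can discover it.
@self.server.call_tool()
async def handle_call_tool(name: str, arguments: Dict[str, Any]):
    if name == "your_new_tool":
        return await self._your_new_tool(arguments)
    # ...existing tool branches stay as they are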
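As the number of tools grows, a chain of `if name == ...` branches can be replaced by a lookup table. The following standalone sketch shows that pattern with plain asyncio. It does not use the MCP SDK, and the names `TOOLS`, `tool`, and `dispatch` are hypothetical, not taken from server.py.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Awaitable[str]]

# Maps tool names to their async handlers.
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that registers an async handler under the given tool name."""
    def register(handler: ToolHandler) -> ToolHandler:
        TOOLS[name] = handler
        return handler
    return register


@tool("your_new_tool")
async def your_new_tool(arguments: Dict[str, Any]) -> str:
    # Placeholder body: report which argument names were received.
    return f"received {sorted(arguments)}"


async def dispatch(name: str, arguments: Dict[str, Any]) -> str:
    """Look up the handler by name and await it; unknown names raise ValueError."""
    handler = TOOLS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(arguments)


if __name__ == "__main__":
    print(asyncio.run(dispatch("your_new_tool", {"url": "https://example.com"})))
```

In the real server, the body of `handle_call_tool` could delegate to a function like `dispatch`, so adding a tool means writing one decorated function.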
Technical Details
- Browser: Camoufox (privacy-focused Firefox-based)
- Protocol: MCP (Model Context Protocol)
- Language: Python 3.10+ with asyncio
- Dependencies: MCP SDK, Camoufox browser automation
License
MIT License - Feel free to use, modify, and distribute.
Contributing
This is a proof-of-concept project. Feel free to:
- Fork and extend for your use cases
- Submit issues and improvements
- Share your modifications with the community
---
Happy scraping!






