MCPSearch
AI-powered multi-source research and crawling platform with MCP integration
  
Overview
MCPSearch is a self-hosted research stack for agents and developers. It combines:
- parallel web search across multiple engines
- HTTP + browser + stealth crawling
- social and developer-source collection
- structured content extraction
- MCP-native tool exposure
- higher-level research workflows via investigate, compare, and trending
The project has grown beyond a simple crawler. The current shape is:
- 29 MCP tools in mcp_server/server.py
- a unified mcpsearch / mcpsearch_multi interface
- shared action routing in mcp_server/handlers.py
- a flagship orchestration layer in agents/research_agent.py
Current Capabilities
- Web search: DuckDuckGo, Google, and Bing aggregation
- Crawling modes: fast via HTTP only, hybrid via HTTP + Playwright, stealth via anti-bot fallback
- Extraction: markdown/text extraction, tables, code blocks, images, metadata, JSON-LD/OpenGraph/Microdata via extruct
- Fast parsing: selectolax on hot search parsing paths with BeautifulSoup fallback
- Social sources: Reddit, Twitter/X, YouTube, GitHub
- HTTP caching: shared async client factory with optional Hishel-backed caching on request-heavy paths
- Research workflows: research_agent, investigate, compare, trending
- Tool discovery: list_tools, describe_tools, get_crawl_stats
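The web search aggregation listed above can be pictured as a concurrent fan-out followed by URL de-duplication. The following is a minimal sketch under assumptions, not the code in search/aggregator.py: the `aggregate` function, the `Engine` type, and the stub engines are hypothetical names introduced only for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical engine signature: takes a query, returns a list of result dicts.
Engine = Callable[[str], Awaitable[list[dict]]]


async def aggregate(query: str, engines: dict[str, Engine], limit: int = 10) -> list[dict]:
    """Query all engines concurrently and merge results, keeping the first hit per URL."""
    names = list(engines)
    batches = await asyncio.gather(
        *(engines[name](query) for name in names),
        return_exceptions=True,  # one failing engine should not sink the whole search
    )
    merged: list[dict] = []
    seen: set[str] = set()
    for name, batch in zip(names, batches):
        if isinstance(batch, BaseException):
            continue  # skip engines that raised
        for item in batch:
            key = item["url"].rstrip("/")  # crude normalization: ignore trailing slash
            if key not in seen:
                seen.add(key)
                merged.append({**item, "engine": name})
    return merged[:limit]


# Stub engines standing in for real search clients (assumptions for the demo).
async def fake_ddg(query: str) -> list[dict]:
    return [{"url": "https://example.com/a", "title": "A"}]


async def fake_bing(query: str) -> list[dict]:
    return [
        {"url": "https://example.com/a/", "title": "A (duplicate)"},
        {"url": "https://example.com/b", "title": "B"},
    ]


async def broken_engine(query: str) -> list[dict]:
    raise RuntimeError("engine unavailable")


results = asyncio.run(
    aggregate("AI agents", {"ddg": fake_ddg, "bing": fake_bing, "broken": broken_engine})
)
```

Passing `return_exceptions=True` is one way to keep partial results when a single engine fails; a real aggregator may prefer to log or retry instead.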
Install
Basic install
git clone https://github.com/JonusNattapong/MCPSearch.git
cd MCPSearch
pip install -e .
playwright install chromium
Development install
make dev
or:
pip install -e ".[dev]"
playwright install chromium
Optional stealth dependency
crawler/stealth.py can use Camoufox when it is installed. If Camoufox is not available, MCPSearch falls back to Playwright-based stealth behavior.
Environment variables
OPENAI_API_KEY
Optional. Used by summarization flows when AI summaries are enabled.
Quick Start
CLI
# Search
mcpsearch search -q "AI agents"
# Crawl a page
mcpsearch crawl -u "https://example.com"
# Read a page in terminal-friendly format
mcpsearch read -u "https://example.com"
# Research workflow
mcpsearch research --query "browser fingerprinting" --depth deep --summarize
# Compare topics
mcpsearch compare --compare "React" "Vue" "Svelte" --depth medium
# Trending view
mcpsearch trending --max-results 10
# Run MCP server
mcpsearch server
Python / MCP-facing examples
# Unified tool
mcpsearch(action="search", query="LLM agents", limit=5)
mcpsearch(action="crawl", url="https://example.com", mode="hybrid")
mcpsearch(action="reddit", query="python", subreddit="learnpython")
mcpsearch(action="github", query="browser automation", sort="stars")
# Multi-action orchestration
mcpsearch_multi(actions='''[
  {"action":"search","query":"agent memory patterns"},
  {"action":"reddit","query":"LocalLLaMA"},
  {"action":"github","query":"llm agents","sort":"stars"}
]''')
# Flagship research tools
investigate(topic="Python async scraping", depth="deep", include_social=True)
compare(topics="React,Vue,Svelte", depth="medium", max_sources=3)
trending(platforms="reddit,github", limit=10)
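Because mcpsearch_multi takes its actions as a JSON string in the example above, building that string from Python dicts avoids quoting mistakes. This is a small sketch; `build_actions` is a hypothetical helper, not part of MCPSearch, and the only validation it performs is checking for an "action" key.

```python
import json


def build_actions(*steps: dict) -> str:
    """Serialize action dicts into a JSON array string."""
    for step in steps:
        if "action" not in step:
            raise ValueError(f"missing 'action' key: {step!r}")
    return json.dumps(list(steps))


payload = build_actions(
    {"action": "search", "query": "agent memory patterns"},
    {"action": "reddit", "query": "LocalLLaMA"},
    {"action": "github", "query": "llm agents", "sort": "stars"},
)
# The resulting string would then be passed as: mcpsearch_multi(actions=payload)
```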
MCP Integration
Claude Desktop
{
  "mcpServers": {
    "mcpsearch": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "cwd": "/path/to/MCPSearch",
      "env": {
        "OPENAI_API_KEY": ""
      }
    }
  }
}
Cursor
{
  "mcpServers": {
    "mcpsearch": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "cwd": "/path/to/MCPSearch"
    }
  }
}
Custom MCP client
{
  "command": "python",
  "args": ["-m", "mcp_server"],
  "transport": "stdio"
}
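For a custom client on the stdio transport, tool invocations travel as newline-delimited JSON-RPC 2.0 messages using the MCP `tools/call` method. The sketch below only builds such a message; `tool_call_request` is a hypothetical helper, the argument names are copied from the examples in this README, and a real client would normally rely on an MCP client library to handle the initialize handshake and framing.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call_request(name: str, arguments: dict) -> str:
    """Build one newline-terminated JSON-RPC 2.0 `tools/call` message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so the message stays on a single line.
    return json.dumps(message) + "\n"


line = tool_call_request("investigate", {"topic": "Python async scraping", "depth": "deep"})
```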
Tool Map
Unified tools
- mcpsearch
- mcpsearch_multi
Search and crawl tools
- web_search
- search_and_summarize
- smart_search
- deep_search
- crawl_url
- hybrid_crawl
- crawl_recursive
- extract_content
- get_crawl_stats
Social tools
- search_reddit
- get_subreddit
- get_reddit_post
- search_twitter
- get_user_tweets
- search_youtube
- get_youtube_channel
- get_youtube_content
- search_github
- get_github_user
- get_github_repo
- get_github_readme
Research tools
- research_agent
- investigate
- compare
- trending
Discovery tools
- list_tools
- describe_tools
Recommended Entry Points
If you are integrating MCPSearch into an agent:
- start with list_tools and describe_tools
- prefer mcpsearch for simple routing
- use mcpsearch_multi when you want parallel source gathering
- use investigate for richer topic-oriented research
- use compare when the output should be side-by-side
- use trending for source discovery and early signal collection
Research Workflows
investigate
Best when you want one topic explored across search, crawl, and social sources.
investigate(
    topic="anti-bot browser strategies",
    depth="deep",
    include_social=True,
    include_summary=True,
    max_sources=5,
)
compare
Best when you want repeated shallow or medium investigations and a compact comparison result.
compare(
    topics="Playwright,Selenium,Camoufox",
    depth="medium",
    max_sources=3,
)
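The idea of "repeated investigations" behind compare can be sketched as one investigation per comma-separated topic, collected into a structure keyed by topic. This is an illustration under assumptions rather than the actual implementation: `compare_topics` and `fake_investigate` are hypothetical, and the result shape is invented for the demo.

```python
from typing import Callable


def compare_topics(
    topics: str,
    investigate_fn: Callable[..., dict],
    depth: str = "medium",
    max_sources: int = 3,
) -> dict[str, dict]:
    """Run one investigation per comma-separated topic and key results by topic."""
    names = [part.strip() for part in topics.split(",") if part.strip()]
    if len(names) < 2:
        raise ValueError("need at least two topics to compare")
    return {
        name: investigate_fn(topic=name, depth=depth, max_sources=max_sources)
        for name in names
    }


# Stand-in for a real investigation call; returns a tiny summary dict.
def fake_investigate(topic: str, depth: str, max_sources: int) -> dict:
    return {"topic": topic, "depth": depth, "sources": max_sources}


table = compare_topics("Playwright, Selenium,Camoufox", fake_investigate)
```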
trending
Best when you want new leads before deeper crawling.
trending(
    platforms="reddit,github",
    limit=10,
)
Architecture
Request flow
Query / URL / Topic
        |
        v
mcpsearch / direct tool
        |
        v
mcp_server/handlers.py
        |
        +--> search/aggregator.py
        +--> crawler/engine.py
        +--> crawler/hybrid.py
        +--> crawler/stealth.py
        +--> social/*.py
        +--> agents/research_agent.py
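The routing step in the flow above is often built as a dispatch table that maps an action name to a handler. The following sketch shows that pattern only; the handler names, their parameters, and the `route` function are hypothetical and do not describe the real contents of mcp_server/handlers.py.

```python
from typing import Callable


# Hypothetical handlers standing in for the real search and crawl backends.
def handle_search(query: str, limit: int = 5) -> dict:
    return {"kind": "search", "query": query, "limit": limit}


def handle_crawl(url: str, mode: str = "hybrid") -> dict:
    return {"kind": "crawl", "url": url, "mode": mode}


# Dispatch table: action name -> handler callable.
ROUTES: dict[str, Callable[..., dict]] = {
    "search": handle_search,
    "crawl": handle_crawl,
}


def route(action: str, **params) -> dict:
    """Look up the handler for an action and call it with the remaining parameters."""
    try:
        handler = ROUTES[action]
    except KeyError:
        raise ValueError(
            f"unknown action {action!r}; expected one of {sorted(ROUTES)}"
        ) from None
    return handler(**params)
```

A table like this keeps the unified tool and the direct tools pointed at the same handlers, so adding an action means adding one entry.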
Crawl strategy
fast -> HTTP only
hybrid -> HTTP first, then browser rendering when needed
stealth -> multi-browser / anti-bot fallback path
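The hybrid rule, HTTP first and browser rendering only when needed, can be sketched with a simple "does this page have enough visible text" heuristic. Everything here is an assumption made for illustration: the `hybrid_fetch` and `visible_text_length` names, the injected fetcher callables, and the 200-character threshold are not taken from crawler/hybrid.py.

```python
import re
from typing import Callable

Fetcher = Callable[[str], str]  # hypothetical: takes a URL, returns HTML


def visible_text_length(html: str) -> int:
    """Rough text size: drop script/style blocks and tags, then count characters."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return len(" ".join(text.split()))


def hybrid_fetch(
    url: str,
    http_fetch: Fetcher,
    browser_fetch: Fetcher,
    min_chars: int = 200,  # arbitrary threshold chosen for this sketch
) -> tuple[str, str]:
    """Try plain HTTP first; render in a browser only if the page looks empty."""
    html = http_fetch(url)
    if visible_text_length(html) >= min_chars:
        return "http", html
    return "browser", browser_fetch(url)
```

Passing the fetchers in as callables keeps the decision logic independent of any particular HTTP client or browser driver.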
Current project structure
MCPSearch/
├── agents/ # Higher-level research orchestration
├── crawler/ # HTTP, hybrid, stealth, extraction logic
├── mcp_server/ # MCP server, unified tools, shared handlers
├── search/ # Search aggregation
├── social/ # Reddit, Twitter/X, YouTube, GitHub scrapers
├── summarizer/ # AI summarization helpers
├── tests/ # Workflow and unit tests
├── utils/ # Cache, dedup, rate limiting
├── cli.py # CLI entry point
├── Makefile # Dev/test/release commands
└── pyproject.toml # Package metadata and dependencies
Development
Useful commands
make install
make dev
make test
make test-cov
make lint
make lint-fix
make format
make server
python3 scripts/benchmark_search_and_crawl.py
Focused test commands
make test-hybrid
make test-rate-limiter
pytest tests/test_extractor.py -v
pytest tests/test_search_parsers.py -v
pytest tests/test_mcp_integration.py -v
pytest tests/test_mcp_tools.py -v
Release
make patch
make minor
make major
Version is sourced from mcpspider/version.py.
Project Status Notes
- The README now reflects mcpsearch / mcpsearch_multi, not the older scout naming.
- Playwright is part of declared dependencies.
- Camoufox support exists in code, but is optional at install time.
- The main research direction is now orchestration, attribution, and multi-source analysis, not just single-page crawling.
Practical Next Improvements
See docs/USEFUL_LIBS.md for a curated list of libraries and implementation tricks that fit the current architecture.
Legal and Ethical Usage
Use MCPSearch responsibly.
- Respect target site policies and applicable law.
- Use rate limiting and caching to reduce load.
- Review platform terms before large-scale scraping.
- Avoid collecting or redistributing restricted personal data.
Contributing
Contribution guidance lives in CONTRIBUTING.md.
License
MIT. See LICENSE.






