Telegram Community MCP
MCP server for hybrid search over Telegram community message history. Connect it to Claude Desktop and search your chats by meaning, not just keywords.
What it does
- Hybrid search — combines full-text search (FTS5) with semantic vector search (sentence embeddings), merged via Reciprocal Rank Fusion
- MCP integration — Claude Desktop calls search tools directly, reasons over results, and pulls conversation threads for context
- Incremental sync — checkpoint-based ingestion, only fetches new messages after initial import
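Reciprocal Rank Fusion gives a message a score of 1/(k + rank) for every ranked list it appears in and sums those contributions. Messages that rank well in both the FTS and the vector results therefore rise to the top. The sketch below is minimal and assumes each input is a best-first list of message IDs. It uses k = 60, the constant commonly paired with RRF, because the value this project uses is not stated here. `rrf_merge` is a hypothetical helper, not the API of `src/search.py`.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list[list[int]], k: int = 60, limit: int = 10) -> list[tuple[int, float]]:
    """Fuse several best-first rankings of message IDs with Reciprocal Rank Fusion.

    Hypothetical helper: k=60 is the conventional RRF constant, assumed here
    rather than taken from this project.
    """
    scores: dict[int, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, msg_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked high in
            # several lists accumulate the largest totals.
            scores[msg_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by message ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


fts_hits = [101, 205, 333]       # e.g. FTS5 results, best match first
semantic_hits = [205, 407, 101]  # e.g. vector KNN results, nearest first
print([msg_id for msg_id, _ in rrf_merge([fts_hits, semantic_hits])])  # [205, 101, 407, 333]
```

Message 205 comes first because it sits near the top of both lists, even though FTS alone ranked 101 higher. RRF uses only ranks, so BM25 scores and vector distances never need to be normalized onto a common scale.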
How it works
Claude Desktop ←→ MCP Server (stdio) ←→ SQLite (FTS5 + sqlite-vec)
                                     ←→ SentenceTransformer (embeddings)
                                     ←→ Telegram API (sync)
Search modes:
| Mode | How it works | Best for |
|------|--------------|----------|
| fts | SQLite FTS5 with Unicode tokenization | Exact word/phrase lookup |
| semantic | KNN over 384-dim embeddings (paraphrase-multilingual-MiniLM-L12-v2) | Finding messages by meaning, cross-language |
| hybrid (default) | FTS + semantic, merged with Reciprocal Rank Fusion (RRF) | General search, combining the strengths of both |
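The embedding model is multilingual (50+ languages, about 118M parameters, roughly 470 MB of weights) and runs on CPU. Queries and messages in all of these languages are mapped into a single embedding space, so a query in Russian can find answers written in English and vice versa.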
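The following is a minimal sketch of what the `fts` mode does. It uses Python's built-in `sqlite3` module and assumes the linked SQLite library was compiled with FTS5, which is true of most Python builds but not guaranteed. The table name `messages_fts`, its `body` column, and the sample rows are illustrative and are not the schema in `src/db.py`. The sketch shows three things: case-insensitive Unicode matching, a phrase query, and the limitation that motivates the other modes. The Russian query finds only the Russian message.

```python
import sqlite3

# In-memory database; requires an SQLite build with FTS5 enabled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE messages_fts USING fts5(body, tokenize='unicode61')"
)
conn.executemany(
    "INSERT INTO messages_fts(rowid, body) VALUES (?, ?)",
    [
        (1, "How do I configure the VPN on my router?"),
        (2, "Роутер не видит VPN после обновления"),
        (3, "Anyone selling a used router?"),
    ],
)


def fts_search(query: str, limit: int = 10) -> list[tuple[int, str]]:
    # bm25() is lower-is-better in FTS5, so ascending order puts the best match first.
    return conn.execute(
        "SELECT rowid, body FROM messages_fts WHERE messages_fts MATCH ? "
        "ORDER BY bm25(messages_fts) LIMIT ?",
        (query, limit),
    ).fetchall()


print(fts_search("router"))         # matches rows 1 and 3 only (English text)
print(fts_search("роутер"))         # case-folded Cyrillic: matches row 2 only
print(fts_search('"used router"'))  # phrase query: matches row 3 only
```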
Performance
Tested on a mini PC (Intel N100, 16 GB RAM):
| Messages | DB size | FTS query time | Semantic query time | RAM usage |
|----------|---------|----------------|---------------------|-----------|
| 100K | ~200 MB | < 50 ms | < 500 ms | ~800 MB |
| 500K | ~1 GB | < 50 ms | ~1 s | ~1.2 GB |
| 1M | ~2 GB | < 50 ms | 2–5 s | ~2 GB |
Semantic search uses a two-phase scheme: a coarse binary (Hamming) KNN over a bit[384] index ~32x smaller than the fp32 vectors, then an exact fp32 rerank of the top candidates. The small binary index stays cache-resident, which keeps the cold first-query latency low (e.g. on 1.5M vectors: cold semantic ~2 s vs ~12 s for a full fp32 scan; warm hybrid ~0.9 s). FTS5 scales to millions without issues. The binary index is built from existing vectors — no re-embedding — via python scripts/ingest.py --build-binary.
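The following pure-Python sketch illustrates the two-phase idea. The project runs this inside sqlite-vec, which the sketch does not use. It assumes sign-based binarization, where a bit is set when the component is positive, and cosine similarity for the exact rerank. The shortlist size of 50 and all function names are hypothetical. Random vectors stand in for real embeddings.

```python
import math
import random

DIM = 384  # embedding size of paraphrase-multilingual-MiniLM-L12-v2


def to_bits(vec: list[float]) -> int:
    """Binarize a vector (assumed scheme): bit i is set when component i is positive."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(a ^ b).count("1")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def two_phase_search(query: list[float], vectors: list[list[float]],
                     binary_index: list[int], k: int = 5, candidates: int = 50) -> list[int]:
    q_bits = to_bits(query)
    # Phase 1: coarse KNN by Hamming distance over the compact binary index.
    shortlist = sorted(range(len(binary_index)),
                       key=lambda i: hamming(q_bits, binary_index[i]))[:candidates]
    # Phase 2: exact rerank of the shortlist with full-precision cosine similarity.
    shortlist.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return shortlist[:k]


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(1000)]
binary_index = [to_bits(v) for v in vectors]  # built from stored vectors, no re-embedding
query = [x + random.gauss(0, 0.1) for x in vectors[42]]  # a noisy copy of vector 42
print(two_phase_search(query, vectors, binary_index))

# Per-vector storage: fp32 uses 384 * 4 bytes, bit[384] uses 384 / 8 bytes.
print(DIM * 4, DIM // 8, (DIM * 4) // (DIM // 8))  # 1536 48 32
```

The coarse pass reads 48 bytes per vector instead of 1,536, which is where the 32x reduction comes from. The rerank restores exact ordering only among the shortlist, so a true neighbor that the Hamming pass misses cannot be recovered.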
Initial ingestion of 120K messages takes ~90 minutes on CPU (embedding generation). Incremental syncs are near-instant.
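That figure works out to roughly 120,000 / (90 × 60) ≈ 22 messages per second. The estimator below gives back-of-the-envelope figures for other history sizes. It assumes the per-message embedding cost stays constant on the same hardware, although real throughput depends on message length and CPU. `estimate_ingest_hours` is a hypothetical helper.

```python
REF_MESSAGES = 120_000  # reference run quoted above
REF_MINUTES = 90.0


def estimate_ingest_hours(n_messages: int) -> float:
    """Linear extrapolation from the reference run; assumes constant per-message cost."""
    rate_per_sec = REF_MESSAGES / (REF_MINUTES * 60)  # about 22.2 messages/s
    return n_messages / rate_per_sec / 3600


for n in (120_000, 500_000, 1_000_000):
    print(f"{n:>9,} messages: ~{estimate_ingest_hours(n):.1f} h")
```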
Quick start
Prerequisites
- Python 3.11+
- uv package manager
1. Install
git clone https://github.com/nullnumber1/Telegram-Community-MCP.git
cd Telegram-Community-MCP
uv sync
2. Get Telegram API credentials
Go to my.telegram.org → API development tools → Create application.
Troubleshooting:
my.telegram.org often returns a generic ERROR when creating an app in a regular browser. This is a known issue. Try using a VPN (different regions), an antidetect browser, or a mobile browser. It may take several attempts.
Save your api_id and api_hash.
3. Configure
cp config.env.example config.env
Edit config.env:
```env
TELEGRAM_API_ID=your_api_id
TELEGRAM_API_HASH=your_api_hash
CHAT_IDS=-1001234567890,-1009876543210
```
To find chat IDs, complete the authorization step (make auth, step 4) first, then run:
```bash
make chats
```
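This section does not show how config.env is parsed. The minimal sketch below assumes plain KEY=VALUE lines with no quoting or variable expansion, and treats CHAT_IDS as a comma-separated list of integers, as in the example above. `load_env_file` and `parse_chat_ids` are hypothetical helpers, not the project's loader. A library such as python-dotenv handles more edge cases.

```python
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def parse_chat_ids(value: str) -> list[int]:
    """Split a comma-separated CHAT_IDS value into integers (group IDs here are negative)."""
    return [int(part) for part in value.split(",") if part.strip()]


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / "config.env"
    env_path.write_text(
        "TELEGRAM_API_ID=12345\n"
        "TELEGRAM_API_HASH=your_api_hash\n"
        "CHAT_IDS=-1001234567890,-1009876543210\n",
        encoding="utf-8",
    )
    cfg = load_env_file(env_path)
    print(int(cfg["TELEGRAM_API_ID"]), parse_chat_ids(cfg["CHAT_IDS"]))
    # 12345 [-1001234567890, -1009876543210]
```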
4. Authorize
make auth
Scan the QR code with Telegram (Settings → Devices → Link Desktop Device). Session is saved locally — you only need to do this once.
5. Ingest messages
make ingest
This fetches the full history of configured chats and builds the search index. Progress is printed to stdout. Safe to interrupt — resumes from the last checkpoint.
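The minimal sketch below shows checkpoint-based resumption. It assumes the checkpoint is the highest message ID stored per chat. It also assumes the fetcher yields messages oldest-first, with IDs above that checkpoint. The tables, `sync_chat`, and the fetcher signature are hypothetical, and embedding generation is left out. With Telethon, a real fetcher could be built on `client.iter_messages(chat, min_id=last_id, reverse=True)`.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical schema: messages plus one checkpoint row per chat.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    chat_id INTEGER, message_id INTEGER, body TEXT,
    PRIMARY KEY (chat_id, message_id)
);
CREATE TABLE IF NOT EXISTS sync_state (
    chat_id INTEGER PRIMARY KEY, last_message_id INTEGER NOT NULL
);
"""

# fetch_newer(chat_id, min_id) yields (message_id, body) for IDs > min_id, oldest first.
Fetcher = Callable[[int, int], Iterable[tuple[int, str]]]


def _commit_batch(conn: sqlite3.Connection, chat_id: int, batch: list[tuple[int, int, str]]) -> int:
    # Messages and checkpoint are written in one transaction, so an interruption
    # can never leave the checkpoint pointing past data that was not stored.
    with conn:
        conn.executemany("INSERT OR IGNORE INTO messages VALUES (?, ?, ?)", batch)
        conn.execute(
            "INSERT INTO sync_state (chat_id, last_message_id) VALUES (?, ?) "
            "ON CONFLICT(chat_id) DO UPDATE SET last_message_id = excluded.last_message_id",
            (chat_id, batch[-1][1]),
        )
    return len(batch)


def sync_chat(conn: sqlite3.Connection, chat_id: int, fetch_newer: Fetcher, batch_size: int = 100) -> int:
    """Import everything newer than the stored checkpoint; returns messages processed."""
    row = conn.execute(
        "SELECT last_message_id FROM sync_state WHERE chat_id = ?", (chat_id,)
    ).fetchone()
    last_id = row[0] if row else 0  # 0 means no checkpoint yet: full import
    processed, batch = 0, []
    for message_id, body in fetch_newer(chat_id, last_id):
        batch.append((chat_id, message_id, body))
        if len(batch) >= batch_size:
            processed += _commit_batch(conn, chat_id, batch)
            batch = []
    if batch:
        processed += _commit_batch(conn, chat_id, batch)
    return processed


def fake_fetch(chat_id: int, min_id: int) -> list[tuple[int, str]]:
    """Stand-in for the Telegram API: a fixed history of 250 messages."""
    return [(i, f"message {i}") for i in range(1, 251) if i > min_id]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(sync_chat(conn, -1001234567890, fake_fetch))  # 250: first run is a full import
print(sync_chat(conn, -1001234567890, fake_fetch))  # 0: nothing newer than the checkpoint
```

Writing each batch and its checkpoint in the same transaction makes interruption safe. A restart re-reads the checkpoint and continues from the last committed batch, and `INSERT OR IGNORE` absorbs any overlap.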
6. Connect to Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"tg-community-search": {
"command": "uv",
"args": ["run", "--directory", "/absolute/path/to/Telegram-Community-MCP", "python", "server.py"]
}
}
}
Restart Claude Desktop. The search tools should appear in the tools menu.
MCP tools
| Tool | Description | Key parameters |
|------|-------------|----------------|
| search | Search messages across all indexed chats | query, mode (fts/semantic/hybrid), limit, chat_id, date_from, date_to |
| get_context | Get the surrounding thread: messages before/after plus replies | message_id, window |
| sync | Fetch new messages from Telegram | chat_id (optional; all chats if omitted) |
| list_chats | Show indexed chats with message counts | (none) |
| get_stats | Index statistics: totals, DB size, per-chat breakdown | (none) |
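The table does not define exactly how `get_context` assembles a thread. The in-memory sketch below shows one plausible reading. It assumes `window` counts messages on each side of the target, in message-ID order within a single chat. It also assumes replies are messages whose reply-to ID equals the target's ID. The `Message` type and the `get_context` function here are hypothetical, not the server's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: int
    text: str
    reply_to: Optional[int] = None  # ID of the message this one replies to


def get_context(messages: list[Message], message_id: int, window: int = 3) -> dict:
    """Return up to `window` messages on each side of the target, plus direct replies."""
    ordered = sorted(messages, key=lambda m: m.id)
    positions = {m.id: i for i, m in enumerate(ordered)}
    if message_id not in positions:
        raise KeyError(f"message {message_id} not found")
    pos = positions[message_id]
    return {
        "before": ordered[max(0, pos - window):pos],
        "target": ordered[pos],
        "after": ordered[pos + 1:pos + 1 + window],
        "replies": [m for m in ordered if m.reply_to == message_id],
    }


chat = [Message(i, f"msg {i}") for i in range(1, 11)]
chat.append(Message(15, "answer to msg 5", reply_to=5))
ctx = get_context(chat, 5, window=2)
print([m.id for m in ctx["before"]], ctx["target"].id,
      [m.id for m in ctx["after"]], [m.id for m in ctx["replies"]])
# [3, 4] 5 [6, 7] [15]
```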
Project structure
├── server.py # MCP server entry point
├── src/
│ ├── db.py # SQLite: schema, CRUD, FTS5, sqlite-vec queries
│ ├── embedder.py # SentenceTransformer wrapper (lazy-loading)
│ ├── search.py # Hybrid search: FTS + KNN + RRF fusion
│ └── telegram.py # Telethon client wrapper
├── scripts/
│ ├── auth.py # One-time Telegram authorization (QR code)
│ ├── ingest.py # Full import / incremental import
│ ├── list_chats.py # List all account dialogs
│ └── monitor.py # Monitor ingestion progress
├── tests/
│ ├── test_db.py # Database operation tests
│ ├── test_embedder.py # Embedder tests
│ └── test_search.py # Search and RRF fusion tests
├── config.env.example # Configuration template
├── pyproject.toml # Dependencies and tool config
├── Makefile # Dev and deployment shortcuts
└── tg-community-search.service # systemd unit (for server deployment)
Deployment (optional)
For running on a remote server (e.g., a mini PC):
- Edit tg-community-search.service and replace YOUR_USER with your username.
- Deploy:
make deploy REMOTE_HOST=192.168.1.42 REMOTE_USER=myuser REMOTE_PASS=mypass
- Set up hourly auto-sync via cron on the remote:
crontab -e
# Add:
0 * * * * cd /home/myuser/tg-community-search && ~/.local/bin/uv run python scripts/ingest.py >> logs/cron-sync.log 2>&1
Development
make test # Run tests
make lint # Lint and format
make dev # MCP inspector (browser UI for testing tools)
License
MIT