<!-- mcp-name: io.github.JacobBruce/CAIT -->
CAIT - Core AI Toolkit
A modular Model Context Protocol (MCP) server that extends AI assistants with practical capabilities: file I/O, a persistent Python REPL, AST-aware code analysis, semantic text search, document conversion, Wikipedia & arXiv tools, a persistent vector memory database, and other general utilities.
A total of 38 tools across 9 modules. Each module can be disabled independently via the CAIT_DISABLE environment variable. Made by AI for AI.
Requirements
- Python 3.11+
- Core: fastmcp, chromadb
- Online research: wikipedia-api, arxiv
- Document conversion: docling or markitdown[all] (or markitdown[pdf] for PDF-only)
- Scientific computing (optional, for REPL use): sympy, scipy, matplotlib, plotly, vispy
Installation
git clone https://github.com/JacobBruce/CAIT
cd CAIT
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Replace MarkItDown with Docling in requirements.txt if you want higher-quality layout-aware conversion (slower, heavier).
Agents & Skills
CAIT includes several agent files, instructions/rules, and skills. To get the most from CAIT you should install AGENTS.md into your work environment (see Agent Instructions below).
The instructions include general guidance for how to behave, how to use CAIT tools, and how to use the Firecrawl search tools. The instructions may need to be adapted to suit different setups.
If you are working in a Python environment you may want to make use of this agent prompt: python-coder.agent.md. There is also research-assistant.agent.md for deep research.
For C++ programmers there is cpp-style-guidelines.md, although many of the guidelines are my personal preferences and may need to be adapted to other projects.
For project planning and onboarding, CAIT also includes these two complementary skills:
- project-survey.md: orient in an unfamiliar codebase; produces SURVEY.md
- project-planning.md: plan new features or roadmaps; produces PLAN.md and TASKS.md
New: CAIT now includes game-master.agent.md to help agents act as the Game Master of a roleplaying text-based adventure, with optional image generation support.
It contains detailed "Game Master Protocols" with a well-thought-out Markdown file system for maintaining the state of a roleplay world. There is also an accompanying skill to help the agent set up a new roleplay world.
Agent Instructions
To use the agent instructions, copy AGENTS.md to the location your tool expects, renaming it where the table below calls for a different filename:

| Tool | Where to put a copy |
|------|---------------------|
| Cursor | User rule, or AGENTS.md at project root |
| Claude Code | CLAUDE.md at project root (or ~/.claude/CLAUDE.md globally) |
| GitHub Copilot | .github/copilot-instructions.md |
| Other | AGENTS.md at project root (increasingly recognized across tools) |

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| CAIT_FILES_PATH | ~/.cait/files/ | Directory for downloaded files and document conversion cache |
| CAIT_MEMORY_PATH | ~/.cait/memory | ChromaDB storage for the persistent memory database |
| CAIT_DISABLE | _(empty)_ | Comma-separated module names to exclude at startup (e.g. wiki,arxiv) |

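As a rough illustration of how the defaults in the table could be applied, the sketch below reads the two path variables and falls back to the documented locations. The function name resolve_cait_paths is hypothetical and is not taken from the CAIT source; only the variable names and default paths come from the table.

```python
import os
from pathlib import Path

def resolve_cait_paths(env=None):
    """Return (files_dir, memory_dir), applying the documented defaults.

    Hypothetical helper for illustration; not part of CAIT.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default from the table.
    files_dir = Path(env.get("CAIT_FILES_PATH") or "~/.cait/files/").expanduser()
    memory_dir = Path(env.get("CAIT_MEMORY_PATH") or "~/.cait/memory").expanduser()
    return files_dir, memory_dir

# Override only the files directory; the memory path keeps its default.
files_dir, memory_dir = resolve_cait_paths({"CAIT_FILES_PATH": "/tmp/cait-files"})
print(files_dir)
print(memory_dir)
```

Passing a plain dict instead of os.environ keeps the example easy to try without touching the real environment.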
Client Configuration
VS Code (GitHub Copilot)
Add to your workspace .vscode/mcp.json or user settings.json:
{
"servers": {
"bitfreak/cait": {
"type": "stdio",
"command": "/absolute/path/to/.venv/bin/python",
"args": ["-m", "cait.server"],
"cwd": "/absolute/path/to/CAIT"
}
}
}
For user settings.json, nest the above under "mcp": { ... }.
Copy AGENTS.md to .github/copilot-instructions.md in your project.
Claude Desktop
Edit claude_desktop_config.json:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Linux: ~/.config/claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"bitfreak/cait": {
"command": "/absolute/path/to/.venv/bin/python",
"args": ["-m", "cait.server"],
"env": {
"PYTHONPATH": "/absolute/path/to/CAIT"
}
}
}
}
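If claude_desktop_config.json already lists other servers, the CAIT entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that merge on an in-memory dict; add_cait_server is a hypothetical helper, and the paths are the same placeholders used in the config above.

```python
import json

def add_cait_server(config, python_path, repo_root, name="bitfreak/cait"):
    """Return a copy of `config` with a CAIT entry added under mcpServers.

    Hypothetical helper for illustration; existing servers are preserved.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {
        "command": python_path,
        "args": ["-m", "cait.server"],
        "env": {"PYTHONPATH": repo_root},
    }
    updated["mcpServers"] = servers
    return updated

# Example: a config that already contains another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_cait_server(
    existing,
    "/absolute/path/to/.venv/bin/python",
    "/absolute/path/to/CAIT",
)
print(json.dumps(merged, indent=2))
```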
Claude Code
claude mcp add cait -e PYTHONPATH=/absolute/path/to/CAIT \
-- /absolute/path/to/.venv/bin/python -m cait.server
Copy AGENTS.md to your project root as CLAUDE.md (or ~/.claude/CLAUDE.md for global use).
Cursor
Add to your user ~/.cursor/mcp.json (or project .cursor/mcp.json):
{
"mcpServers": {
"cait": {
"command": "/absolute/path/to/CAIT/.venv/bin/python",
"args": ["-m", "cait.server"],
"cwd": "/absolute/path/to/CAIT",
"env": {
"PYTHONPATH": "/absolute/path/to/CAIT"
}
}
}
}
PYTHONPATH must point at the CAIT repo root (the directory that contains the cait/ package). Cursor does not always honor cwd for MCP subprocesses, so PYTHONPATH is required for python -m cait.server to resolve.
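One way to check that the import path is set up correctly is to ask Python whether it can locate cait.server, using the same interpreter and PYTHONPATH as in the config. The helper name module_resolves is hypothetical; importlib.util.find_spec is standard library. Note that looking up a submodule imports its parent package, so this runs the package's __init__.

```python
import importlib.util

def module_resolves(name: str) -> bool:
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here, `cait`) cannot be found at all.
        return False

# Prints True only when the repo root is on the import path.
print(module_resolves("cait.server"))
```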
Add AGENTS.md as a user or project rule (Settings → Rules, Skills, Subagents), or copy it to AGENTS.md in the project root.
Recommended MCP Servers
Firecrawl
Firecrawl is a web scraping and search API that pairs naturally with CAIT, adding powerful web search, full-page scraping, and site crawling. A free API key is available at firecrawl.dev.
Serena
Serena provides many tools for semantic code retrieval and editing. CAIT and Serena each include a similar memory system, so it is recommended to disable one of them (for CAIT, add memory to CAIT_DISABLE).
codebase-memory-mcp
codebase-memory-mcp indexes a repository into a persistent knowledge graph (functions, classes, call chains, HTTP routes, packages) and exposes structural queries over MCP.
Tool Reference
File System — fs

| Tool | Description |
|------|-------------|
| get_file_info | Metadata for a single file: size, line count, permissions, timestamps. Does not read content. |
| get_dir_info | Directory listing with per-entry metadata. Supports glob patterns and recursion. |
| read_file | Read a text file with a max_bytes cap and lineno\|text prefixes. Slice mode: offset + limit (negative limit = tail, e.g. -50). Search mode: pattern with context lines around hits (in-file grep). |
| write_file | Write text to a file. mode='append' (default) or 'replace'. Useful for NOTES.md, TASKS.md, log files. |
| download_file | Download a URL to ~/.cait/files/ (or CAIT_FILES_PATH). Returns the local path. |
| fetch_url | HTTP GET/POST with custom headers and body. Use save_to to avoid large responses in context. convert=True returns clean markdown via Docling or MarkItDown. |

Persistent Python REPL — repl
Security: repl_exec runs arbitrary Python as the same OS user as the MCP server, with full filesystem and network access. Only enable the REPL module in environments you trust.

| Tool | Description |
|------|-------------|
| repl_exec | Execute Python code in a persistent session. Variables, imports, and function definitions survive between calls. Returns stdout, stderr, and exception info. |
| repl_read | Inspect a named variable from the REPL session without executing code. Returns repr, type, and JSON value. |
| repl_vars | List all user-defined variables in the current REPL session. Returns name, type, repr, and JSON value for each. Useful for reviewing session state without running code. |
| repl_reset | Clear all variables and imports from the REPL session. |

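The idea of a persistent session, where state survives between calls, can be sketched with exec and a shared namespace dict. The ReplSession class below is a hypothetical illustration of the concept and makes no claim about how CAIT implements repl_exec; it captures only stdout and the traceback.

```python
import contextlib
import io
import traceback

class ReplSession:
    """Minimal persistent session: every run() shares one namespace."""

    def __init__(self):
        self.namespace = {}

    def run(self, code: str) -> dict:
        stdout = io.StringIO()
        error = None
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, self.namespace)
        except Exception:
            error = traceback.format_exc()
        return {"stdout": stdout.getvalue(), "error": error}

    def reset(self):
        # Drop all variables, imports, and definitions.
        self.namespace.clear()

session = ReplSession()
session.run("x = 2")                   # defines x in the session
result = session.run("print(x + 1)")   # x is still available here
print(result["stdout"])
```

Because exec runs the code with the caller's privileges, the security note above applies equally to any sketch of this kind.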
Code Analysis — code
All code tools perform AST-aware search — they skip occurrences in comments and strings, unlike text grep.

| Tool | Description |
|------|-------------|
| find_definitions | Find all definitions of a function, class, or variable. Returns file, line, docstring, and kind. |
| find_calls | Find all call sites of a function. Matches bare calls, method calls, and chained calls. |
| find_imports | Find all files that import a given module or name. |
| find_references | Find all uses of an identifier (loads, stores, deletes, attribute accesses). |

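To see why AST-aware search differs from text grep, the sketch below uses Python's standard ast module to find call sites of a function. It is an illustration of the general technique only; find_call_lines is a hypothetical name and is much simpler than a real find_calls would need to be.

```python
import ast

def find_call_lines(source: str, func_name: str) -> list[int]:
    """Return line numbers where `func_name` is called (bare or as a method)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name) and target.id == func_name:
            lines.append(node.lineno)      # bare call: load(...)
        elif isinstance(target, ast.Attribute) and target.attr == func_name:
            lines.append(node.lineno)      # method call: obj.load(...)
    return sorted(lines)

sample = "\n".join([
    "# load() mentioned in a comment",
    'text = "load() mentioned in a string"',
    'data = load("a.json")',
    'other = client.load("b.json")',
])
print(find_call_lines(sample, "load"))  # [3, 4]
```

A plain text search for "load(" would report all four lines; the parser never sees the comment and treats the string as a literal, so only the two real calls match.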
Text Search & Embeddings — text
Uses all-MiniLM-L6-v2 (bundled with ChromaDB — no separate download). Chunk embeddings are cached in memory so repeated queries on the same document skip re-embedding.

| Tool | Description |
|------|-------------|
| search_text | Semantically search or summarize a text string or plain text file (.txt, .md, .rst). Query given → extract mode (most relevant chunks). Query empty → summarize mode (most representative chunks). |
| encode_text | Return raw 384-dimensional float embeddings for one or more strings or files. |
| text_similarity | Cosine similarity between two texts (0–1). |
| diff_text | Unified diff between two strings or files. Returns diff text plus added/removed line counts. |

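The kind of result diff_text describes (a unified diff plus added and removed line counts) can be reproduced with the standard difflib module. This is a sketch of the idea, not CAIT's code; the function name, the "a"/"b" file labels, and the returned keys are assumptions made for the example.

```python
import difflib

def unified_diff_with_counts(a: str, b: str) -> dict:
    """Return a unified diff of two strings plus added/removed line counts."""
    diff = list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="a", tofile="b", lineterm="",
    ))
    body = diff[2:]  # skip the '--- a' / '+++ b' header lines
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return {"diff": "\n".join(diff), "added": added, "removed": removed}

result = unified_diff_with_counts("one\ntwo\nthree", "one\n2\nthree\nfour")
print(result["diff"])
print(result["added"], result["removed"])  # 2 1
```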
Document Tools — document

| Tool | Description |
|------|-------------|
| convert_doc | Convert PDF, DOCX, PPTX, XLSX, HTML, LaTeX, images, audio, and more to markdown or plain text. Backends: docling (higher quality, layout-aware), markitdown (lighter, better for Office files), auto (tries docling, falls back to markitdown). Use save_to to write large outputs to a file. strip_tables=True removes noisy pipe-table syntax. rich_pdf=True enables Docling's code detection and formula extraction (slower). |
| search_doc | Same as search_text but handles many document formats (PDF, DOCX, HTML, URLs). Converts via convert_doc on first call and caches the result, so repeat calls are instant. |

Wikipedia — wiki

| Tool | Description |
|------|-------------|
| wiki_search | Search Wikipedia. Returns titles, snippets, word counts, and URLs. |
| wiki_sections | List all sections of a page as a table of contents (no text). |
| wiki_section | Get the text of a specific section. Use wiki_sections first to find section titles. |
| wiki_page | Get full page text or just the summary (summary_only=True). Supports non-English via language parameter. |

arXiv — arxiv

| Tool | Description |
|------|-------------|
| arxiv_search | Search arXiv. Supports field prefixes (ti:, au:, abs:, cat:) and boolean operators. Returns metadata for up to 100 papers. |
| arxiv_paper | Fetch a paper by ID. full_text=False (default) returns abstract + metadata. full_text=True downloads and converts the full PDF. Use save_to for large outputs. |

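Query strings for arxiv_search combine the field prefixes with boolean operators. The helper below is a hypothetical convenience for composing such a string and is not part of CAIT; it assumes quoted phrases are accepted after ti: and au:, as in arXiv's own query syntax.

```python
def build_arxiv_query(title: str = "", author: str = "", category: str = "") -> str:
    """Compose an arXiv query from optional title, author, and category terms."""
    parts = []
    if title:
        parts.append(f'ti:"{title}"')
    if author:
        parts.append(f'au:"{author}"')
    if category:
        parts.append(f"cat:{category}")
    # All supplied terms must match.
    return " AND ".join(parts)

print(build_arxiv_query(title="attention", category="cs.CL"))
# ti:"attention" AND cat:cs.CL
```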
Datetime & Utilities — utils

| Tool | Description |
|------|-------------|
| get_datetime | Current date, time, timezone, UTC offset, weekday, and Unix timestamp. Accepts any IANA timezone name. |
| timer_start | Start a named wall-clock timer. |
| timer_stop | Stop a timer and return elapsed seconds. |
| timer_list | List all running timers and their current elapsed time. |

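Named timers of this kind are straightforward to model with a dict of start times. The functions below reuse the tool names purely as stand-ins and are not CAIT's implementation; the choice of time.monotonic (elapsed real time that is unaffected by system clock changes) is an assumption.

```python
import time

_timers: dict[str, float] = {}

def timer_start(name: str) -> None:
    """Start (or restart) a named timer."""
    _timers[name] = time.monotonic()

def timer_stop(name: str) -> float:
    """Stop a timer and return elapsed seconds; KeyError if it is not running."""
    return time.monotonic() - _timers.pop(name)

def timer_list() -> dict[str, float]:
    """Return the current elapsed seconds for every running timer."""
    now = time.monotonic()
    return {name: now - started for name, started in _timers.items()}

timer_start("build")
time.sleep(0.01)
print(timer_list())
print(timer_stop("build"))
```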
Memory Database — memory
Persistent ChromaDB vector store at ~/.cait/memory (override with CAIT_MEMORY_PATH; shared across projects). Content is embedded with all-MiniLM-L6-v2 for semantic retrieval.

| Tool | Description |
|------|-------------|
| mem_add | Add a new entry. Fields: title, content (embedded), tags, description, source, entry_id. |
| mem_search | Find entries by semantic similarity to a query. Optionally filter by tags. |
| mem_get | Retrieve a full entry by ID. |
| mem_list | List entries sorted by date (newest first). Content omitted for brevity. |
| mem_set | Update fields of an existing entry. Only non-empty values are applied. |
| mem_edit | Edit content in place: regex replace when pattern is given, or append when not. |
| mem_delete | Permanently delete an entry by ID. |
| mem_find | Fast metadata scan (no embedding). Match by title substring, exact source URL, or tags. Use this for deduplication checks before mem_add. |

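The two behaviours of mem_edit (regex replace when a pattern is given, append otherwise) can be shown on a plain string. This is a sketch of the described behaviour only: edit_content and its parameter names other than pattern are hypothetical, and appending by simple concatenation is an assumption.

```python
import re

def edit_content(content: str, new_text: str, pattern: str = "") -> str:
    """Regex-replace `pattern` with `new_text`, or append when no pattern is given."""
    if pattern:
        return re.sub(pattern, new_text, content)
    # Assumption: append is plain concatenation with no separator added.
    return content + new_text

print(edit_content("status: draft", "final", pattern=r"draft"))  # status: final
print(edit_content("first note", "\nsecond note"))
```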
Disabling Modules
Set CAIT_DISABLE to a comma-separated list of module names to exclude their tools at startup:
CAIT_DISABLE=wiki,arxiv python -m cait.server
Available module names: fs, text, code, repl, wiki, arxiv, utils, memory, document
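A value such as wiki,arxiv can be turned into the list of modules that remain enabled with a few lines of Python. The sketch below is illustrative and not CAIT's startup code; enabled_modules is a hypothetical name, and trimming whitespace and ignoring case are assumptions about how lenient the parsing is.

```python
ALL_MODULES = ["fs", "text", "code", "repl", "wiki", "arxiv", "utils", "memory", "document"]

def enabled_modules(disable_value: str) -> list[str]:
    """Return the module names left after applying a CAIT_DISABLE-style value."""
    disabled = {
        name.strip().lower()
        for name in disable_value.split(",")
        if name.strip()  # skip empty items such as a trailing comma
    }
    return [module for module in ALL_MODULES if module not in disabled]

print(enabled_modules("wiki,arxiv"))
# ['fs', 'text', 'code', 'repl', 'utils', 'memory', 'document']
```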






