tablebridge

Turn a folder of CSV / Parquet / JSON files into one SQL-queryable source for your AI agent.

[CI](https://github.com/Michael-WhiteCapData/tablebridge-mcp/actions/workflows/ci.yml) · [PyPI](https://pypi.org/project/tablebridge/) · [Python](https://www.python.org/) · [MCP](https://modelcontextprotocol.io/) · [License: MIT](LICENSE)

Small businesses rarely have a data warehouse. They have a folder full of exports: customers.csv, last month's orders.csv, a regions.json someone emailed over. tablebridge is an MCP server that points DuckDB at that folder, exposes each file as a SQL table, and lets your agent run read-only SQL, including JOINs across files, to answer questions over all of them at once. Scattered files become one queryable source of truth.

It's read-only and sandboxed: files are loaded into an in-memory database, the data directory is the only thing it can see, and queries are validated so an agent can't write, escape to other paths, or call raw file functions.
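The "data directory is the only thing it can see" rule comes down to a path-containment check. The sketch below is illustrative only and is not taken from tablebridge's source: `is_inside_sandbox` is a hypothetical helper and `/srv/data` is a made-up directory. It shows one common way to reject paths that escape a root directory.

```python
from pathlib import Path


def is_inside_sandbox(candidate: str, data_dir: str) -> bool:
    """Return True if `candidate` resolves to a path under `data_dir`.

    Hypothetical helper, not tablebridge's actual implementation.
    """
    root = Path(data_dir).resolve()
    # Relative paths are resolved against the sandbox root; resolve() also
    # collapses ".." segments and follows symlinks. An absolute candidate
    # replaces the root entirely, so it is checked as-is.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(is_inside_sandbox("customers.csv", "/srv/data"))        # True
print(is_inside_sandbox("../secrets/keys.csv", "/srv/data"))  # False
print(is_inside_sandbox("/etc/passwd", "/srv/data"))          # False
```

Resolving before comparing matters: a plain string-prefix check would accept `/srv/data/../secrets`.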

---

Why you'd want this

- 🔗 One source over many files. JOIN orders.csv to customers.csv to regions.json in a single query, with no ETL and no database to stand up.
- 🦆 DuckDB-powered. Fast analytical SQL over CSV, TSV, Parquet, and JSON/NDJSON.
- 🔒 Safe by design. Files are materialized into memory; queries are validated read-only; raw file-access functions and out-of-sandbox paths are rejected.
- 🤖 Agent-friendly. list_sources → describe → query is a natural flow the agent can follow on its own.
- 🪶 Two dependencies (mcp, duckdb), fully typed and tested.

Install

```bash
uvx tablebridge          # run directly
# or
pip install tablebridge  # then run: tablebridge
```

Claude Code

```bash
TABLEBRIDGE_DATA_DIR=/path/to/your/data claude mcp add tablebridge -- uvx tablebridge
```

Claude Desktop / Cursor

```json
{
  "mcpServers": {
    "tablebridge": {
      "command": "uvx",
      "args": ["tablebridge"],
      "env": { "TABLEBRIDGE_DATA_DIR": "/path/to/your/data" }
    }
  }
}
```

Run with Docker

A Dockerfile is included. The server speaks MCP over stdio. Mount the folder you want to query at /data (read-only is fine) and run interactively (-i):

```bash
docker build -t tablebridge .
docker run --rm -i -v /path/to/your/data:/data:ro tablebridge
```

Tools

| Tool | Description |
| --- | --- |
| list_sources | List the tables (one per data file) with column counts. Start here. |
| describe | A table's columns and types |
| preview | First N rows of a table |
| query | Run read-only SQL (DuckDB dialect) across the tables, JOINs included |
| refresh | Re-scan the data directory for added/changed files |
| server_info | Effective config (data dir, row cap, supported formats) |

Example

With a folder containing customers.csv, orders.csv, and regions.json:

You: Who are my top 3 customers by total spend, and what region are they in?

Agent: (calls list_sources, then query)

```sql
SELECT c.name, r.region, SUM(o.total) AS spend
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN regions r ON r.customer_id = c.id
GROUP BY c.name, r.region
ORDER BY spend DESC
LIMIT 3;
```

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| TABLEBRIDGE_DATA_DIR | . | Directory of files to expose (the sandbox boundary) |
| TABLEBRIDGE_MAX_ROWS | 1000 | Max rows returned per query/preview |
| TABLEBRIDGE_RECURSIVE | 1 | Scan subdirectories too |

Supported formats: .csv, .tsv, .parquet, .json, .ndjson.
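To make the configuration concrete, here is a minimal sketch of how these variables and the supported-format list could drive file discovery. It is not tablebridge's actual code: `Settings`, `load_settings`, and `discover_tables` are hypothetical names, the file stem is assumed to be the table name (consistent with customers.csv becoming `customers` in the example), and treating `0` as "recursion off" is an assumption, since only the default of `1` is documented.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Extensions taken from the "Supported formats" line.
SUPPORTED_SUFFIXES = {".csv", ".tsv", ".parquet", ".json", ".ndjson"}


@dataclass(frozen=True)
class Settings:
    data_dir: Path
    max_rows: int
    recursive: bool


def load_settings(env: dict[str, str]) -> Settings:
    """Read the three documented variables, using the documented defaults."""
    return Settings(
        data_dir=Path(env.get("TABLEBRIDGE_DATA_DIR", ".")),
        max_rows=int(env.get("TABLEBRIDGE_MAX_ROWS", "1000")),
        # Assumption: "0" turns recursion off.
        recursive=env.get("TABLEBRIDGE_RECURSIVE", "1") != "0",
    )


def discover_tables(settings: Settings) -> dict[str, Path]:
    """Map a table name (assumed: the file stem) to each supported file."""
    pattern = "**/*" if settings.recursive else "*"
    tables: dict[str, Path] = {}
    for path in sorted(settings.data_dir.glob(pattern)):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            tables[path.stem] = path
    return tables


with tempfile.TemporaryDirectory() as tmp:
    for name in ("customers.csv", "regions.json", "notes.txt"):
        (Path(tmp) / name).write_text("")
    settings = load_settings({"TABLEBRIDGE_DATA_DIR": tmp})
    print(sorted(discover_tables(settings)))  # ['customers', 'regions']
```

In real use you would pass `dict(os.environ)` to `load_settings`. Note that two files sharing a stem (say, orders.csv and orders.json) would collide in this sketch; how tablebridge names such tables is not covered here.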

Security model

- Sandboxed to TABLEBRIDGE_DATA_DIR: only files under it are loaded.
- Materialized into an in-memory DuckDB, then external filesystem access is disabled, so queries can't reach other paths.
- Validated SQL: a single read-only statement only; writes and raw file-reader functions are rejected.
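The "validated SQL" rule can be pictured with a deliberately naive check. This is a sketch under stated assumptions, not the validator tablebridge ships: `validate_query`, the `FILE_FUNCTIONS` deny-list, and the "must start with SELECT or WITH" rule are all hypothetical, and a production validator would more likely work on a parsed statement than on regular expressions.

```python
import re

# Hypothetical deny-list of DuckDB table functions that read files directly.
FILE_FUNCTIONS = {
    "read_csv", "read_csv_auto", "read_parquet", "read_json",
    "read_json_auto", "read_ndjson", "read_text", "read_blob", "glob",
}
# Assumption: a read-only statement starts with one of these keywords.
ALLOWED_FIRST_KEYWORDS = {"select", "with"}


def validate_query(sql: str) -> str:
    """Return the trimmed statement, or raise ValueError if it looks unsafe."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty query")
    # Conservative: any remaining ';' is treated as a second statement,
    # even if it sits inside a string literal.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    first_word = statement.split(None, 1)[0].lower()
    if first_word not in ALLOWED_FIRST_KEYWORDS:
        raise ValueError(f"only read-only statements are allowed, got {first_word!r}")
    # Reject calls to any function on the deny-list.
    for name in re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(", statement):
        if name.lower() in FILE_FUNCTIONS:
            raise ValueError(f"file function {name!r} is not allowed")
    return statement


print(validate_query("SELECT name FROM customers LIMIT 3;"))
for bad in ("DROP TABLE customers", "SELECT 1; SELECT 2",
            "SELECT * FROM read_csv('/etc/passwd')"):
    try:
        validate_query(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

A text-level check like this has gaps. For example, DuckDB can also read a file named directly in a FROM clause (`FROM 'file.csv'`), which this sketch does not catch. That is why layering it with disabled external access, as the list above describes, is the more robust design.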

Development

```bash
git clone https://github.com/Michael-WhiteCapData/tablebridge-mcp
cd tablebridge-mcp
uv pip install -e ".[dev]"
ruff check .
pytest          # uses real DuckDB over temp files
```

See CONTRIBUTING.md.
