sifter

sifter-ai/sifter

Summary

sifter-ai/sifter 🐍 🏠 ☁️ - Structure any document, query it like a database.

README.md

Sifter

[CI](https://github.com/sifter-ai/sifter/actions/workflows/ci.yml) [codecov](https://codecov.io/gh/sifter-ai/sifter) [PyPI](https://pypi.org/project/sifter-ai/) [npm](https://www.npmjs.com/package/@sifter-ai/sdk) [Python](https://www.python.org/) [Node](https://nodejs.org/) [License: MIT](LICENSE)

Your documents are a dark database.

Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.

[Image: Sifter demo]

---

Why not RAG?

RAG is built for retrieval: "find me chunks similar to this query." It breaks down on homogeneous collections such as invoices, contracts, or receipts, where every document looks alike and the question is an aggregation, not a search.

[Image: Documents to structured records]

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.
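As a rough illustration of why typed records make aggregation questions exact, the sketch below stands in for Sifter's store with a plain Python list. The `Invoice` dataclass, its field names, and the sample values are made up for this example; they are not Sifter's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    """Hypothetical typed record, as it might look after extraction."""
    client: str
    issued: date
    total: float


# Made-up sample data standing in for extracted records.
records = [
    Invoice("Acme Corp", date(2024, 1, 15), 1500.0),
    Invoice("Acme Corp", date(2024, 2, 3), 250.0),
    Invoice("Globex", date(2024, 1, 20), 980.0),
    Invoice("Globex", date(2023, 12, 30), 400.0),
]


def total_by_client(rows, start, end):
    """Sum totals per client for invoices issued in [start, end]."""
    totals = defaultdict(float)
    for row in rows:
        if start <= row.issued <= end:  # a real filter on a typed field
            totals[row.client] += row.total
    return dict(totals)


q1 = total_by_client(records, date(2024, 1, 1), date(2024, 3, 31))
print(q1)  # {'Acme Corp': 1750.0, 'Globex': 980.0}
```

Running the same query twice over the same records gives the same answer, which is the property a similarity search over text chunks can't guarantee.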

---

Quickstart

git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local    # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d

Open http://localhost:3000 — create a sift, upload documents, query results.

---

Python SDK

pip install sifter-ai

from sifter import Sifter

s = Sifter(api_key="sk-...")

sift = s.create_sift("Invoices", "client name, date, total amount")
sift.upload("./invoices/")
sift.wait()

for record in sift.records():
    print(record["extracted_data"])
# {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}
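Once records come back as dictionaries, they are easy to hand to other tools. The sketch below flattens `extracted_data` into CSV using only the standard library. The `sample` list stands in for `sift.records()` and assumes each record has the shape shown in the comment above; `records_to_csv` is a hypothetical helper, not part of the SDK.

```python
import csv
import io


def records_to_csv(records):
    """Render the extracted_data of each record as CSV text."""
    rows = [r["extracted_data"] for r in records]
    # Union of keys in first-seen order, so a missing field becomes an empty cell.
    fields = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for sift.records(); values are illustrative.
sample = [
    {"extracted_data": {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}},
    {"extracted_data": {"client": "Globex", "date": "2024-01-20"}},
]
csv_text = records_to_csv(sample)
print(csv_text)
```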

TypeScript SDK

npm install @sifter-ai/sdk

import { Sifter } from "@sifter-ai/sdk";

const client = new Sifter({ apiKey: "sk-..." });

const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();

const records = await sift.records();
console.log(records);

---

MCP server (Claude Desktop / Cursor / AI agents)

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}
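If you'd rather script this than edit JSON by hand, a small helper can merge the entry into an existing config without clobbering other servers. This is only a sketch: `add_sifter_server` is a hypothetical helper, and where your MCP client keeps its config file depends on the client, so the example works on an in-memory dict and leaves file I/O to you.

```python
import json


def add_sifter_server(config, base_url="http://localhost:8000", api_key="sk-dev"):
    """Return a copy of an MCP client config with the sifter entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["sifter"] = {
        "command": "uvx",
        "args": ["sifter-mcp", "--base-url", base_url],
        "env": {"SIFTER_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already defines another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
updated = add_sifter_server(existing)
print(json.dumps(updated, indent=2))
```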

Then ask:

  • "What's the total unpaid across all invoices from last quarter?"
  • "Show me all contracts expiring in the next 90 days."
  • "Which candidates have Python and more than 5 years experience?"

Sifter answers with structured data — exact counts, sums, filtered rows. Not a text blob.
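To make "exact counts, sums, filtered rows" concrete: the feature list further down says NL queries are turned into MongoDB aggregation pipelines. The sketch below shows what such a pipeline could look like for an unpaid-total question, plus a toy evaluator so it runs without a database. The field names (`status`, `total_amount`), the pipeline itself, and `run_pipeline` are assumptions for illustration. The evaluator handles only equality `$match` and a `$group` with `$sum`, a tiny subset of MongoDB's semantics, and the pipelines Sifter actually generates may look different.

```python
# Hypothetical pipeline for "What's the total unpaid, per client?"
pipeline = [
    {"$match": {"status": "unpaid"}},
    {"$group": {"_id": "$client", "unpaid": {"$sum": "$total_amount"}}},
]

# Made-up records standing in for extracted invoice data.
docs = [
    {"client": "Acme Corp", "status": "unpaid", "total_amount": 1500.0},
    {"client": "Acme Corp", "status": "paid", "total_amount": 300.0},
    {"client": "Globex", "status": "unpaid", "total_amount": 980.0},
    {"client": "Acme Corp", "status": "unpaid", "total_amount": 200.0},
]


def run_pipeline(rows, stages):
    """Toy evaluator: equality $match, and $group with $sum over field refs."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            rows = [r for r in rows if all(r.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for r in rows:
                out = groups.setdefault(r[key_field], {"_id": r[key_field]})
                for name, acc in spec.items():
                    if name != "_id":
                        out[name] = out.get(name, 0) + r[acc["$sum"].lstrip("$")]
            rows = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return rows


result = run_pipeline(docs, pipeline)
print(result)
# [{'_id': 'Acme Corp', 'unpaid': 1700.0}, {'_id': 'Globex', 'unpaid': 980.0}]
```

Because the pipeline is plain data, it can be logged, reviewed, and re-run, which is what makes the answer inspectable.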

Want a remote MCP URL without running a local server? → Sifter Cloud

---

Dashboard

Sifter includes a built-in dashboard — no Metabase, no Grafana, no SQL required.

Describe what you want to see in plain language:

from sifter import Sifter
client = Sifter(api_key="sk-...")
sift = client.sifts.get("invoices")
sift.create_dashboard("Show total invoiced and unpaid by vendor, monthly trend")

Produces KPI tiles, breakdowns, and time-series — updated automatically on every extraction.

---

What's included

  • Schema-driven extraction — describe what to extract in natural language; schema is inferred automatically and exported as Pydantic / TypeScript types
  • NL query — ask questions in plain language; Sifter generates inspectable MongoDB aggregation pipelines
  • MCP server — stdio transport, read + write tools, zero custom integration code
  • REST API + SDKs — full OpenAPI spec, typed clients for Python and TypeScript
  • Webhooks — HMAC-signed HTTP callbacks on every extraction event
  • Spec-driven dashboards — short NL spec → auto-generated board (KPI, breakdown, table, time series)
  • CLI: sifter extract, sifter records, and sifter sifts for terminal workflows and CI
  • Self-hostable — Docker Compose, bring your own MongoDB and LLM API key
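
Since webhooks are HMAC-signed, a receiver should verify the signature before trusting a payload. The sketch below shows the general pattern with Python's standard library. It assumes HMAC-SHA256 over the raw request body with a hex-encoded digest; the actual algorithm, encoding, and header name are not stated in this README, so check the docs before relying on it. `sign`, `verify_signature`, the secret, and the payload are all hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received)


secret = b"example-webhook-secret"
body = b'{"event": "extraction.completed"}'  # made-up payload
good = sign(secret, body)

ok = verify_signature(secret, body, good)
tampered = verify_signature(secret, body + b" ", good)
print(ok, tampered)  # True False
```

Verifying against the raw bytes, before any JSON parsing, avoids mismatches caused by re-serialization.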

---

Don't want to run infrastructure?

Sifter Cloud is the managed version — no Mongo, no ops, remote MCP endpoint, Google Drive and email ingress. Free tier available.

---

Docs

Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.

---

License

MIT — see LICENSE.

Created by Bruno Fortunato.
