datoon

andrii-su/datoon
2 starsMITCommunity

Install to Claude Code

This server doesn't publish a one-line install command. Follow the setup in the source repository.

Summary

Smart structured-data to TOON gateway: converts to TOON only when it saves LLM tokens.

README.md

<p align="center"> <img src="assets/datoon-banner.svg" width="720" /> </p>

<h1 align="center">datoon</h1>

<p align="center"> <strong>smart structured-data→TOON gateway — converts only when it actually saves tokens</strong> </p>

<p align="center"> <a href="https://github.com/andrii-su/datoon/actions/workflows/tests.yml"><img src="https://github.com/andrii-su/datoon/actions/workflows/tests.yml/badge.svg" alt="Tests"></a> <a href="https://github.com/andrii-su/datoon/actions/workflows/pre-commit.yml"><img src="https://github.com/andrii-su/datoon/actions/workflows/pre-commit.yml/badge.svg" alt="Pre-commit"></a> <a href="https://github.com/andrii-su/datoon/actions/workflows/release.yml"><img src="https://github.com/andrii-su/datoon/actions/workflows/release.yml/badge.svg" alt="Release"></a> <a href="https://pypi.org/project/datoon/"><img src="https://img.shields.io/pypi/v/datoon?color=0EA5E9" alt="PyPI"></a> <a href="https://img.shields.io/badge/python-3.12%2B-3776AB"><img src="https://img.shields.io/badge/python-3.12%2B-3776AB?logo=python&logoColor=white" alt="Python"></a> <a href="./LICENSE"><img src="https://img.shields.io/github/license/andrii-su/datoon" alt="License: MIT"></a> </p>

<p align="center"> <a href="#before--after">Before/After</a> • <a href="#install">Install</a> • <a href="#what-you-get">What You Get</a> • <a href="#how-it-works">How It Works</a> • <a href="#benchmarks">Benchmarks</a> • <a href="./INSTALL.md">Full install guide</a> </p>

______________________________________________________________________

Raw structured data is often verbose in LLM prompts. TOON can save tokens — but blind conversion can also make payloads worse. datoon adds a decision layer: convert when structure and savings justify it, skip when they don't, and always explain why.

Supports JSON, CSV, JSONL, YAML, XML, Parquet, Avro, ORC, Excel, and Apple Numbers — auto-detected from file extension.

Before / After

<table> <tr> <td width="50%">

JSON in the prompt (43 tokens)

{"users":[
  {"id":1,"name":"Ada","role":"admin"},
  {"id":2,"name":"Lin","role":"analyst"},
  {"id":3,"name":"Grace","role":"viewer"}
]}

</td> <td width="50%">

datoon converts → TOON (24 tokens)

users[3]{id,name,role}:
  1,Ada,admin
  2,Lin,analyst
  3,Grace,viewer
{"decision":"convert","reason":"Estimated savings 44.19% (threshold 15.00%)."}

</td> </tr> <tr> <td>

CSV from a data pipeline (111 tokens as JSON)

id,name,role
1,Ada,admin
2,Lin,analyst
3,Grace,viewer

</td> <td>

datoon auto-converts → TOON (24 tokens)

datoon data.csv --report-stdout

Same result. Zero JSON serialization in your code.

</td> </tr> <tr> <td>

Non-uniform payload (26 tokens)

{"config":{"debug":true},"tags":["a","b"]}

</td> <td>

datoon skips → keeps JSON

{"decision":"skip","reason":"No uniform object arrays found with at least 3 rows."}

No Node.js call. No silent corruption.

</td> </tr> </table>

Same data. Right format. Always explained.

┌──────────────────────────────────────────────────┐
│  PAYLOAD SAVINGS (auto avg)    ████░░░░░░   28%  │
│  PAYLOAD SAVINGS (agent skill) ████████░░   62%  │
│  DECISION ACCURACY             ██████████  100%  │
│  HARMFUL CONVERSIONS BLOCKED   ██████████  100%  │
└──────────────────────────────────────────────────┘

[!IMPORTANT] datoon saves payload tokens — the structured data portion of your prompt. Token savings depend on payload shape: uniform tabular data converts well; deeply nested or non-uniform structures are skipped. Every decision includes a reason so pipelines can log, debug, and trust the outcome.

Install

# core (JSON, CSV, JSONL, XML — no extra deps)
uv add datoon
pip install datoon

# with YAML support
pip install "datoon[yaml]"

# with Excel support
pip install "datoon[excel]"

# with Parquet / ORC / Avro support
pip install "datoon[columnar]"

# with Apple Numbers support
pip install "datoon[numbers]"

# with tiktoken-based token counting
pip install "datoon[tokens]"

# with MCP server
pip install "datoon[mcp]"

# everything
pip install "datoon[all]"

Requires Python 3.12+. TOON conversion requires Node.js with npx in PATH — analysis and format reading work without it.

For Claude Code plugin, Codex, and MCP config → INSTALL.md.

What You Get

| | What | |---|---| | datoon CLI | Auto-gate any supported format → TOON from terminal or scripts | | Python API | convert_json_for_llm() + read_tabular() for any LLM pipeline | | MCP Server | convert_json, convert_text, analyze_json tools for Claude Desktop, Cursor, Windsurf | | Claude Code Plugin | /datoon in-session trigger, installs from GitHub in one command | | Codex Plugin | Marketplace plugin — structured-data mode for Codex |

Supported input formats

| Format | Extension | Extra needed | |---|---|---| | JSON | .json | — | | JSONL | .jsonl, .ndjson | — | | CSV | .csv | — | | XML | .xml | — | | YAML | .yaml, .yml | datoon[yaml] | | Excel | .xlsx, .xls | datoon[excel] | | Parquet | .parquet | datoon[columnar] | | Avro | .avro | datoon[columnar] | | ORC | .orc | datoon[columnar] | | Apple Numbers | .numbers | datoon[numbers] |

How It Works

  1. Detect format — from --format flag, file extension, or default to JSON for stdin
  2. Read + normalize — parse source into list of row dicts; serialize to compact JSON
  3. Analyze structure — uniform object arrays? acceptable depth? minimum rows?
  4. Gate early — non-candidates skip before any CLI call; no Node.js overhead
  5. Convert + estimate — TOON CLI runs, token savings calculated
  6. Gate savings — below threshold → return JSON; above → return TOON with report

Every path returns a ConversionReport with decision, reason, and token estimates. Pipelines never get silent surprises.

______________________________________________________________________

Quick Start

JSON (stdin):

echo '{"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"},{"id":3,"name":"Grace"}]}' | datoon --report-stdout

CSV (auto-detected from extension):

datoon data.csv --report-stdout

JSONL:

datoon data.jsonl -o output.toon

YAML (requires datoon[yaml]):

datoon data.yaml --report-stdout

Parquet (requires datoon[columnar]):

datoon data.parquet --report ./report.json

Explicit format override:

datoon --format csv < data.csv --report-stdout

Force conversion (bypass gating — for experiments):

datoon data.json --force --report-stdout

______________________________________________________________________

Python API

JSON conversion:

from datoon import convert_json_for_llm, ConversionConfig, DatoonError

config = ConversionConfig(min_savings_ratio=0.15, max_depth=6, min_uniform_rows=3)

try:
    outcome = convert_json_for_llm(raw_json, config)
except DatoonError as exc:
    raise

# outcome.payload_text  — TOON or original JSON
# outcome.report.decision  — "convert" | "skip"
# outcome.report.reason    — human-readable explanation
send_to_model(outcome.payload_text)

Any format via read_tabular:

import json
from pathlib import Path
from datoon import read_tabular, convert_json_for_llm, ConversionConfig

# text formats: csv, jsonl, yaml, xml
rows = read_tabular("csv", text=csv_string)

# binary formats: excel, parquet, orc, avro, numbers
rows = read_tabular("parquet", path=Path("data.parquet"))

json_text = json.dumps(rows, separators=(",", ":"))
outcome = convert_json_for_llm(json_text, ConversionConfig())
send_to_model(outcome.payload_text)

Structure-only analysis (no Node.js required):

from datoon.analyzer import analyze_payload
from datoon.models import ConversionConfig

analysis = analyze_payload(parsed_data, ConversionConfig())
print(analysis.is_candidate, analysis.reason)

______________________________________________________________________

MCP Server

<!-- mcp-name: io.github.andrii-su/datoon -->

datoon ships an MCP server with three tools:

| Tool | Description | |---|---| | convert_json | Full JSON conversion with policy gating | | convert_text | Converts CSV, YAML, XML, or JSONL text with policy gating | | analyze_json | Structure analysis only — no Node.js needed |

Claude Desktop / Cursor / Windsurf config:

{
  "mcpServers": {
    "datoon": {
      "command": "uvx",
      "args": ["--from", "datoon[mcp]", "datoon", "mcp"]
    }
  }
}

Run locally:

datoon mcp     # or the standalone script: datoon-mcp

Listed on the MCP Registry, Smithery, and Glama. See MARKETPLACES.md.

______________________________________________________________________

Claude Code Plugin

Install directly from GitHub:

claude plugin marketplace add andrii-su/datoon
claude plugin install datoon@datoon

Trigger in-session:

/datoon
convert this JSON to TOON if it saves tokens
use datoon mode for structured data

______________________________________________________________________

CLI Reference

| Flag | Default | Description | |---|---|---| | --format | auto | Input format: json, csv, jsonl, yaml, xml, excel, parquet, avro, orc, numbers | | --force | false | Bypass gating and minimum savings threshold | | --min-savings | 0.15 | Minimum relative token savings required | | --max-depth | 6 | Maximum nesting depth for auto-conversion | | --min-uniform-rows | 3 | Minimum rows in uniform object arrays | | --timeout | 30 | Seconds before TOON CLI call is aborted | | --report <path> | — | Write JSON conversion report to file | | --report-stdout | — | Print JSON conversion report to stderr | | -o <path> | stdout | Output file path | | --version | — | Print version and exit |

Format is auto-detected from file extension. Use --format to override or when reading from stdin.

______________________________________________________________________

Benchmarks

PYTHONPATH=src python benchmarks/run.py --dry-run
PYTHONPATH=src python benchmarks/run.py
PYTHONPATH=src python benchmarks/run.py --update-readme

Why auto mode outperforms forced conversion

Auto mode avoids low-benefit and high-risk payloads (orders-nested, mixed-non-uniform) while matching forced TOON's average token count on suitable ones. Every decision comes with a reasoned report.

| Scenario | JSON Baseline | Forced TOON | datoon Auto | |---|---:|---:|---:| | Average tokens | 77 | 50 | 50 | | Avg token saved | 0.0% | 26.8% | 28.1% | | Decision quality | n/a | Converts all | Converts 3/5, skips harmful cases |

<!-- BENCHMARK-TABLE-START -->

| Dataset | JSON | TOON (forced) | Raw Saved | Auto | Auto Tokens | Auto Saved | |---|---:|---:|---:|---|---:|---:| | users-small | 56 | 31 | 44.6% | convert | 31 | 44.6% | | events-medium | 198 | 111 | 43.9% | convert | 111 | 43.9% | | orders-nested | 93 | 91 | 2.2% | skip | 93 | 0.0% | | mixed-non-uniform | 35 | 37 | -5.7% | skip | 35 | 0.0% | | metrics-wide | 133 | 63 | 52.6% | convert | 63 | 52.6% | | Average | 103 | 67 | 27.5% | 3/5 convert | 67 | 28.2% |

Forced conversion succeeded for 5/5 payloads.

<!-- BENCHMARK-TABLE-END -->

Format conversion benchmark

Token savings when converting from common structured formats (CSV, JSONL, XML, YAML). Baseline is the JSON representation of the same data — what an LLM would receive without datoon.

<!-- FORMAT-BENCHMARK-TABLE-START -->

| Dataset | Format | JSON Tokens | TOON (forced) | Auto | Auto Tokens | Auto Saved | |---|---|---:|---:|---|---:|---:| | users-csv | csv | 53 | 29 | convert | 29 | 45.3% | | events-jsonl | jsonl | 194 | 109 | convert | 109 | 43.8% | | catalog-xml | xml | 96 | 50 | convert | 50 | 47.9% | | metrics-yaml | yaml | 129 | 61 | convert | 61 | 52.7% | | Average | — | 118 | 62 | 4/4 convert | 62 | 47.4% |

Forced conversion succeeded for 4/4 payloads.

<!-- FORMAT-BENCHMARK-TABLE-END -->

Agent skill evaluation

Artifact-based subagent comparison — identical analysis tasks, two modes:

  • with_skill: agent received the datoon skill and followed the conversion workflow.
  • without_skill: agent used JSON directly, no TOON or datoon.

3 payload sizes × 3 iterations = 18 total agent runs. Both modes: 100% correct answers.

| Scenario | Avg JSON Tokens | Avg TOON Tokens | Avg Payload Saved | |---|---:|---:|---:| | small | 225 | 118 | 47.6% | | medium | 2,972 | 1,138 | 61.7% | | large | 17,757 | 6,673 | 62.4% |

Full report and raw outputs: benchmarks/agent_skill_eval/. Savings are payload-token estimates, not full end-to-end model-token usage.

______________________________________________________________________

Development

Contributor workflow: CONTRIBUTING.md. Maintainer/agent notes: CLAUDE.md.

Setup:

uv sync --extra dev
uvx pre-commit install

Tests:

pytest -m "not integration"   # unit only (102 tests)
pytest                        # with integration (requires Node.js + npx)

Skill sync + plugin metadata:

python scripts/validate_skill_sync.py
python scripts/validate_plugin_metadata.py

______________________________________________________________________

Links

______________________________________________________________________

License

MIT

Related MCP servers

Browse all →