token-diet 🍽️

Save tokens while coding with AI. Use less, get the same (or better) results.

When you use AI coding tools like Claude Code, Codex, or Cursor — they send your entire codebase and chat history to the AI every single time. That's expensive and actually makes the AI dumber (too much noise buries the important stuff).

token-diet fixes that. It sends a smart summary of your code instead of every file, remembers decisions instead of replaying the whole conversation, reuses cached prompts so repeated parts are nearly free, and makes the AI reply with just the changed lines instead of rewriting entire files.

The result: you spend less on tokens, and the AI gives better answers because it can focus on what matters.

---

Results

| What we measured | Result | |---|---| | 🔽 Input tokens saved | ~70% fewer tokens sent to the AI | | 🔽 Output tokens saved | ~65% fewer tokens in AI replies | | ✅ Code quality kept | ~97% of tasks still pass correctly | | ⚡ Extra time added | ~60 ms per request (barely noticeable) |

---

Architecture

flowchart TB
    subgraph Client["Your AI tool"]
        CC["Claude Code / Cowork / Codex / Cursor"]
    end

    subgraph Frontends["How you connect"]
        MCP["Slash commands (/map, /focus, etc.)<br/>Works in Claude Code, Cowork"]
        PROXY["Local proxy (localhost:8000)<br/>Works with any AI tool"]
    end

    subgraph Engine["What happens behind the scenes"]
        ASM["Prompt builder<br/>Stacks info in the smartest order"]
        BUD["Token counter<br/>Measures exact cost, cuts the fat"]
        IDX["Code scanner<br/>Reads your files, extracts structure"]
        MAP["Smart code map<br/>Ranks what's important right now"]
        MEM["Memory<br/>Remembers decisions, forgets chit-chat"]
        DOC["Doc converter<br/>Turns PDFs/Word docs into clean text"]
        PATCH["Edit applier<br/>Takes AI's changes, patches your files"]
        ROUTE["Model picker<br/>Uses cheap AI for simple tasks"]
        TEL["Cost tracker<br/>Shows how much you saved"]
    end

    subgraph Store["Saved data (per project)"]
        DB[("Code index<br/>SQLite database")]
        LOG[("Decision log<br/>What was decided & why")]
        CACHE[("Cache<br/>Avoids re-doing work")]
    end

    PROV["AI provider<br/>Anthropic / OpenAI"]

    CC -->|"your message"| MCP
    CC -->|"your message"| PROXY
    MCP --> ASM
    PROXY --> ASM
    ASM --> BUD
    ASM --> MAP
    ASM --> MEM
    ASM --> DOC
    MAP --> IDX
    IDX --- DB
    MEM --- LOG
    DOC --- CACHE
    ASM -->|"optimized prompt"| ROUTE
    ROUTE --> PROV
    PROV -->|"just the changes"| PATCH
    PATCH -->|"edits applied"| CC
    ROUTE --> TEL
    TEL --- CACHE

How it flows: Your message → token-diet builds a lean prompt (code map + decisions + just what's needed) → picks the right AI model → AI replies with only the changed lines → edits applied to your files. Every request is tracked so you can see your savings.

---

Install

Quickest way (from GitHub):

pip install 'git+https://github.com/aryxnsdfs/token-diet'

With all features:

pip install 'token-diet[all]'

For development:

git clone https://github.com/aryxnsdfs/token-diet
cd token-diet
pip install -e '.[all,dev]'

---

Quick start

cd your-project          # go to any project you're working on
ctx init                 # one-time setup: scans your code, creates config
ctx doctor               # check everything is working

Then open Claude Code in that project and press / — you'll see the commands. Type /start as your first message to begin a session.

---

Commands

| Command | What it does | |---|---| | /start | Start a coding session — scans your code, shows the overview, turns on smart mode | | /showrepo | Show the whole project — a ranked overview of your code (not every file) | | /openfile auth.py | Open a file — pulls the full file into the conversation | | /findcode MyClass | Find one function or class — shows just that piece of code | | /shortreply | Keep replies short — AI only writes the lines it's changing | | /clearchat | Free up chat space — summarizes old messages into key decisions | | /showstats | Show savings — how many tokens saved, cache hits, money saved | | /changeai local | Switch AI models — use a cheap/free local model for simple tasks | | /applychanges | Apply edits — applies the AI's code changes to your files |

---

Works with

| Tool | How to connect | |---|---| | Claude Code (terminal or VS Code) | Just run ctx init — it auto-connects. Press / | | Claude Cowork / Claude.ai | Run ctx serve --http, paste the connector config | | Codex / Cursor / any tool | Run ctx proxy --port 8000, point the tool's API URL to localhost:8000 |

See docs/INSTALL.md for detailed setup per tool.

---

How it saves tokens

| Technique | What it does | Tokens saved | |---|---|---| | Code map | Shows structure (function names, classes) instead of full files | ~86% of input | | On-demand pull | Only loads a file when you ask for it with /focus | Avoids unnecessary files | | Decision log | Remembers "we chose bcrypt for auth" instead of replaying 50 messages | ~80% of old chat | | Prompt caching | Reuses the unchanged part of the prompt (nearly free to resend) | ~10x cheaper reads | | Diff-only output | AI writes 5 changed lines, not the whole 500-line file | ~95% of output | | Model routing | Uses a small cheap model for simple tasks (summaries, commit msgs) | Uses free local AI |

---

Project structure

ctx/
  cli.py          Command line: init, serve, doctor, proxy
  registry.py     All commands defined in one place
  server.py       Connects to Claude Code (MCP protocol)
  proxy.py        Connects to any tool (local web server)
  init/           Setup scripts for each AI tool
  engine/         The brain: code scanner, map builder, memory, etc.
tests/            Automated tests to make sure nothing breaks

---

License

MIT — use it however you want.

token-diet