token-optimization-mcp

DCx7C5/token-optimization-mcp
0 starsCommunity

Install to Claude Code

This server doesn't publish a one-line install command. Follow the setup in the source repository.

Summary

A fully offline MCP server for token estimation, prompt compression, model routing, and semantic caching to optimize LLM usage costs and efficiency.

README.md

token-optimization-mcp v0.2.0

Production-ready Model Context Protocol server for token counting, prompt compression, model routing and semantic caching. Zero external API calls — works fully offline.

Tools

| Tool | Description | |------------------------|-----------------------------------------------------------------------| | estimate_tokens | Count tokens for any text+model (calibrated chars/token ratios) | | compress_prompt | Shrink prompts with trim, summarize_hint or aggressive strategy | | route_model | Pick cheapest model meeting quality + context requirements | | cache_lookup | Semantic cache hit/miss by prompt or pre-computed key | | cache_store | Store prompt+result with token-savings metadata | | cache_invalidate | Remove one or all cache entries | | analyze_context | Conversation health: role breakdown, issues, recommendations | | savings_report | Session-level token/USD savings dashboard | | deduplicate_messages | Remove duplicate turns, count saved tokens |

Quick Start

cd mcps/token-optimization-mcp
uv sync

# stdio – Claude Code / Copilot
uv run main.py

# SSE – LangGraph / CrewAI / browser
uv run main.py --sse --port 8001

Environment Variables

| Variable | Default | Description | |------------------------|----------------------------|---------------------------| | USE_REDIS | false | Enable Redis backend | | REDIS_URL | redis://localhost:6379/1 | Redis connection URL | | CACHE_TTL_SECONDS | 86400 | Default cache TTL (1 day) | | RATE_LIMIT_PER_MIN | 120 | Requests/min per client | | AUDIT_LOG_ENABLED | true | Print audit log to stdout |

Registration

Claude Code (~/.claude/settings.json)

{
  "mcpServers": {
    "token-optimization": {
      "command": "uv",
      "args": ["run", "/path/to/token-optimization-mcp/main.py"]
    }
  }
}

VS Code Copilot (.vscode/mcp.json)

{
  "servers": {
    "token-optimization": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "${workspaceFolder}/mcps/token-optimization-mcp/main.py"]
    }
  }
}

SSE (LangGraph / CrewAI / Cursor)

http://127.0.0.1:8001/sse

Supported Models (routing catalogue)

| Model | Context | Quality | Cost/1k | |---------------------|---------|---------|-----------| | github:copilot | 128k | 8 | free | | gpt-4o-mini | 128k | 7 | $0.00015 | | claude-3-5-haiku | 200k | 7 | $0.00025 | | gemini-1.5-flash | 1M | 6 | $0.000075 | | gpt-4o | 128k | 9 | $0.005 | | claude-3-5-sonnet | 200k | 9 | $0.003 | | claude-3-opus | 200k | 10 | $0.015 |

Testing

uv run --group test pytest
# 118 tests, 100% coverage

Architecture

token-optimization-mcp/
├── main.py                      ← FastMCP server (9 tools)
├── pyproject.toml
├── README.md
├── tests/
│   ├── conftest.py              ← state-reset fixtures
│   ├── test_helpers.py          ← unit tests + Hypothesis
│   └── test_tools.py            ← integration tests per tool
└── mcp-servers/
    └── context-cache-server/    ← standalone Redis-backed sub-server
        ├── server.py
        ├── config.py
        └── security.py

Related MCP servers

Browse all →