S.C.R.U.B.

zombat/scrub-mcp

Summary

A code quality MCP server with 22 tools for deterministic linting, formatting, security scanning, and AI-assisted improvements, designed to reduce cloud LLM costs and enforce best practices.


Source Code Review, Uplift, and Baselining

A 22-tool MCP server and CI-ready CLI that cuts cloud LLM token usage on code quality tasks. Deterministic tools handle what they can. A local LLM (via DSPy) handles the rest. Cloud models plan and review. Nothing else.

Cloud LLM (plan) ──> S.C.R.U.B. MCP Server ──> Cloud LLM (review)
                          │
              ┌───────────┼───────────┐
              ▼           ▼           ▼
        Deterministic   DSPy +     Security +
        (Ruff, AST,    Local LLM   Supply Chain
         pyright)     (Qwen Coder)  (Bandit, OSV)

              ┌───────────┴───────────┐
              ▼                       ▼
         CLI (scrub)            GitHub Actions
     check│fix│diff│audit      .pre-commit-hooks
         SARIF 2.1.0 output

Why: From Code Generation to Code Governance

The AI industry is obsessed with Day 1: writing code faster. But for any team responsible for enterprise infrastructure, the real cost of software isn't writing it—it's Day 2: maintaining, securing, and auditing that code for the next five years.

AI makes it dangerously easy to generate massive amounts of technical debt, complex spaghetti logic, and unvetted supply chain risks. It writes code, but it doesn't own systems.

S.C.R.U.B. is the adult in the room. It is not a coding tool; it is a best-practices tool. It acts as a deterministic governance engine that sits between the AI and your codebase, strictly enforcing sanity checks, security thresholds, and architectural standards. It lets teams adopt autonomous coding agents safely because the output is systematically checked for compliance, legibility, and security before it ever reaches a pull request.

S.C.R.U.B. enforces CycloneDX SBOMs, Bandit security scans, and strict cyclomatic complexity limits. That isn't a side feature; it's the product. The conversation shifts from "How fast can the AI code?" to "How safe is the code the AI just wrote?"

This deterministic-first approach has a powerful side effect: it dramatically cuts cloud LLM costs. Boilerplate tasks like docstrings, type annotations, and linting are handled locally, where compute is virtually free and quality is consistent. Your cloud models are reserved for high-level planning and review, not churning out commodity code.

Architecture

Deterministic-first. Every task hits deterministic tools before the LLM sees it. Ruff handles linting. pyright validates types. pydocstyle checks docstring style. AST analysis computes complexity. Bandit scans for vulnerabilities. If the deterministic tool says the code already passes, the LLM never fires. Zero tokens spent.

Three-tier pre-filter. Tier 1 (AST): is the docstring/annotation physically present? Tier 2 (pydocstyle/pyright): does the existing artifact pass quality checks? Tier 3 (only failures): send to the local LLM. Each tier is gated to its step. Ask for --steps lint and no pre-filter runs for docstrings.
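Here is a minimal sketch of the docstring pre-filter, assuming the tiers work as described above. `looks_well_formed` is a hypothetical stand-in for pydocstyle: it only checks that the summary line ends with a period. `docstring_candidates` is likewise illustrative and is not S.C.R.U.B.'s API.

```python
import ast
from typing import Callable


def looks_well_formed(docstring: str) -> bool:
    """Hypothetical Tier 2 stand-in for pydocstyle: summary line must end with a period."""
    lines = docstring.strip().splitlines()
    return bool(lines) and lines[0].rstrip().endswith(".")


def docstring_candidates(
    source: str,
    steps: set[str],
    quality_check: Callable[[str], bool] = looks_well_formed,
) -> list[str]:
    """Return the names of functions whose docstrings should go to the local LLM."""
    # Gating: this pre-filter only runs when the docstrings step was requested.
    if "docstrings" not in steps:
        return []
    needs_llm: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)
        if doc is None or not quality_check(doc):
            # Tier 1 (missing) or Tier 2 (present but failing) -> Tier 3: the LLM.
            needs_llm.append(node.name)
        # Anything that passes both tiers costs zero tokens.
    return needs_llm


SOURCE = '''
def missing(x):
    return x

def good(x):
    """Return x unchanged."""
    return x

def sloppy(x):
    """returns x"""
    return x
'''

print(docstring_candidates(SOURCE, {"docstrings"}))  # ['missing', 'sloppy']
print(docstring_candidates(SOURCE, {"lint"}))        # []
```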

Batched DSPy calls. Instead of one LLM call per function, S.C.R.U.B. packs 5 functions into a single prompt (configurable via batch_size). A file with 30 functions that needs both docstrings and type annotations goes from 60 round trips (30 per step) to 12 (6 per step).
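The batching arithmetic can be checked with a short sketch. It assumes two LLM-backed steps, such as docstrings and types, which is how 30 functions reach 60 unbatched round trips. `make_batches` and `round_trips` are illustrative helpers, not S.C.R.U.B. functions.

```python
from math import ceil
from typing import TypeVar

T = TypeVar("T")


def make_batches(items: list[T], batch_size: int = 5) -> list[list[T]]:
    """Split items into consecutive chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def round_trips(n_functions: int, n_steps: int, batch_size: int) -> int:
    """LLM calls needed when each LLM-backed step batches functions separately."""
    return n_steps * ceil(n_functions / batch_size)


functions = [f"func_{i}" for i in range(30)]
print(len(make_batches(functions, 5)))           # 6 prompts per step
print(round_trips(30, n_steps=2, batch_size=1))  # 60 (one call per function)
print(round_trips(30, n_steps=2, batch_size=5))  # 12 (batched)
```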

Teacher-student optimization. Use Claude (or any large model) as a teacher to compile optimized prompts for your local student model. One-time cloud cost during setup. Zero cloud cost at runtime. The compiled prompts live in .dspy_cache/ and the health checker flags when they go stale.

Structural laziness enforcement. Tool descriptions are written as imperatives ("MUST use this tool", "Do NOT write manually") so the cloud LLM routes through S.C.R.U.B. instead of generating boilerplate itself. AGENTS.md establishes the division of labor before the first prompt.

The 22 Tools

Code Hygiene

| Tool | LLM? | What it does |
|------|------|-------------|
| lint_file | No | Ruff lint + autofix |
| generate_docstrings | Batched | Google-style, all files/classes/methods/functions |
| annotate_types | Batched | All args + all returns, no exceptions |
| add_comments | Gated | Only fires on complex functions (cyclomatic/cognitive threshold) |
| hygiene_full | Mixed | Full pipeline: lint, docstrings, types, comments |
| hygiene_batch | Mixed | hygiene_full across multiple files in parallel |
| hygiene_incremental | Mixed | Diff-aware, cache-accelerated hygiene (skips unchanged functions) |
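The way hygiene_incremental can skip unchanged functions is illustrated by a sketch that keys a cache on a SHA-256 of each function's source text. This is an assumption for illustration. The real cache (tools/cache.py) is described as a "3-layer composite hash" whose layers are not documented here, so this shows only a single per-function layer. `function_hashes` and `changed_functions` are hypothetical names.

```python
import ast
import hashlib


def function_hashes(source: str) -> dict[str, str]:
    """Map each function name to a SHA-256 digest of its exact source text."""
    hashes: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node) or ""
            # Keyed by bare name for brevity; same-named methods in different
            # classes would collide, so a real cache would use qualified names.
            hashes[node.name] = hashlib.sha256(segment.encode()).hexdigest()
    return hashes


def changed_functions(source: str, cache: dict[str, str]) -> list[str]:
    """Return functions that are new or whose digest differs from the cache."""
    return [
        name
        for name, digest in function_hashes(source).items()
        if cache.get(name) != digest
    ]


OLD = "def a():\n    return 1\n\ndef b():\n    return 2\n"
NEW = "def a():\n    return 1\n\ndef b():\n    return 3\n\ndef c():\n    pass\n"

cache = function_hashes(OLD)
print(changed_functions(NEW, cache))  # ['b', 'c']
```

Only `b` (edited) and `c` (new) would be sent through the pipeline again; `a` is served from the cache.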

Coding Tools

| Tool | LLM? | What it does |
|------|------|-------------|
| analyze_complexity | No | Cyclomatic + cognitive complexity, nesting depth, hotspots |
| suggest_simplifications | Yes | Concrete refactoring: early returns, guard clauses, extract function |
| optimize_imports | Mixed | Ruff removes unused, DSPy infers missing imports |
| generate_tests | Batched | pytest stubs: happy path, edge cases, parametrize |
| run_tests | No | Run pytest with PYTHONPATH=src, returns exit code + output |
| find_dead_code | No | Unreachable code, unused vars, redundant else, commented blocks |
| suggest_refactoring | Yes | Extract function candidates, rename suggestions |
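A simplified cyclomatic complexity count, in the spirit of what analyze_complexity reports, takes only a few lines of AST walking. The choice of decision points is our assumption: branches, loops, except handlers, conditional expressions, extra boolean operands, and comprehension loops and filters. That roughly follows McCabe/radon conventions. The sketch also counts nested functions and does not cover cognitive complexity or nesting depth.

```python
import ast

# Node types counted as one extra path each in this simplified McCabe count.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 + the number of decision points in a function's AST."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1  # `elif` is a nested ast.If, so it is counted here too
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1  # `a and b and c` adds two paths
        elif isinstance(node, ast.comprehension):
            complexity += 1 + len(node.ifs)  # the loop plus each filter clause
    return complexity


SOURCE = '''
def classify(n, items):
    if n < 0:
        return "negative"
    elif n == 0 and items:
        return "empty"
    for item in items:
        if item is None:
            return "has-none"
    return "ok"
'''

func = ast.parse(SOURCE).body[0]
score = cyclomatic_complexity(func)
print(score)        # 6
print(score >= 10)  # False: would not trip --fail-on complexity:10
```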

Exploration

| Tool | LLM? | What it does |
|------|------|-------------|
| explore_architecture | No | AST skeleton: signatures + docstrings, bodies replaced with ... |
| read_files | No | Batch-read multiple files in one call |
| find_symbols | No | Extract function/class signatures via AST |
| grep_multi | No | Multi-pattern regex search across the codebase |
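The "AST skeleton" idea behind explore_architecture can be sketched with the standard library alone. Every function body is replaced by its docstring (if any) plus `...`, and the tree is printed back with `ast.unparse` (Python 3.9+). This is our illustration of the technique, not the tool's implementation. `_StripBodies` and `skeleton` are hypothetical names.

```python
import ast


class _StripBodies(ast.NodeTransformer):
    """Replace each function body with its docstring (if any) followed by `...`."""

    def _strip(self, node):
        new_body = []
        docstring = ast.get_docstring(node, clean=False)
        if docstring is not None:
            new_body.append(ast.Expr(ast.Constant(docstring)))
        new_body.append(ast.Expr(ast.Constant(...)))
        node.body = new_body  # nested defs inside the body disappear with it
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def skeleton(source: str) -> str:
    """Return signatures and docstrings only, with bodies elided."""
    tree = _StripBodies().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


SOURCE = '''
class Calculator:
    """Tiny calculator."""

    def add(self, a: int, b: int) -> int:
        """Add two numbers."""
        total = a + b
        return total
'''

print(skeleton(SOURCE))
```

Class bodies are left alone, so methods keep their place under the class while their implementations are dropped. That is what keeps the skeleton small enough to hand to a cloud model.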

Security + Supply Chain

| Tool | LLM? | What it does |
|------|------|-------------|
| security_scan | No | Bandit static analysis: hardcoded secrets, injection, weak crypto |
| security_remediate | Yes | Triage-first: rewrite, # nosec with justification, or accept risk |
| generate_sbom | No | CycloneDX 1.5 / SPDX 2.3 from pip, pyproject, lock files |
| scan_vulnerabilities | No | Cross-reference PURLs against OSV.dev (PyPI, GHSA, NVD) |
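scan_vulnerabilities cross-references PURLs against OSV.dev. A stdlib sketch of that lookup uses OSV's public batch endpoint (`POST /v1/querybatch`), which returns the matching vulnerability IDs for each query. The helper names and the two sample pins are ours; this is not vulnscan.py. The PyPI purl rule (lowercase, `_` mapped to `-`) comes from the purl specification.

```python
import json
import urllib.request

OSV_QUERYBATCH_URL = "https://api.osv.dev/v1/querybatch"


def pypi_purl(name: str, version: str) -> str:
    """Build a PyPI package URL; the purl spec lowercases names and maps '_' to '-'."""
    return f"pkg:pypi/{name.lower().replace('_', '-')}@{version}"


def build_osv_batch(packages: dict[str, str]) -> dict:
    """One OSV query per pinned package; the version travels inside the purl."""
    return {"queries": [{"package": {"purl": pypi_purl(n, v)}} for n, v in packages.items()]}


def query_osv(payload: dict, timeout: float = 10.0) -> dict:
    """POST the batch to OSV.dev (network call; not executed below)."""
    request = urllib.request.Request(
        OSV_QUERYBATCH_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Each result lists vulnerability IDs; full records come from /v1/vulns/{id}.
        return json.load(response)


payload = build_osv_batch({"Jinja2": "2.4.1", "typing_extensions": "4.0.0"})
print(payload["queries"][0])  # {'package': {'purl': 'pkg:pypi/jinja2@2.4.1'}}
```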

Quick Start

Install

# From GitHub
pip install "scrub-mcp @ git+https://github.com/zombat/scrub-mcp.git"
pip install "scrub-mcp[all] @ git+https://github.com/zombat/scrub-mcp.git"   # everything

# From PyPI (when published)
pip install scrub-mcp
pip install "scrub-mcp[all]"          # everything

# Extras (quoted so shells like zsh don't expand the brackets)
pip install "scrub-mcp[security]"     # adds Bandit
pip install "scrub-mcp[prefilter]"    # adds pyright + pydocstyle

CLI

S.C.R.U.B. ships a scrub CLI for CI pipelines, pre-commit hooks, and local use. No LLM required for check and audit.

# Detect violations (deterministic, no LLM)
scrub check src/ --fail-on missing-docstrings,missing-types,complexity:10,security:MEDIUM

# SARIF output for GitHub Code Scanning
scrub check src/ --fail-on missing-docstrings --format sarif --output results.sarif

# Diff-aware (only files changed since main)
scrub check . --since main --fail-on missing-docstrings,missing-types

# Auto-fix with commit (requires local LLM)
scrub fix src/ --steps docstrings types --commit

# Preview what fix would change
scrub diff src/ --format unified

# Security + supply-chain audit
scrub audit . --fail-on-severity HIGH --sbom-format cyclonedx --output-sbom sbom.json

# Cache management
scrub cache stats
scrub cache clear --stale
scrub cache warm src/

scrub check exit codes

| Code | Meaning |
|------|---------|
| 0 | Clean (or --fail-on not specified) |
| 1 | Violations found matching --fail-on criteria |

--fail-on options

| Token | What it checks |
|-------|---------------|
| missing-docstrings | Functions, classes, and modules without docstrings |
| missing-types | Functions without type annotations |
| complexity:N | Functions with cyclomatic complexity >= N |
| security:SEVERITY | Bandit findings at or above severity (low, medium, high) |
| vulns:SEVERITY | OSV.dev vulnerabilities at or above severity (low, medium, high, critical) |
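The --fail-on tokens and the exit codes above fit together as a small rule engine. Here is a sketch under stated assumptions: the `Finding` shape, the case-insensitive severity comparison, and the helper names are all hypothetical and not taken from cli.py. The `>= N` complexity rule and the 0/1 exit codes follow the two tables.

```python
from __future__ import annotations

from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    kind: str                       # e.g. "missing-types", "complexity", "security"
    value: int | str | None = None  # complexity score or severity, if any


def parse_fail_on(spec: str) -> dict[str, int]:
    """Turn 'missing-types,complexity:10,security:MEDIUM' into per-check thresholds."""
    rules: dict[str, int] = {}
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        name, _, arg = token.partition(":")
        if name == "complexity":
            rules[name] = int(arg)
        elif name in ("security", "vulns"):
            rules[name] = SEVERITY_RANK[arg.lower()]  # case-insensitive (our assumption)
        else:
            rules[name] = 0  # presence-only checks such as missing-docstrings
    return rules


def violates(finding: Finding, rules: dict[str, int]) -> bool:
    if finding.kind not in rules:
        return False
    threshold = rules[finding.kind]
    if finding.kind == "complexity":
        return int(finding.value) >= threshold
    if finding.kind in ("security", "vulns"):
        return SEVERITY_RANK[str(finding.value).lower()] >= threshold
    return True


def exit_code(findings: list[Finding], spec: str | None) -> int:
    """0 when clean or no --fail-on given; 1 when any finding matches."""
    if not spec:
        return 0
    rules = parse_fail_on(spec)
    return 1 if any(violates(f, rules) for f in findings) else 0


findings = [Finding("complexity", 8), Finding("security", "LOW"), Finding("missing-types")]
print(exit_code(findings, "complexity:10,security:MEDIUM"))  # 0
print(exit_code(findings, "missing-types"))                  # 1
```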

Output formats

All commands that produce reports support --format text|json|sarif. SARIF 2.1.0 output maps directly to GitHub Code Scanning — each violation becomes a SARIF result with location, message, and rule ID (e.g. SCRUB-DOC-001, SCRUB-SEC-B101, SCRUB-VULN-CVE-2024-1234).
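A minimal SARIF 2.1.0 log with one result shows what "location, message, and rule ID" look like on the wire. The SARIF property names (`ruleId`, `message.text`, `physicalLocation`, `region.startLine`) follow the SARIF 2.1.0 specification. The input dict keys, the driver name `scrub`, and `to_sarif` itself are hypothetical; this is not sarif.py.

```python
import json


def to_sarif(violations: list[dict]) -> dict:
    """Wrap violations in a minimal SARIF 2.1.0 log with a single run."""
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": "scrub",
                        # One rule entry per distinct rule ID seen in the results.
                        "rules": [{"id": rid} for rid in sorted({v["rule_id"] for v in violations})],
                    }
                },
                "results": [
                    {
                        "ruleId": v["rule_id"],
                        "level": v.get("level", "warning"),
                        "message": {"text": v["message"]},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": v["path"]},
                                    "region": {"startLine": v["line"]},
                                }
                            }
                        ],
                    }
                    for v in violations
                ],
            }
        ],
    }


log = to_sarif([
    {"rule_id": "SCRUB-DOC-001", "message": "Function 'load' has no docstring",
     "path": "src/app.py", "line": 12},
])
print(json.dumps(log, indent=2))
```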

GitHub Actions

Add S.C.R.U.B. as a quality gate in your CI pipeline:

# .github/workflows/scrub.yml
name: Code Quality
on: [push, pull_request]

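jobs:
  scrub:
    runs-on: ubuntu-latest
    permissions:
      contents: read          # for actions/checkout
      security-events: write  # lets the action upload SARIF to Code Scanning
    steps:
      - uses: actions/checkout@v4
      - uses: zombat/scrub-mcp@v1
        with:
          mode: check
          fail-on: "missing-docstrings,missing-types,security:MEDIUM"
          format: sarif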

The composite action installs S.C.R.U.B., runs the selected mode, and uploads SARIF to GitHub Code Scanning automatically. Supported modes: check, fix, diff, audit.

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/zombat/scrub-mcp
    rev: v1
    hooks:
      - id: scrub-check          # docstrings + types
      - id: scrub-security       # Bandit at MEDIUM severity

The hooks stay fast: they are diff-aware (--since HEAD) and reuse the artifact cache, so only staged changes are analyzed.

LLM Considerations

When using Ollama, how a model fits into your memory determines how it performs:

  • Fully in VRAM: Blazing fast (30–50+ tokens/second). Perfect for real-time chat and IDE tab-autocomplete.
  • Split between GPU and System RAM: Noticeably slower (2–10 tokens/second), as your CPU has to process the overflow. Great for complex problems where you don't mind waiting a few seconds.

1. The Sweet Spot: Best Overall (Fits entirely in VRAM)

If you want a model that is smart, handles complex logic, and replies instantly, you want a 7B–8B parameter model.

  • Top Pick: Qwen 2.5 Coder (7B)
  • Why: It is one of the strongest small coding models available and often outperforms older models several times its size on coding benchmarks. At 4.7GB (Ollama's default quantization), it fits entirely on an 8GB RTX 4060 and leaves roughly 3GB of VRAM for a larger context window (useful when pasting in several large files).
  • Command: ollama run qwen2.5-coder:7b
  • Runner Up: Llama 3.1 (8B)
  • Why: A fantastic generalist. If you want a model that writes good code but is also excellent at writing documentation, drafting emails, or brainstorming architecture, this is highly reliable.
  • Command: ollama run llama3.1

2. The Autocomplete Champs (For IDE Integration)

If you are using an IDE extension (such as Continue.dev) for background "tab-autocomplete" as you type, you need a model that returns suggestions in milliseconds. 7B models can be a fraction of a second too slow for a seamless typing experience.

  • Top Pick: Qwen 2.5 Coder (1.5B or 3B)
  • Why: It takes up barely any VRAM (~1GB to 2GB), leaving your GPU free for your OS, browser, or a secondary chat model. It supports Fill-In-the-Middle (FIM) completion, and its small size makes it fast enough to predict the next few lines of code as you type.
  • Command: ollama run qwen2.5-coder:1.5b

3. The Heavy Lifters (64GB+ System RAM)

If you have 64GB+ of system RAM, you can comfortably run much larger models. Ollama automatically loads as much of the model as fits onto the GPU and offloads the rest to the CPU. It will be slower, but noticeably more capable.

  • Top Pick: Qwen 2.5 Coder (32B)
  • Why: On published coding benchmarks, this model is competitive with proprietary cloud models such as GPT-4o and Claude 3.5 Sonnet. It's a ~20GB file, so it will lean heavily on your system RAM. Pull it out when you have a difficult architectural problem or an obscure bug and are willing to wait 10–20 seconds for the code generation to finish.
  • Command: ollama run qwen2.5-coder:32b
  • Runner Up: DeepSeek R1 (14B)
  • Why: This is a "reasoning" model (similar to OpenAI's o1) that generates an internal chain of thought to work through complex logic before outputting the final code. At ~9GB, it spills only slightly past an 8GB card's VRAM, so the CPU won't bottleneck it too severely.
  • Command: ollama run deepseek-r1:14b

Configure

# config.yaml
model:
  provider: ollama
  model: qwen2.5-coder:14b
  base_url: http://localhost:11434
  max_tokens: 4096
  temperature: 0.1

batch_size: 5
deterministic_prefilter: true
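The `model:` fields map onto an Ollama request, as this sketch shows. It builds a non-streaming request for Ollama's `/api/generate` endpoint and assumes `max_tokens` corresponds to Ollama's `num_predict` option. S.C.R.U.B. reaches the model through DSPy rather than this raw call, so treat this as an illustration of what the settings control, not the project's code path. `ollama_request` is a hypothetical helper.

```python
import json
import urllib.request

# Mirrors the model: section of config.yaml (values copied from it).
CONFIG = {
    "provider": "ollama",
    "model": "qwen2.5-coder:14b",
    "base_url": "http://localhost:11434",
    "max_tokens": 4096,
    "temperature": 0.1,
}


def ollama_request(config: dict, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request from the config values."""
    body = {
        "model": config["model"],
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": config["temperature"],  # low = near-deterministic output
            "num_predict": config["max_tokens"],   # cap on generated tokens
        },
    }
    return urllib.request.Request(
        config["base_url"].rstrip("/") + "/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = ollama_request(CONFIG, "Write a Google-style docstring for: def add(a, b): return a + b")
print(req.full_url)  # http://localhost:11434/api/generate
# With Ollama running: json.load(urllib.request.urlopen(req))["response"]
```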

Connect to MCP Clients

The --install-mcp flag auto-generates or updates the correct MCP config file for your client:

# Claude Code — writes .mcp.json in the project root
python -m scrub_mcp.mcp.server --install-mcp claude-code

# Cursor — writes .cursor/mcp.json in the project root
python -m scrub_mcp.mcp.server --install-mcp cursor

# GitHub Copilot (VS Code) — writes .vscode/mcp.json
python -m scrub_mcp.mcp.server --install-mcp github-copilot

# Copilot CLI — writes ~/.copilot/mcp-config.json (global)
python -m scrub_mcp.mcp.server --install-mcp copilot-cli

# Gemini Code Assist — writes ~/.gemini/settings.json (global)
python -m scrub_mcp.mcp.server --install-mcp gemini

# Windsurf (Cascade) — writes ~/.codeium/windsurf/mcp_config.json (global)
python -m scrub_mcp.mcp.server --install-mcp windsurf

# Cline / Roo Code — writes .vscode/mcp.json
python -m scrub_mcp.mcp.server --install-mcp cline

# Zed — writes ~/.config/zed/settings.json (global)
python -m scrub_mcp.mcp.server --install-mcp zed

Each command detects the running Python interpreter and merges the S.C.R.U.B. server entry into any existing config without overwriting other servers.
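The merge behavior ("add ours, keep everyone else's") is easy to sketch for clients that use a plain-JSON file with a top-level `mcpServers` key, like the manual config below. This is an assumption-laden illustration, not the --install-mcp implementation. Other clients use different layouts (VS Code's `.vscode/mcp.json` uses `servers`, and Zed is noted below). `merge_scrub_entry` is a hypothetical name.

```python
import json
import sys
import tempfile
from pathlib import Path


def merge_scrub_entry(config_path: Path, python: str = sys.executable) -> dict:
    """Add or replace the "scrub" entry under mcpServers, keeping every other server."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # sys.executable pins the interpreter that has scrub-mcp installed.
    servers["scrub"] = {"command": python, "args": ["-m", "scrub_mcp.mcp.server"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}))
    merged = merge_scrub_entry(path, python="/usr/bin/python3")
    on_disk = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))  # ['other', 'scrub']
```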

Or configure manually — S.C.R.U.B. speaks MCP over stdio, so any MCP client works:

{
  "mcpServers": {
    "scrub": {
      "command": "python",
      "args": ["-m", "scrub_mcp.mcp.server"]
    }
  }
}

Note: Zed uses a different schema (context_servers with a nested command object). Use --install-mcp zed to generate the correct format automatically.

Drop agent instructions

Tell the cloud LLM to orchestrate via tools instead of writing boilerplate itself:

python -m scrub_mcp.mcp.server --agent-instructions .

Writes AGENTS.md into the current directory with a prime directive, division-of-labor table, mandatory workflow checklists (hygiene before every file, security before every commit), and a bouncer rule that intercepts manual docstring/type/test generation. Commit it alongside .mcp.json.

Optimization

Training examples ship with the package (scrub_mcp/examples/). The optimizer runs out of the box — no --examples-dir required.

Self-teach (local only, free)

python -m scrub_mcp.optimizers.tune

Teacher-student (one-time cloud cost, better results)

python -m scrub_mcp.optimizers.tune --teacher --teacher-model claude-sonnet-4-20250514

Build + tune in one shot

Generate fresh training examples using Claude, then immediately optimize on them:

export ANTHROPIC_API_KEY=sk-ant-...

# Offline student after building
python -m scrub_mcp.optimizers.tune --build-examples ./examples --build-count 10

# Teacher-student after building
python -m scrub_mcp.optimizers.tune --build-examples ./examples --build-count 10 \
    --teacher --teacher-model claude-sonnet-4-20250514

Each topic generates three files: {topic}_messy.py (undocumented input), {topic}_clean.py (fully annotated ground truth), and {topic}_test.py (pytest tests). Commit them to src/scrub_mcp/examples/ to bundle with the package.
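Before committing generated examples, it can help to confirm that every topic has its full triplet. Here is a small sketch that relies only on the naming convention above. `complete_topics` is a hypothetical helper and is not part of the package.

```python
import tempfile
from pathlib import Path

ROLES = {"_messy.py": "messy", "_clean.py": "clean", "_test.py": "test"}


def complete_topics(examples_dir: Path) -> dict[str, dict[str, Path]]:
    """Group example files by topic; keep only topics that have all three roles."""
    groups: dict[str, dict[str, Path]] = {}
    for path in sorted(examples_dir.glob("*.py")):
        for suffix, role in ROLES.items():
            if path.name.endswith(suffix):
                topic = path.name[: -len(suffix)]
                groups.setdefault(topic, {})[role] = path
                break
    return {topic: files for topic, files in groups.items() if len(files) == len(ROLES)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("parsing_messy.py", "parsing_clean.py", "parsing_test.py", "retry_messy.py"):
        (root / name).write_text("")
    topics = complete_topics(root)

print(sorted(topics))  # ['parsing'] ('retry' is missing its clean and test files)
```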

Per-module strategy

| Module | Strategy | Rationale |
|--------|----------|-----------|
| docstrings, types, comments | BootstrapFewShot | Style matching, Qwen handles it well |
| imports, dead_code | BootstrapFewShotWithRandomSearch | Needs variety in examples |
| complexity, tests, refactoring | MIPROv2 | Prompt wording drives output quality |

Calibrated LLM-as-judge

MIPROv2 modules use a two-tier metric: 40% structural checks (JSON valid, required fields, style markers) + 60% teacher-as-judge (Claude evaluates Qwen's output against calibrated rubrics with grounded score anchors). Edit .dspy_cache/judge_calibration.json to tune the judge to your standards.
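The 40/60 blend described above can be written out as a short sketch. Only the JSON-validity and required-field checks are shown. Style-marker checks and the Claude judge call are out of scope, so the judge's rubric score is passed in as a number already normalized to [0, 1]. All names are hypothetical.

```python
import json

STRUCTURAL_WEIGHT = 0.4  # the 40/60 split described above
JUDGE_WEIGHT = 0.6


def structural_score(raw_output: str, required_fields: list[str]) -> float:
    """Fraction of structural checks passed: valid JSON object, then each required field."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0  # nothing else can be checked without valid JSON
    if not isinstance(parsed, dict):
        return 0.0
    checks = [True] + [field in parsed for field in required_fields]
    return sum(checks) / len(checks)


def combined_metric(structural: float, judge: float) -> float:
    """Weighted blend of structural checks and the teacher's rubric score."""
    return STRUCTURAL_WEIGHT * structural + JUDGE_WEIGHT * judge


s = structural_score('{"docstring": "Add two numbers.", "name": "add"}', ["docstring", "name", "raises"])
print(s)                                   # 0.75 (3 of 4 checks pass)
print(round(combined_metric(s, 0.5), 3))   # 0.6
```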

Cache health

# Quick check
python -m scrub_mcp.optimizers.health

# CI gate (non-zero exit if stale)
python -m scrub_mcp.optimizers.health --threshold 0.7

# Just the heavy modules
python -m scrub_mcp.optimizers.health --modules complexity,tests

Model fingerprints detect drift when you upgrade Qwen. The health checker tells you exactly which modules to recompile.
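One plausible shape for fingerprint-based staleness checks is sketched here. It assumes a fingerprint is a hash of the model name plus the build digest (Ollama reports a digest for each pulled model). It also assumes --threshold is a minimum score that each compiled module must meet. Both are guesses about health.py, and `model_fingerprint`, `CompiledModule`, and `stale_modules` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


def model_fingerprint(model_name: str, model_digest: str) -> str:
    """Short, stable ID for the exact model build prompts were compiled against."""
    return hashlib.sha256(f"{model_name}@{model_digest}".encode()).hexdigest()[:16]


@dataclass
class CompiledModule:
    name: str
    fingerprint: str  # model fingerprint recorded at compile time
    score: float      # validation score recorded at compile time, 0..1


def stale_modules(modules: list[CompiledModule], current: str, threshold: float) -> list[str]:
    """Stale = compiled against a different model, or scored below the threshold."""
    return [m.name for m in modules if m.fingerprint != current or m.score < threshold]


old_fp = model_fingerprint("qwen2.5-coder:14b", "digest-before-upgrade")
new_fp = model_fingerprint("qwen2.5-coder:14b", "digest-after-upgrade")
modules = [
    CompiledModule("docstrings", new_fp, 0.82),
    CompiledModule("complexity", old_fp, 0.90),
    CompiledModule("tests", new_fp, 0.65),
]
stale = stale_modules(modules, new_fp, threshold=0.7)
print(stale)  # ['complexity', 'tests']
# A CI gate would then exit non-zero: raise SystemExit(1 if stale else 0)
```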

Project Structure

action.yml                 # GitHub Actions composite action
.pre-commit-hooks.yaml     # Pre-commit hook definitions (scrub-check, scrub-security)
config.yaml                # Default pipeline configuration
src/scrub_mcp/
  cli.py                   # CLI entrypoint: check, fix, diff, audit, cache
  config.py                # Pydantic config: model, ruff, optimizer, batch, prefilter
  models.py                # All I/O models (Pydantic)
  pipeline.py              # Orchestrator: deterministic-first, batched, gated
  examples/                # Bundled training examples (ships with pip install)
    {topic}_messy.py       # Undocumented input
    {topic}_clean.py       # Fully annotated ground truth
    {topic}_test.py        # pytest tests
  mcp/
    server.py              # 22-tool MCP server (stdio) + --agent-instructions
  modules/
    signatures.py          # DSPy signatures: hygiene tasks
    hygiene.py             # DSPy modules: docstrings, types, comments (single + batch)
    coding_signatures.py   # DSPy signatures: coding + security tasks
    coding_tools.py        # DSPy modules: complexity, tests, refactoring, security triage
  tools/
    parser.py              # AST extraction: functions, classes, modules, complexity
    linter.py              # Ruff wrapper
    rewriter.py            # Source rewriter: apply docstrings, types back to source
    sarif.py               # SARIF 2.1.0 serializer for check/audit output
    cache.py               # 3-layer composite hash artifact cache
    diff.py                # Unified diff parser + function narrowing
    fs.py                  # Gitignore-aware file traversal
    complexity.py          # Cyclomatic + cognitive complexity analyzer
    dead_code.py           # Unreachable code, unused vars, commented blocks
    imports.py             # Import optimizer (Ruff + AST)
    security.py            # Bandit wrapper
    sbom.py                # CycloneDX 1.5 / SPDX 2.3 generator
    vulnscan.py            # OSV.dev vulnerability scanner
    savings.py             # Cloud cost savings estimator
  optimizers/
    tune.py                # Per-module optimizer: teacher-student, calibrated judge
    health.py              # Cache staleness detection + model fingerprinting
    examples_gen.py        # Training example generator (Claude API)
scripts/
  gen_examples.py          # CLI wrapper around examples_gen (dev convenience)

The ROI

By inserting S.C.R.U.B. into the Claude Code (or any MCP client) workflow, you cut cloud token usage on every generation loop. The local pipeline doesn't get lazy on function 47. It doesn't skip the boring parts to save tokens. It runs the same quality pass on every function, every class, every file.

The CLI layer (scrub check, scrub audit) enforces the same standards in CI without needing an LLM at all. SARIF output feeds directly into GitHub Code Scanning, so violations show up as inline annotations on pull requests. Pre-commit hooks catch issues before they reach the pipeline.

What comes out the other side isn't generic boilerplate. It's a fully documented, typed, linted, security-scanned, SBOM-tracked codebase from the first prompt. Not a replacement for your GRC stack, but a strong addition that generates its own compliance evidence.

License

MIT

Author

Raymond Andrew Rizzo
