S.C.R.U.B.
Source Code Review, Uplift, and Baselining
A 22-tool MCP server and CI-ready CLI that cuts cloud LLM token usage on code quality tasks. Deterministic tools handle what they can. A local LLM (via DSPy) handles the rest. Cloud models plan and review. Nothing else.
Cloud LLM (plan) ──> S.C.R.U.B. MCP Server ──> Cloud LLM (review)
                               │
                   ┌───────────┼───────────┐
                   ▼           ▼           ▼
             Deterministic  DSPy +    Security +
             (Ruff, AST,   Local LLM  Supply Chain
              pyright)   (Qwen Coder) (Bandit, OSV)
                   ┌───────────┴───────────┐
                   ▼                       ▼
              CLI (scrub)           GitHub Actions
         check│fix│diff│audit      .pre-commit-hooks
                      SARIF 2.1.0 output
Why: From Code Generation to Code Governance
The AI industry is obsessed with Day 1: writing code faster. But for any team responsible for enterprise infrastructure, the real cost of software isn't writing it—it's Day 2: maintaining, securing, and auditing that code for the next five years.
AI makes it dangerously easy to generate massive amounts of technical debt, complex spaghetti logic, and unvetted supply chain risks. It writes code, but it doesn't own systems.
S.C.R.U.B. is the adult in the room. It is not a coding tool; it is a best-practices tool. It acts as a deterministic governance engine that sits between the AI and your codebase, strictly enforcing sanity checks, security thresholds, and architectural standards. It lets teams adopt autonomous coding agents safely because the output is systematically checked for compliance, legibility, and security before it ever hits a pull request.
The fact that S.C.R.U.B. forces CycloneDX SBOMs, Bandit security scans, and strict cyclomatic complexity limits isn't just a side feature—it's the product. The conversation shifts from "How fast can the AI code?" to "How safe is the code the AI just wrote?"
This deterministic-first approach has a powerful side effect: it dramatically cuts cloud LLM costs. Boilerplate tasks like docstrings, type annotations, and linting are handled locally, where compute is virtually free and quality is consistent. Your cloud models are reserved for high-level planning and review, not churning out commodity code.
Architecture
Deterministic-first. Every task hits deterministic tools before the LLM sees it. Ruff handles linting. pyright validates types. pydocstyle checks docstring style. AST analysis computes complexity. Bandit scans for vulnerabilities. If the deterministic tool says the code already passes, the LLM never fires. Zero tokens spent.
Three-tier pre-filter. Tier 1 (AST): is the docstring/annotation physically present? Tier 2 (pydocstyle/pyright): does the existing artifact pass quality checks? Tier 3 (only failures): send to the local LLM. Each tier is gated to its step. Ask for --steps lint and no pre-filter runs for docstrings.
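As a rough illustration of the Tier 1 check, the sketch below uses only the standard-library `ast` module to list functions that lack a docstring or complete annotations. The function name `tier1_missing` and the choice to skip `self`/`cls` are assumptions made for this example, not the project's actual implementation.

```python
import ast


def tier1_missing(source: str) -> dict[str, list[str]]:
    """List function names that lack a docstring or full type annotations."""
    missing: dict[str, list[str]] = {"docstrings": [], "types": []}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Tier 1 only asks whether the artifact is physically present.
        if ast.get_docstring(node) is None:
            missing["docstrings"].append(node.name)
        args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        # Assumption: self/cls are exempt from the annotation requirement.
        unannotated = [
            a for a in args
            if a.annotation is None and a.arg not in ("self", "cls")
        ]
        if unannotated or node.returns is None:
            missing["types"].append(node.name)
    return missing


sample = '''
def documented(x: int) -> int:
    """Double x."""
    return x * 2

def bare(x):
    return x
'''
result = tier1_missing(sample)
print(result)  # {'docstrings': ['bare'], 'types': ['bare']}
```

Only the names reported here would move on to Tier 2 and, if they still fail, to the local LLM.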
Batched DSPy calls. Instead of one LLM call per function, S.C.R.U.B. packs 5 functions into a single prompt (configurable via batch_size). A file with 30 functions that needs two LLM passes per function (for example, docstrings and types) goes from 60 round trips to 12.
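The round-trip arithmetic can be checked with a few lines of Python. This sketch assumes the 60 unbatched calls come from two LLM tasks per function (for instance docstrings and types); `round_trips` and `batches` are illustrative helpers, not part of the package.

```python
from math import ceil


def round_trips(n_functions: int, n_tasks: int, batch_size: int) -> int:
    """Number of LLM calls when each task sends functions in fixed-size batches."""
    return n_tasks * ceil(n_functions / batch_size)


def batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


unbatched = round_trips(30, n_tasks=2, batch_size=1)
batched = round_trips(30, n_tasks=2, batch_size=5)
print(unbatched, batched)  # 60 12

chunks = batches([f"func_{i}" for i in range(30)], 5)
print(len(chunks))  # 6 prompts per task
```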
Teacher-student optimization. Use Claude (or any large model) as a teacher to compile optimized prompts for your local student model. One-time cloud cost during setup. Zero cloud cost at runtime. The compiled prompts live in .dspy_cache/ and the health checker flags when they go stale.
Structural laziness enforcement. Tool descriptions are written as imperatives ("MUST use this tool", "Do NOT write manually") so the cloud LLM routes through S.C.R.U.B. instead of generating boilerplate itself. AGENTS.md establishes the division of labor before the first prompt.
The 22 Tools
Code Hygiene
| Tool | LLM? | What it does |
|------|------|-------------|
| lint_file | No | Ruff lint + autofix |
| generate_docstrings | Batched | Google-style, all files/classes/methods/functions |
| annotate_types | Batched | All args + all returns, no exceptions |
| add_comments | Gated | Only fires on complex functions (cyclomatic/cognitive threshold) |
| hygiene_full | Mixed | Full pipeline: lint, docstrings, types, comments |
| hygiene_batch | Mixed | hygiene_full across multiple files in parallel |
| hygiene_incremental | Mixed | Diff-aware, cache-accelerated hygiene (skips unchanged functions) |
Coding Tools
| Tool | LLM? | What it does |
|------|------|-------------|
| analyze_complexity | No | Cyclomatic + cognitive complexity, nesting depth, hotspots |
| suggest_simplifications | Yes | Concrete refactoring: early returns, guard clauses, extract function |
| optimize_imports | Mixed | Ruff removes unused, DSPy infers missing imports |
| generate_tests | Batched | pytest stubs: happy path, edge cases, parametrize |
| run_tests | No | Run pytest with PYTHONPATH=src, returns exit code + output |
| find_dead_code | No | Unreachable code, unused vars, redundant else, commented blocks |
| suggest_refactoring | Yes | Extract function candidates, rename suggestions |
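To make the complexity gate concrete, here is a minimal sketch of a cyclomatic complexity count over the standard-library `ast`, plus a threshold check of the kind that could decide whether a gated tool fires. The set of counted nodes and the default threshold of 10 are assumptions for illustration; the project's analyzer may count differently and also tracks cognitive complexity, which is not shown.

```python
import ast

# Nodes that add one decision point each (a common approximation).
BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Count decision points in a function, starting from 1 for the entry path."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


def is_hotspot(func: ast.AST, threshold: int = 10) -> bool:
    """Hypothetical gate: True when complexity is at or above the threshold."""
    return cyclomatic_complexity(func) >= threshold


source = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2:
            return "odd-ish"
    return "other"
'''
func = ast.parse(source).body[0]
score = cyclomatic_complexity(func)
print(score, is_hotspot(func))  # 5 False
```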
Exploration
| Tool | LLM? | What it does |
|------|------|-------------|
| explore_architecture | No | AST skeleton: signatures + docstrings, bodies replaced with ... |
| read_files | No | Batch-read multiple files in one call |
| find_symbols | No | Extract function/class signatures via AST |
| grep_multi | No | Multi-pattern regex search across the codebase |
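The "AST skeleton" idea behind explore_architecture can be sketched with the standard library alone: keep each signature and docstring, and swap the body for `...`. This is an illustrative approximation (the `skeleton` function is a made-up name, and `ast.unparse` needs Python 3.9+), not the tool's real code.

```python
import ast


def skeleton(source: str) -> str:
    """Return source with every function body reduced to its docstring and `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            body: list[ast.stmt] = []
            if doc is not None:
                body.append(ast.Expr(ast.Constant(doc)))  # keep the docstring
            body.append(ast.Expr(ast.Constant(...)))       # elide the implementation
            node.body = body
    return ast.unparse(ast.fix_missing_locations(tree))


sample = '''
def double(x: int) -> int:
    """Double x."""
    y = x * 2
    return y
'''
outline = skeleton(sample)
print(outline)
```

The result is much shorter than the original file while still telling a planning model what exists and how to call it.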
Security + Supply Chain
| Tool | LLM? | What it does |
|------|------|-------------|
| security_scan | No | Bandit static analysis: hardcoded secrets, injection, weak crypto |
| security_remediate | Yes | Triage-first: rewrite, # nosec with justification, or accept risk |
| generate_sbom | No | CycloneDX 1.5 / SPDX 2.3 from pip, pyproject, lock files |
| scan_vulnerabilities | No | Cross-reference PURLs against OSV.dev (PyPI, GHSA, NVD) |
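For readers unfamiliar with PURLs, the sketch below shows how a PyPI package might be turned into a package URL and wrapped in a request body for the public OSV.dev query endpoint. It only builds the payload and sends nothing; the helper names are hypothetical, and how S.C.R.U.B. actually talks to OSV.dev is not shown in this README.

```python
import json

# Public OSV.dev query endpoint (POST with a JSON body).
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def pypi_purl(name: str, version: str) -> str:
    """Build a PyPI package URL: lowercase name, underscores become dashes."""
    normalized = name.lower().replace("_", "-")
    return f"pkg:pypi/{normalized}@{version}"


def osv_query(purl: str) -> dict:
    """Request body for an OSV query keyed by a versioned PURL."""
    return {"package": {"purl": purl}}


purl = pypi_purl("Jinja2", "2.4.1")
payload = osv_query(purl)
print(json.dumps(payload))  # {"package": {"purl": "pkg:pypi/jinja2@2.4.1"}}
```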
Quick Start
Install
# From GitHub
pip install "scrub-mcp @ git+https://github.com/zombat/scrub-mcp.git"
pip install "scrub-mcp[all] @ git+https://github.com/zombat/scrub-mcp.git"
# From PyPI (when published)
pip install scrub-mcp
pip install "scrub-mcp[all]" # everything
# Extras (quoted so shells like zsh don't expand the brackets)
pip install "scrub-mcp[security]" # adds Bandit
pip install "scrub-mcp[prefilter]" # adds pyright + pydocstyle
CLI
S.C.R.U.B. ships a scrub CLI for CI pipelines, pre-commit hooks, and local use. No LLM required for check and audit.
# Detect violations (deterministic, no LLM)
scrub check src/ --fail-on missing-docstrings,missing-types,complexity:10,security:MEDIUM
# SARIF output for GitHub Code Scanning
scrub check src/ --fail-on missing-docstrings --format sarif --output results.sarif
# Diff-aware (only files changed since main)
scrub check . --since main --fail-on missing-docstrings,missing-types
# Auto-fix with commit (requires local LLM)
scrub fix src/ --steps docstrings types --commit
# Preview what fix would change
scrub diff src/ --format unified
# Security + supply-chain audit
scrub audit . --fail-on-severity HIGH --sbom-format cyclonedx --output-sbom sbom.json
# Cache management
scrub cache stats
scrub cache clear --stale
scrub cache warm src/
scrub check exit codes
| Code | Meaning |
|------|---------|
| 0 | Clean (or --fail-on not specified) |
| 1 | Violations found matching --fail-on criteria |
--fail-on options
| Token | What it checks |
|-------|---------------|
| missing-docstrings | Functions, classes, and modules without docstrings |
| missing-types | Functions without type annotations |
| complexity:N | Functions with cyclomatic complexity >= N |
| security:SEVERITY | Bandit findings at or above severity (low, medium, high) |
| vulns:SEVERITY | OSV.dev vulnerabilities at or above severity (low, medium, high, critical) |
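The token grammar is simple enough to sketch: a comma-separated list where some tokens carry a `:threshold` suffix. The parser and exit-code helper below are hypothetical illustrations of that grammar and of the exit-code table above, not the CLI's real code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailRule:
    name: str                        # e.g. "complexity"
    threshold: Optional[str] = None  # e.g. "10" or "MEDIUM"; None when absent


def parse_fail_on(spec: str) -> list[FailRule]:
    """Split a --fail-on string into rules, separating name from threshold."""
    rules = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        name, _, threshold = token.partition(":")
        rules.append(FailRule(name, threshold or None))
    return rules


def exit_code(rules: list[FailRule], violations: dict[str, int]) -> int:
    """Return 1 only if a requested rule has at least one violation, else 0."""
    return 1 if any(violations.get(r.name, 0) for r in rules) else 0


rules = parse_fail_on("missing-docstrings,complexity:10,security:MEDIUM")
print(rules)
print(exit_code(rules, {"complexity": 2}))  # 1
print(exit_code([], {"complexity": 2}))     # 0 (no --fail-on given)
```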
Output formats
All commands that produce reports support --format text|json|sarif. SARIF 2.1.0 output maps directly to GitHub Code Scanning — each violation becomes a SARIF result with location, message, and rule ID (e.g. SCRUB-DOC-001, SCRUB-SEC-B101, SCRUB-VULN-CVE-2024-1234).
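For orientation, a minimal SARIF 2.1.0 log with one result looks roughly like the structure built below. The field names follow the SARIF 2.1.0 schema; the `to_sarif` helper, the tool name `scrub`, and the sample message are assumptions for this sketch, and the real serializer likely emits more metadata.

```python
import json

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def to_sarif(violations: list[dict]) -> dict:
    """Wrap violations in a minimal SARIF 2.1.0 log containing a single run."""
    rule_ids = sorted({v["rule_id"] for v in violations})
    results = [
        {
            "ruleId": v["rule_id"],
            "level": v.get("level", "warning"),
            "message": {"text": v["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": v["path"]},
                    "region": {"startLine": v["line"]},
                }
            }],
        }
        for v in violations
    ]
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [{
            # Tool name is a placeholder for this example.
            "tool": {"driver": {"name": "scrub", "rules": [{"id": r} for r in rule_ids]}},
            "results": results,
        }],
    }


report = to_sarif([{
    "rule_id": "SCRUB-DOC-001",
    "message": "Function 'load' has no docstring.",
    "path": "src/app.py",
    "line": 12,
}])
print(json.dumps(report, indent=2))
```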
GitHub Actions
Add S.C.R.U.B. as a quality gate in your CI pipeline:
# .github/workflows/scrub.yml
name: Code Quality
on: [push, pull_request]
jobs:
  scrub:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: zombat/scrub-mcp@v1
        with:
          mode: check
          fail-on: "missing-docstrings,missing-types,security:MEDIUM"
          format: sarif
The composite action installs S.C.R.U.B., runs the selected mode, and uploads SARIF to GitHub Code Scanning automatically. Supported modes: check, fix, diff, audit.
Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/zombat/scrub-mcp
    rev: v1
    hooks:
      - id: scrub-check # docstrings + types
      - id: scrub-security # Bandit at MEDIUM severity
The hooks are fast: diff-aware mode (--since HEAD) combined with the artifact cache means only staged changes are analyzed.
LLM Considerations
When using Ollama, how a model fits into your memory determines how it performs:
- Fully in VRAM: Blazing fast (30–50+ tokens/second). Perfect for real-time chat and IDE tab-autocomplete.
- Split between GPU and System RAM: Noticeably slower (2–10 tokens/second), as your CPU has to process the overflow. Great for complex problems where you don't mind waiting a few seconds.
**1. The Sweet Spot: Best Overall (Fits entirely in VRAM)**
If you want a model that is smart, handles complex logic, and replies instantly, you want a 7B–8B parameter model.
- Top Pick: Qwen 2.5 Coder (7B)
- Why: It is one of the strongest small coding models available and punches way above its weight, often beating older 30B+ models on coding benchmarks. At 4.7GB (with Ollama's default quantization), it fits entirely on an 8GB GPU such as an RTX 4060 while leaving over 3GB of VRAM open for a large context window (perfect for pasting in multiple large files).
- Command: ollama run qwen2.5-coder:7b
- Runner Up: Llama 3.1 (8B)
- Why: A fantastic generalist. If you want a model that writes good code but is also excellent at writing documentation, drafting emails, or brainstorming architecture, this is highly reliable.
- Command: ollama run llama3.1
**2. The Autocomplete Champs (For IDE Integration)**
If you are using an IDE extension (like Continue.dev, Cline, or Roo) for background "tab-autocomplete" as you type, you need a model that can generate tokens in milliseconds. The 7B models can sometimes be just a fraction of a second too slow for a seamless typing experience.
- Top Pick: Qwen 2.5 Coder (1.5B or 3B)
- Why: It takes up barely any VRAM (~1GB to 2GB), leaving your GPU free for your OS, browser, or running a secondary chat model simultaneously. It is trained specifically for Fill-In-the-Middle (FIM) tasks, making it lightning-fast for predicting the next few lines of code.
- Command: ollama run qwen2.5-coder:1.5b
**3. The Heavy Lifters (64GB+ System RAM)**
If you have 64GB+ of system RAM, you can comfortably run massive, frontier-level models. Ollama will automatically load as much of the model as possible to the GPU and seamlessly offload the rest to your CPU. It will be slower, but it will be incredibly smart.
- Top Pick: Qwen 2.5 Coder (32B)
- Why: This model rivals proprietary cloud models like GPT-4o and Claude 3.5 Sonnet on coding benchmarks. It's a ~20GB file, so it will lean heavily on your system RAM. Pull this out when you have a difficult architectural problem or an obscure bug, and are willing to wait 10–20 seconds for the code generation to finish.
- Command: ollama run qwen2.5-coder:32b
- Runner Up: DeepSeek R1 (14B)
- Why: This is a "reasoning" model (similar to OpenAI's o1). It generates an internal chain-of-thought to "think" through complex logic problems before outputting the final code. At 14B it takes ~9GB in total, so on an 8GB GPU it spills over VRAM only slightly and your CPU won't bottleneck it too severely.
- Command: ollama run deepseek-r1:14b
Configure
# config.yaml
model:
  provider: ollama
  model: qwen2.5-coder:14b
  base_url: http://localhost:11434
  max_tokens: 4096
  temperature: 0.1
batch_size: 5
deterministic_prefilter: true
Connect to MCP Clients
The --install-mcp flag auto-generates or updates the correct MCP config file for your client:
# Claude Code — writes .mcp.json in the project root
python -m scrub_mcp.mcp.server --install-mcp claude-code
# Cursor — writes .cursor/mcp.json in the project root
python -m scrub_mcp.mcp.server --install-mcp cursor
# GitHub Copilot (VS Code) — writes .vscode/mcp.json
python -m scrub_mcp.mcp.server --install-mcp github-copilot
# Copilot CLI — writes ~/.copilot/mcp-config.json (global)
python -m scrub_mcp.mcp.server --install-mcp copilot-cli
# Gemini Code Assist — writes ~/.gemini/settings.json (global)
python -m scrub_mcp.mcp.server --install-mcp gemini
# Windsurf (Cascade) — writes ~/.codeium/windsurf/mcp_config.json (global)
python -m scrub_mcp.mcp.server --install-mcp windsurf
# Cline / Roo Code — writes .vscode/mcp.json
python -m scrub_mcp.mcp.server --install-mcp cline
# Zed — writes ~/.config/zed/settings.json (global)
python -m scrub_mcp.mcp.server --install-mcp zed
Each command detects the running Python interpreter and merges the S.C.R.U.B. server entry into any existing config without overwriting other servers.
Or configure manually — S.C.R.U.B. speaks MCP over stdio, so any MCP client works:
{
  "mcpServers": {
    "scrub": {
      "command": "python",
      "args": ["-m", "scrub_mcp.mcp.server"]
    }
  }
}
Note: Zed uses a different schema (context_servers with a nested command object). Use --install-mcp zed to generate the correct format automatically.
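The merge behavior described for --install-mcp (add the S.C.R.U.B. entry, leave other servers alone) can be approximated as follows for clients that use the mcpServers schema. This is a sketch under that assumption; `merge_server_entry` is a hypothetical helper, and the demo writes to a temporary directory rather than a real config file.

```python
import json
import sys
import tempfile
from pathlib import Path


def merge_server_entry(config_path: Path, name: str = "scrub") -> dict:
    """Add or update one server entry while preserving every other key."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": sys.executable,  # the interpreter currently running
        "args": ["-m", "scrub_mcp.mcp.server"],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    # Pretend another server is already configured.
    existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    path.write_text(json.dumps(existing))
    merged = merge_server_entry(path)
    on_disk = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))  # ['other', 'scrub']
```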
Drop agent instructions
Tell the cloud LLM to orchestrate via tools instead of writing boilerplate itself:
python -m scrub_mcp.mcp.server --agent-instructions .
Writes AGENTS.md into the current directory with a prime directive, division-of-labor table, mandatory workflow checklists (hygiene before every file, security before every commit), and a bouncer rule that intercepts manual docstring/type/test generation. Commit it alongside .mcp.json.
Optimization
Training examples ship with the package (scrub_mcp/examples/). The optimizer runs out of the box — no --examples-dir required.
Self-teach (local only, free)
python -m scrub_mcp.optimizers.tune
Teacher-student (one-time cloud cost, better results)
python -m scrub_mcp.optimizers.tune --teacher --teacher-model claude-sonnet-4-20250514
Build + tune in one shot
Generate fresh training examples using Claude, then immediately optimize on them:
export ANTHROPIC_API_KEY=sk-ant-...
# Offline student after building
python -m scrub_mcp.optimizers.tune --build-examples ./examples --build-count 10
# Teacher-student after building
python -m scrub_mcp.optimizers.tune --build-examples ./examples --build-count 10 \
--teacher --teacher-model claude-sonnet-4-20250514
Each topic generates three files: {topic}_messy.py (undocumented input), {topic}_clean.py (fully annotated ground truth), and {topic}_test.py (pytest tests). Commit them to src/scrub_mcp/examples/ to bundle with the package.
Per-module strategy
| Module | Strategy | Rationale | |--------|----------|-----------| | docstrings, types, comments | BootstrapFewShot | Style matching, Qwen handles it well | | imports, dead_code | BootstrapFewShotWithRandomSearch | Needs variety in examples | | complexity, tests, refactoring | MIPROv2 | Prompt wording drives output quality |
Calibrated LLM-as-judge
MIPROv2 modules use a two-tier metric: 40% structural checks (JSON valid, required fields, style markers) + 60% teacher-as-judge (Claude evaluates Qwen's output against calibrated rubrics with grounded score anchors). Edit .dspy_cache/judge_calibration.json to tune the judge to your standards.
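The 40/60 weighting can be written out as a small scoring function. In this sketch the structural half checks only JSON validity and a couple of required fields, and the judge score is passed in as a number between 0 and 1; the field names and function names are hypothetical, and the real metric also looks at style markers and calls the teacher model.

```python
import json

# Hypothetical required fields for a module's JSON output.
REQUIRED_FIELDS = ("summary", "suggestions")


def structural_score(raw_output: str) -> float:
    """Fraction of required fields present; 0.0 if the output is not a JSON object."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    present = sum(1 for field in REQUIRED_FIELDS if field in data)
    return present / len(REQUIRED_FIELDS)


def combined_score(raw_output: str, judge_score: float) -> float:
    """Blend 40% structural checks with 60% teacher-as-judge score."""
    judge = min(max(judge_score, 0.0), 1.0)  # clamp to [0, 1]
    return 0.4 * structural_score(raw_output) + 0.6 * judge


good = combined_score('{"summary": "ok", "suggestions": []}', judge_score=0.5)
broken = combined_score("not json", judge_score=0.5)
print(round(good, 2), round(broken, 2))  # 0.7 0.3
```

One consequence of this split is that malformed output can never score above 0.6, no matter how generous the judge is.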
Cache health
# Quick check
python -m scrub_mcp.optimizers.health
# CI gate (non-zero exit if stale)
python -m scrub_mcp.optimizers.health --threshold 0.7
# Just the heavy modules
python -m scrub_mcp.optimizers.health --modules complexity,tests
Model fingerprints detect drift when you upgrade Qwen. The health checker tells you exactly which modules to recompile.
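One plausible way to fingerprint a model, shown here purely as an assumption-labeled sketch, is to hash the settings that affect output and compare that hash with the one stored when each module was compiled. The inputs chosen (`model`, `digest`, `temperature`) and both function names are hypothetical; the README does not specify what the real fingerprint contains.

```python
import hashlib
import json


def fingerprint(model: str, digest: str, temperature: float) -> str:
    """Stable short hash of the settings assumed to influence model output."""
    payload = json.dumps(
        {"model": model, "digest": digest, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def stale_modules(compiled: dict[str, str], current: str) -> list[str]:
    """Names of modules whose stored fingerprint differs from the current one."""
    return sorted(name for name, fp in compiled.items() if fp != current)


old = fingerprint("qwen2.5-coder:7b", "digest-a", 0.1)
new = fingerprint("qwen2.5-coder:7b", "digest-b", 0.1)  # model was upgraded
compiled = {"docstrings": new, "tests": old, "complexity": old}
to_recompile = stale_modules(compiled, new)
print(to_recompile)  # ['complexity', 'tests']
```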
Project Structure
action.yml                   # GitHub Actions composite action
.pre-commit-hooks.yaml       # Pre-commit hook definitions (scrub-check, scrub-security)
config.yaml                  # Default pipeline configuration
src/scrub_mcp/
  cli.py                     # CLI entrypoint: check, fix, diff, audit, cache
  config.py                  # Pydantic config: model, ruff, optimizer, batch, prefilter
  models.py                  # All I/O models (Pydantic)
  pipeline.py                # Orchestrator: deterministic-first, batched, gated
  examples/                  # Bundled training examples (ships with pip install)
    {topic}_messy.py         # Undocumented input
    {topic}_clean.py         # Fully annotated ground truth
    {topic}_test.py          # pytest tests
  mcp/
    server.py                # 22-tool MCP server (stdio) + --agent-instructions
  modules/
    signatures.py            # DSPy signatures: hygiene tasks
    hygiene.py               # DSPy modules: docstrings, types, comments (single + batch)
    coding_signatures.py     # DSPy signatures: coding + security tasks
    coding_tools.py          # DSPy modules: complexity, tests, refactoring, security triage
  tools/
    parser.py                # AST extraction: functions, classes, modules, complexity
    linter.py                # Ruff wrapper
    rewriter.py              # Source rewriter: apply docstrings, types back to source
    sarif.py                 # SARIF 2.1.0 serializer for check/audit output
    cache.py                 # 3-layer composite hash artifact cache
    diff.py                  # Unified diff parser + function narrowing
    fs.py                    # Gitignore-aware file traversal
    complexity.py            # Cyclomatic + cognitive complexity analyzer
    dead_code.py             # Unreachable code, unused vars, commented blocks
    imports.py               # Import optimizer (Ruff + AST)
    security.py              # Bandit wrapper
    sbom.py                  # CycloneDX 1.5 / SPDX 2.3 generator
    vulnscan.py              # OSV.dev vulnerability scanner
    savings.py               # Cloud cost savings estimator
  optimizers/
    tune.py                  # Per-module optimizer: teacher-student, calibrated judge
    health.py                # Cache staleness detection + model fingerprinting
    examples_gen.py          # Training example generator (Claude API)
scripts/
  gen_examples.py            # CLI wrapper around examples_gen (dev convenience)
The ROI
By inserting S.C.R.U.B. into the Claude Code (or any MCP client) workflow, you cut cloud token usage on every generation loop. The local pipeline doesn't get lazy on function 47. It doesn't skip the boring parts to save tokens. It runs the same quality pass on every function, every class, every file.
The CLI layer (scrub check, scrub audit) enforces the same standards in CI without needing an LLM at all. SARIF output feeds directly into GitHub Code Scanning, so violations show up as inline annotations on pull requests. Pre-commit hooks catch issues before they reach the pipeline.
What comes out the other side isn't generic boilerplate. It's a fully documented, typed, linted, security-scanned, SBOM-tracked codebase from the first prompt. Not a replacement for your GRC stack, but a strong addition that generates its own compliance evidence.
License
MIT
Author
Raymond Andrew Rizzo






