ProofFlow
Agent Work Ledger for AI coding.
Vibe coding is fast. Blind trust is not enough.
ProofFlow makes AI-generated work reviewable, traceable, and reversible by recording the full chain from work contract to proof packet: contract first, record the algorithm decision, declare the cost budget, snapshot the code state, bind claims to evidence, evaluate done criteria, then export an auditable packet.
Latest release: v0.1.8 - Agent Work Ledger for AI coding
▶ Watch the 72s demo: From AI agent claims to verifiable Proof Packets
Demo asset (deferred): The end-to-end dogfood Demo_Asset GIF and the VSCode_Channel inline audit / Approve Gate screenshots for the v0.1.x dogfood-and-channel-polish milestone are deferred to the next dogfood cycle, because no capturable VS Code window was available in this milestone. Tracked in PLANS.md#vscode-channel-screenshots-deferred-from-v0-1-x-dogfood.
📦 Example Proof Packets: code review · issue triage · agent work ledger · ledger dogfood
Maintainer workflow: docs/maintainer_evidence_workflow.md
Agent Work Ledger guide: docs/agent_work_ledger.md
Ledger Risk Hints: docs/ledger_risk_hints.md
5-minute MCP quickstart: docs/ledger_quickstart_mcp.md
Ledger PR comment template: docs/examples/pr_comment_agent_work_ledger.md
AgentGuard semantic rules: docs/agentguard_semantic_rules.md

Agent Work Ledger
ProofFlow is not only a PR review helper. It is a local-first ledger for AI coding work. A Ledger Case captures the workflow before, during, and after an agent changes code:
- Work Contract - record the objective, repo path, allowed scope,
forbidden actions, required tests, done criteria, evidence requirements, algorithm requirements, and cost budget.
- Algorithm Decision - record the selected approach, rationale,
alternatives, invariants, and forbidden approaches before implementation.
- Cost Budget - declare token, API, GPU, CPU, runtime, or iteration limits
before expensive work begins.
- Snapshot - capture the git diff, changed files, HEAD SHA, base ref, and
diff hash so reviewers know exactly what code state was examined.
- Evidence - store command output, test output, diffs, notes, screenshots,
or other artifacts as searchable evidence.
- Claim - require every agent claim to bind to evidence before it is
trusted.
- Evaluation - deterministically check required tests, algorithm decision,
cost budget, scope boundaries, missing evidence, unaccepted risks, and non-blocking Risk Hints for suspicious routes.
- Proof Packet - export the contract, algorithm decision, cost budget,
timeline, snapshots, claims, evidence, evaluation, decisions, and remaining risks into markdown.
Main chain: Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet. This keeps the core product invariant sharp: no Case, no workflow; no Evidence, no trusted Claim; no done criteria evaluation, no quiet success.
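The "no Evidence, no trusted Claim" rule can be illustrated with a minimal Python sketch. The class and field names here (`Evidence`, `Claim`, `evidence_ids`, `trusted_claims`) are hypothetical and are not taken from ProofFlow's actual schema; the sketch only shows the shape of the invariant.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    evidence_id: str
    kind: str      # e.g. "test_output", "diff", "note" (illustrative values)
    content: str


@dataclass
class Claim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)


def trusted_claims(claims: list[Claim], evidence: list[Evidence]) -> list[Claim]:
    """Return only the claims bound to at least one recorded piece of evidence."""
    known = {e.evidence_id for e in evidence}
    return [c for c in claims if any(eid in known for eid in c.evidence_ids)]


evidence = [Evidence("ev-1", "test_output", "12 passed")]
claims = [
    Claim("All required tests pass", ["ev-1"]),
    Claim("No performance regression"),  # unbound, so it must not be trusted
]
trusted = trusted_claims(claims, evidence)
```

In this sketch an unbound claim is not rejected, it is simply never promoted to trusted, which mirrors the idea that missing evidence surfaces during evaluation rather than disappearing quietly.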
Risk Hints extend the evidence flow without turning ProofFlow into an automatic algorithm judge. They tell the maintainer when the recorded route may be wrong or too expensive, such as regeneration where mapping was required, budget overrun metadata, or tests that prove output but not method.
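As a rough illustration of a non-blocking hint, the sketch below compares a declared budget against recorded usage and emits a hint for each overrun. `RiskHint`, `budget_hints`, the `budget_overrun` code, and the budget keys are all assumptions made for this example; see docs/ledger_risk_hints.md for the real hint model.

```python
from dataclasses import dataclass


@dataclass
class RiskHint:
    code: str
    message: str
    blocking: bool = False  # hints inform the maintainer; they do not gate the ledger


def budget_hints(declared: dict[str, float], used: dict[str, float]) -> list[RiskHint]:
    """Emit one hint per budget dimension whose recorded usage exceeds the declared limit."""
    hints = []
    for name, limit in declared.items():
        actual = used.get(name, 0)
        if actual > limit:
            hints.append(
                RiskHint("budget_overrun", f"{name}: used {actual}, declared limit {limit}")
            )
    return hints


hints = budget_hints(
    declared={"iterations": 5, "runtime_seconds": 600},
    used={"iterations": 8, "runtime_seconds": 420},
)
```

Here only the `iterations` dimension produces a hint, and the hint stays non-blocking so the decision remains with the maintainer.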
See docs/agent_work_ledger.md for the full architecture and evaluation model, or docs/ledger_quickstart_mcp.md to run the full MCP flow.
ProofFlow Reviewed ProofFlow
ProofFlow v0.1.6 was dogfooded on a real repository PR. The GitHub Actions workflow ran AgentGuard, posted a stable PR summary comment, uploaded summary.json, and exported a downloadable Proof Packet.

- Real PR: #94 Dogfood v0.1.6 CI review story
- Review run: ProofFlow PR Review #25953071865
- Patch release from dogfood feedback: v0.1.6.1
- Result: one stable comment updated across pushes, one artifact containing the
Proof Packet and summary.json, no merge blocking.
     
Problem
AI coding agents (Claude Code, Codex, Copilot Workspace) can modify files, run commands, and make decisions autonomously. But there's no standard way to:
- Audit what an agent did and why
- Gate high-risk actions before they execute
- Prove that a code review actually checked what it claims
- Undo agent-initiated changes with confidence
ProofFlow solves this by sitting between the agent and the filesystem, creating an evidence graph that links every action to its justification.
Quickstart
Docker (recommended)
Run from the parent directory of the freshly cloned repo. The Push-Location / Pop-Location pair keeps the working directory at the repository root for the docker compose up command and restores it after the block, so this snippet is copy-paste safe in a single PowerShell session.
git clone https://github.com/Hyperion-GPU/ProofFlow-v0.1.git
Push-Location ProofFlow-v0.1
docker compose up
Pop-Location
Backend: http://localhost:8787 | Frontend: http://localhost:5173
Docker publishes both ports on 127.0.0.1 by default to preserve ProofFlow's localhost trust boundary. For stronger local protection, set an API key before starting:
$env:PROOFFLOW_API_KEY = "change-me"; docker compose up   # PowerShell
PROOFFLOW_API_KEY=change-me docker compose up             # bash / zsh
If you enable backend auth for the Docker frontend, use the same PROOFFLOW_API_KEY value at build time so Vite can embed VITE_PROOFFLOW_API_KEY in the static frontend bundle. AgentGuard test_command execution is disabled by default; set PROOFFLOW_ENABLE_TEST_COMMANDS=true only when you intentionally want the backend to run local test commands during review.
Manual
Start each component from the repository root in a single PowerShell session. Push-Location / Pop-Location keeps the working directory predictable across the backend and frontend blocks; the backend port is fixed to 8787 to match the make dev-backend baseline. npm run dev is a long-running process - run the frontend block in a second PowerShell session if you want to keep the backend uvicorn process visible in the first.
# Backend
Push-Location backend
pip install -r requirements.txt
python -m uvicorn proofflow.main:app --port 8787
Pop-Location
# Frontend (long-running; recommended in a second PowerShell session)
Push-Location frontend
npm ci
npm run dev
Pop-Location
MCP Integration (Claude Code / Codex)
pip install proofflow-mcp
Add to your project's .mcp.json:
{
  "mcpServers": {
    "proofflow": {
      "command": "proofflow-mcp",
      "env": { "PROOFFLOW_BASE_URL": "http://127.0.0.1:8787" }
    }
  }
}
Now your AI agent can keep an Agent Work Ledger, scan files, review code, triage issues, suggest actions, and export audit reports - all with enforced safety gates.
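If a project already has a .mcp.json with other servers, the proofflow entry can be merged in rather than pasted by hand. The helper below is a hypothetical convenience script, not part of proofflow-mcp; it only uses the `command` and `PROOFFLOW_BASE_URL` values shown in the config, and the final lines exercise it against a throwaway directory.

```python
import json
import tempfile
from pathlib import Path


def write_mcp_config(project_dir: Path, base_url: str = "http://127.0.0.1:8787") -> Path:
    """Merge the proofflow server entry into <project_dir>/.mcp.json, keeping other servers."""
    config_path = project_dir / ".mcp.json"
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["proofflow"] = {
        "command": "proofflow-mcp",
        "env": {"PROOFFLOW_BASE_URL": base_url},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path


# Demo against a temporary directory so no real project file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = write_mcp_config(Path(tmp))
    written = json.loads(path.read_text(encoding="utf-8"))
```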
Codex Maintainer Plugin
ProofFlow also includes a repo-local Codex plugin at plugins/proofflow-maintainer. It provides starter prompts and a maintainer-focused skill for:
- reviewing the current diff with ProofFlow,
- creating a Proof Packet for a PR,
- triaging issue text into a ProofFlow Case,
- keeping an Agent Work Ledger for complex code tasks.
The plugin uses the same local proofflow-mcp server and keeps the backend trust boundary at http://127.0.0.1:8787. See the public-safe Agent Work Ledger example for the expected handoff shape.
Architecture
AI Agent (Claude Code / Codex / Custom)
|
| MCP Protocol (stdio)
v
ProofFlow MCP Server (23 tools)
|
| HTTP REST API
v
ProofFlow Backend (FastAPI + SQLite)
|
|--- Agent Work Ledger: Contract > Algorithm > Budget > Snapshot > Evidence > Claim > Evaluation > Packet
|--- Evidence Graph: Cases > Artifacts > Claims > Evidence
|--- Action Pipeline: Preview > Approve > Execute > Undo
|--- Policy Gates: Risk classification > Owner decision
|--- Proof Packets: Exportable markdown audit reports
v
Local Filesystem (scanned files, git repos)
Core Capabilities
Agent Work Ledger
Records complex AI coding work as a first-class Case. The main flow is Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet, so maintainers can see what the agent promised, what approach it chose, what cost limits it accepted, what changed, what evidence backs its claims, whether the done criteria were satisfied, and which Risk Hints deserve human review.
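The Snapshot step in that flow can be pictured as a small immutable record derived from the diff text. This is a sketch under assumptions: the `Snapshot` class, the `make_snapshot` helper, and the choice of SHA-256 for the diff hash are illustrative and may differ from what ProofFlow actually stores.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    head_sha: str
    base_ref: str
    changed_files: tuple[str, ...]
    diff_hash: str


def make_snapshot(diff_text: str, head_sha: str, base_ref: str) -> Snapshot:
    """Derive changed files and a content hash from unified git diff text."""
    # "diff --git a/<path> b/<path>" headers name each changed file.
    files = re.findall(r"^diff --git a/(\S+) b/", diff_text, flags=re.MULTILINE)
    digest = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
    return Snapshot(head_sha, base_ref, tuple(sorted(set(files))), digest)


diff = (
    "diff --git a/app.py b/app.py\n"
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
)
snapshot = make_snapshot(diff, head_sha="<HEAD SHA>", base_ref="main")
```

Because the hash covers the full diff text, any later change to the working tree yields a different `diff_hash`, which is what lets a reviewer confirm which code state was examined.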
Evidence-Backed Code Review (AgentGuard)
Analyzes git diffs, generates risk-scored claims, and links each claim to specific evidence (changed lines, test results). No claim exists without supporting evidence.
File Audit & Organization (LocalProof)
Scans directories, indexes files with SHA-256 hashes, extracts text for full-text search, and suggests organization actions — all tracked in an auditable Case.
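The hashing part of that scan can be sketched with the standard library alone. The function names `sha256_of` and `index_directory` are hypothetical and are not LocalProof's API; the sketch only shows how a directory can be indexed by content hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_directory(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Calling `index_directory(Path("some/folder"))` returns a plain dictionary, so two scans of the same tree can be compared directly to detect modified files.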
Issue Triage
Captures issue text as a first-class Case with source Artifact, deterministic triage Claims, component inference, label suggestions, and Proof Packet export.
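"Deterministic" here means the same issue text always yields the same claims. A keyword-based sketch makes the idea concrete; the keyword tables, component names, and labels below are invented for illustration and do not reflect ProofFlow's actual triage rules.

```python
import re

# Hypothetical keyword tables; a real rule set would live in configuration.
COMPONENT_KEYWORDS = {
    "backend": {"api", "sqlite", "fastapi", "endpoint"},
    "frontend": {"react", "vite", "button"},
    "mcp-server": {"mcp", "stdio"},
}
BUG_WORDS = {"crash", "error", "traceback", "fails"}


def triage(issue_text: str) -> dict[str, list[str]]:
    """Infer components and labels from whole-word keyword matches."""
    words = set(re.findall(r"[a-z0-9]+", issue_text.lower()))
    components = sorted(
        name for name, keywords in COMPONENT_KEYWORDS.items() if words & keywords
    )
    labels = ["bug"] if words & BUG_WORDS else ["needs-triage"]
    return {"components": components, "labels": labels}


result = triage("Backend API returns 500 error from the FastAPI endpoint")
```

Matching on whole words rather than substrings avoids accidental hits, and the absence of any randomness or model call is what keeps the output reproducible.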
Policy Gate Enforcement
High-risk filesystem actions (moves to system paths, bulk operations) are automatically paused at pending_decision status. Requires explicit owner approval before execution.
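A gate of this kind can be sketched as a pure function from an action's destinations to a status. Only the `pending_decision` status comes from the description above; the system path prefixes, the bulk threshold, and the `previewed` status are assumptions made for this example.

```python
from pathlib import PurePosixPath

SYSTEM_PREFIXES = ("/etc", "/usr", "/bin", "/var")  # assumed examples of system paths
BULK_THRESHOLD = 50                                  # assumed cutoff for a bulk operation


def gate_status(destinations: list[str]) -> str:
    """Return 'pending_decision' for high-risk actions, 'previewed' otherwise."""
    is_bulk = len(destinations) >= BULK_THRESHOLD
    touches_system = any(
        PurePosixPath(dest).is_relative_to(prefix)
        for dest in destinations
        for prefix in SYSTEM_PREFIXES
    )
    return "pending_decision" if (is_bulk or touches_system) else "previewed"


status = gate_status(["/etc/hosts"])
```

An action that lands in `pending_decision` would then wait for an explicit owner decision instead of proceeding to execution.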
Safety Invariants
- No Contract, no Ledger - AI coding work starts with explicit scope and done criteria
- No Algorithm Decision, no trusted strategy - important approaches must be recorded before implementation
- No Cost Budget, no expensive workflow - costly operations need declared limits first
- No Final Snapshot, no Finish - finished ledgers must prove the reviewed repo state
- No Preview, no Action — destructive operations require two-phase confirmation
- No Evidence, no Claim — every assertion links to verifiable data
- No Ready Evaluation, no Quiet Success - failed ledgers finish as finished_with_risks
- No Undo, no Destructive Action - executed actions carry rollback metadata
- No Case, no Workflow — all work is tracked in auditable containers
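Two of these invariants, "No Preview, no Action" and "No Undo, no Destructive Action", amount to a small state machine over the Preview > Approve > Execute > Undo pipeline. The sketch below is illustrative only: the status names, the `Action` class, and the `undo_metadata` field are hypothetical rather than ProofFlow's data model.

```python
from dataclasses import dataclass
from typing import Optional

# Each status may only advance to the statuses listed for it.
ALLOWED_TRANSITIONS = {
    "previewed": {"approved"},
    "approved": {"executed"},
    "executed": {"undone"},
    "undone": set(),
}


@dataclass
class Action:
    description: str
    undo_metadata: Optional[dict] = None
    status: str = "previewed"  # every action starts life as a preview

    def advance(self, new_status: str) -> None:
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        if new_status == "executed" and not self.undo_metadata:
            raise ValueError("refusing to execute without rollback metadata")
        self.status = new_status


action = Action("move report.pdf to archive/", undo_metadata={"original_path": "report.pdf"})
for step in ("approved", "executed", "undone"):
    action.advance(step)
```

Skipping a phase (for example, going from `previewed` straight to `executed`) raises an error, as does executing an action that carries no rollback metadata.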
MCP Tool Suite (23 tools)
health · scan · suggest · review · triage_issue · start_work_contract · record_algorithm_decision · record_cost_budget · capture_snapshot · record_evidence · record_claim · evaluate_contract · finish_work_ledger · status · approve_execute · export_packet · search · list_cases · list_actions · undo · decide
explain_risk_hint records an evidence-backed Decision for a Ledger Risk Hint without suppressing the hint.
Technical Stack
| Layer | Technology | Tests |
|-------|-----------|-------|
| Backend | Python 3.12, FastAPI, SQLite | 311 |
| Frontend | React 19, TypeScript, Vite | 25 |
| MCP Server | Python, MCP SDK, httpx | 44 |
| CI | GitHub Actions (PR review + release gates) | Audit artifact + PR comment |
Security Features
- Optional API key authentication (PROOFFLOW_API_KEY)
- Rate limiting (PROOFFLOW_RATE_LIMIT)
- AgentGuard test command execution is opt-in (PROOFFLOW_ENABLE_TEST_COMMANDS)
- MCP concurrency guards (PROOFFLOW_MCP_MAX_CONCURRENT)
- Filesystem action scope restrictions (allowed_roots)
- CORS locked to localhost origins
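A scope restriction like allowed_roots typically comes down to resolving the target path and checking that it stays under one of the permitted roots. The function below is a sketch of that check, assuming Python 3.9+ for `Path.is_relative_to`; the name `is_within_allowed_roots` is hypothetical and the backend's real check may differ.

```python
from pathlib import Path


def is_within_allowed_roots(target: str, allowed_roots: list[str]) -> bool:
    """True if target, after resolving '..' segments and symlinks, lies under an allowed root."""
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots)
```

Resolving before comparing matters: a path such as `<root>/../secrets.txt` starts with an allowed prefix as a string, but resolves to a location outside the root and is rejected.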
Project Status
v0.1.0 — Stable release. All core workflows functional, tested, and documented.
| Milestone | Status |
|-----------|--------|
| Core evidence graph (Case/Artifact/Claim/Evidence) | Done |
| LocalProof file audit workflow | Done |
| AgentGuard code review workflow | Done |
| Issue triage workflow | Done |
| Policy gate enforcement | Done |
| MCP server for Claude Code/Codex | Done |
| Backup/restore with safety preview | Done |
| Docker deployment | Done |
| PyPI package (proofflow-mcp) | Done |
Roadmap
- [ ] Multi-agent coordination (shared Cases across agents)
- [ ] Vector RAG for semantic evidence retrieval
- [x] GitHub Actions integration (CI-triggered reviews)
- [x] VS Code extension (Marketplace)
- [ ] Cloud sync option for team workflows
- [ ] Webhook notifications for policy gate decisions
Development
Run from the repository root in a single PowerShell session. Each Push-Location / Pop-Location block restores the working directory back to the repository root, so the python scripts/... smoke tests and scripts/demo_workflow.py below can be pasted in the same session.
# Run all tests
Push-Location backend
python -m pytest # 311 tests
Pop-Location
Push-Location frontend
npm run test # 29 tests
Pop-Location
Push-Location mcp-server
pip install -e ".[dev]"
python -m pytest # 44 tests
Pop-Location
# End-to-end smoke test (cwd: repository root)
python scripts/mcp_smoke.py --cleanup
python scripts/ledger_mcp_smoke.py --cleanup
python scripts/ledger_risk_hints_smoke.py --cleanup
python scripts/ledger_risk_hints_dogfood_matrix.py --cleanup
# Demo workflow (cwd: repository root)
python scripts/demo_workflow.py
Local backend data defaults to backend/data/. For dogfood runs that should not touch repository-local state, set PROOFFLOW_DB_PATH and PROOFFLOW_DATA_DIR to a temporary directory before starting the backend.
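One way to prepare such an isolated run from Python is to build an environment mapping that points both variables at a fresh temporary directory. This is a sketch: the helper name `isolated_backend_env` and the `proofflow.db` file name are assumptions, while the two variable names come from the paragraph above.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def isolated_backend_env(base_env: Optional[dict] = None) -> dict:
    """Copy the environment and point ProofFlow's data paths at a new temp directory."""
    env = dict(os.environ if base_env is None else base_env)
    data_dir = Path(tempfile.mkdtemp(prefix="proofflow-dogfood-"))
    env["PROOFFLOW_DATA_DIR"] = str(data_dir)
    env["PROOFFLOW_DB_PATH"] = str(data_dir / "proofflow.db")  # file name is an assumption
    return env


env = isolated_backend_env()
# The mapping can then be passed to the backend process, for example:
# subprocess.Popen(
#     ["python", "-m", "uvicorn", "proofflow.main:app", "--port", "8787"],
#     cwd="backend", env=env,
# )
```

The temporary directory is not deleted automatically, which keeps the dogfood data available for inspection after the run; remove it manually when it is no longer needed.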
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
MIT
---
Built by Hyperion-GPU — making AI agent workflows auditable, safe, and provable.






