ProofFlow

Agent Work Ledger for AI coding.

Vibe coding is fast. Blind trust is not enough.

ProofFlow makes AI-generated work reviewable, traceable, and reversible by recording the full chain from work contract to proof packet: write the contract first, record the algorithm decision, declare the cost budget, snapshot the code state, bind claims to evidence, evaluate the done criteria, then export an auditable packet.

Latest release: v0.1.8 - Agent Work Ledger for AI coding

▶ Watch the 72s demo: From AI agent claims to verifiable Proof Packets

Demo asset (deferred): The end-to-end dogfood Demo_Asset GIF and the VSCode_Channel inline audit / Approve Gate screenshots for the v0.1.x dogfood-and-channel-polish milestone are deferred to the next dogfood cycle (no capturable VS Code window in this milestone). Tracked in PLANS.md#vscode-channel-screenshots-deferred-from-v0-1-x-dogfood.

📦 Example Proof Packets: code review · issue triage · agent work ledger · ledger dogfood

Maintainer workflow: docs/maintainer_evidence_workflow.md

Agent Work Ledger guide: docs/agent_work_ledger.md

Ledger Risk Hints: docs/ledger_risk_hints.md

5-minute MCP quickstart: docs/ledger_quickstart_mcp.md

Ledger PR comment template: docs/examples/pr_comment_agent_work_ledger.md

AgentGuard semantic rules: docs/agentguard_semantic_rules.md

![ProofFlow demo thumbnail](https://github.com/Hyperion-GPU/ProofFlow-v0.1/releases/tag/v0.1.3)

Agent Work Ledger

ProofFlow is not only a PR review helper. It is a local-first ledger for AI coding work. A Ledger Case captures the workflow before, during, and after an agent changes code:

Work Contract - record the objective, repo path, allowed scope, forbidden actions, required tests, done criteria, evidence requirements, algorithm requirements, and cost budget.
Algorithm Decision - record the selected approach, rationale, alternatives, invariants, and forbidden approaches before implementation.
Cost Budget - declare token, API, GPU, CPU, runtime, or iteration limits before expensive work begins.
Snapshot - capture the git diff, changed files, HEAD SHA, base ref, and diff hash so reviewers know exactly what code state was examined.
Evidence - store command output, test output, diffs, notes, screenshots, or other artifacts as searchable evidence.
Claim - require every agent claim to bind to evidence before it is trusted.
Evaluation - deterministically check required tests, algorithm decision, cost budget, scope boundaries, missing evidence, unaccepted risks, and non-blocking Risk Hints for suspicious routes.
Proof Packet - export the contract, algorithm decision, cost budget, timeline, snapshots, claims, evidence, evaluation, decisions, and remaining risks into markdown.

Main chain: Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet. This keeps the core product invariant sharp: no Case, no workflow; no Evidence, no trusted Claim; no done criteria evaluation, no quiet success.
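The invariant above can be illustrated with a small sketch. This is not ProofFlow's implementation: the `Claim` class, `is_trusted`, `finish_status`, and the `"finished"` status name are hypothetical, chosen only to show how "no Evidence, no trusted Claim" and "no quiet success" might be checked. Only `finished_with_risks` is a status named in this README.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Hypothetical claim record: a statement plus the evidence IDs it cites."""
    text: str
    evidence_ids: list[str] = field(default_factory=list)


def is_trusted(claim: Claim, known_evidence: set[str]) -> bool:
    # A claim is trusted only if it cites at least one evidence item
    # and every cited item is actually stored in the ledger.
    return bool(claim.evidence_ids) and all(
        evidence_id in known_evidence for evidence_id in claim.evidence_ids
    )


def finish_status(claims: list[Claim], known_evidence: set[str],
                  done_criteria_met: bool) -> str:
    # "finished" is an assumed name for the clean outcome; anything less
    # than fully evidenced claims plus satisfied done criteria is surfaced
    # as finished_with_risks instead of passing silently.
    all_trusted = all(is_trusted(claim, known_evidence) for claim in claims)
    if all_trusted and done_criteria_met:
        return "finished"
    return "finished_with_risks"
```

An unbound claim such as `Claim("no regressions")` is never trusted here, so a ledger containing it cannot finish cleanly even when the done criteria pass.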

Risk Hints extend the evidence flow without turning ProofFlow into an automatic algorithm judge. They tell the maintainer when the recorded route may be wrong or too expensive, such as regeneration where mapping was required, budget overrun metadata, or tests that prove output but not method.
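As a rough illustration of a non-blocking hint, the sketch below flags budget overruns without failing anything. The `RiskHint` class, the `budget_overrun` code, and the `budget_hints` function are assumptions made for this example; the real hint catalogue is described in docs/ledger_risk_hints.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskHint:
    """Hypothetical hint record; hints inform the maintainer and never block."""
    code: str
    message: str
    blocking: bool = False


def budget_hints(declared: dict[str, float], used: dict[str, float]) -> list[RiskHint]:
    # Compare recorded usage against each declared limit and emit one hint
    # per overrun. Resources with no recorded usage count as zero.
    hints = []
    for resource, limit in declared.items():
        spent = used.get(resource, 0.0)
        if spent > limit:
            hints.append(
                RiskHint("budget_overrun", f"{resource}: used {spent} of {limit}")
            )
    return hints
```

For example, `budget_hints({"tokens": 1000}, {"tokens": 1500})` yields a single hint with `blocking` left at `False`, which leaves the decision to the maintainer.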

See docs/agent_work_ledger.md for the full architecture and evaluation model, or docs/ledger_quickstart_mcp.md to run the full MCP flow.

ProofFlow Reviewed ProofFlow

ProofFlow v0.1.6 was dogfooded on a real repository PR. The GitHub Actions workflow ran AgentGuard, posted a stable PR summary comment, uploaded summary.json, and exported a downloadable Proof Packet.

![ProofFlow AgentGuard review comment for PR #94](https://github.com/Hyperion-GPU/ProofFlow-v0.1/pull/94#issuecomment-4465608299)

Real PR: #94 Dogfood v0.1.6 CI review story
Review run: ProofFlow PR Review #25953071865
Patch release from dogfood feedback: v0.1.6.1
Result: one stable comment updated across pushes, one artifact containing the Proof Packet and summary.json, no merge blocking.

![Backend](https://github.com/Hyperion-GPU/ProofFlow-v0.1/actions/workflows/backend.yml) ![Frontend](https://github.com/Hyperion-GPU/ProofFlow-v0.1/actions/workflows/frontend.yml) ![MCP Server](https://github.com/Hyperion-GPU/ProofFlow-v0.1/actions/workflows/mcp-server.yml) ![VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=hyperion-gpu.proofflow) ![PyPI](https://pypi.org/project/proofflow-mcp/) ![License: MIT](https://opensource.org/licenses/MIT)

Problem

AI coding agents (Claude Code, Codex, Copilot Workspace) can modify files, run commands, and make decisions autonomously. But there's no standard way to:

Audit what an agent did and why
Gate high-risk actions before they execute
Prove that a code review actually checked what it claims
Undo agent-initiated changes with confidence

ProofFlow solves this by sitting between the agent and the filesystem, creating an evidence graph that links every action to its justification.

Quickstart

Docker (recommended)

Run from the parent directory of the freshly cloned repo. The Push-Location / Pop-Location pair keeps the working directory at the repository root for the docker compose up command and restores it after the block, so this snippet is copy-paste safe in a single PowerShell session.

git clone https://github.com/Hyperion-GPU/ProofFlow-v0.1.git
Push-Location ProofFlow-v0.1
docker compose up
Pop-Location

Backend: http://localhost:8787 | Frontend: http://localhost:5173

Docker publishes both ports on 127.0.0.1 by default to preserve ProofFlow's localhost trust boundary. For stronger local protection, set an API key before starting:

# PowerShell
$env:PROOFFLOW_API_KEY = "change-me"
docker compose up

# bash / zsh
PROOFFLOW_API_KEY=change-me docker compose up

If you enable backend auth for the Docker frontend, use the same PROOFFLOW_API_KEY value at build time so Vite can embed VITE_PROOFFLOW_API_KEY in the static frontend bundle. AgentGuard test_command execution is disabled by default; set PROOFFLOW_ENABLE_TEST_COMMANDS=true only when you intentionally want the backend to run local test commands during review.

Manual

Start each component from the repository root in a single PowerShell session. Push-Location / Pop-Location keeps the working directory predictable across the backend and frontend blocks; the backend port is fixed to 8787 to match the make dev-backend baseline. npm run dev is a long-running process - run the frontend block in a second PowerShell session if you want to keep the backend uvicorn process visible in the first.

# Backend
Push-Location backend
pip install -r requirements.txt
python -m uvicorn proofflow.main:app --port 8787
Pop-Location

# Frontend (long-running; recommended in a second PowerShell session)
Push-Location frontend
npm ci
npm run dev
Pop-Location

MCP Integration (Claude Code / Codex)

pip install proofflow-mcp

Add to your project's .mcp.json:

{
  "mcpServers": {
    "proofflow": {
      "command": "proofflow-mcp",
      "env": { "PROOFFLOW_BASE_URL": "http://127.0.0.1:8787" }
    }
  }
}
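If the project already has a .mcp.json with other servers, the entry can be merged in rather than pasted by hand. The helper below is a convenience sketch, not part of ProofFlow; the function name `add_proofflow_server` is hypothetical, and the entry it writes is the one shown above.

```python
import json
from pathlib import Path

# The server entry from the snippet above.
PROOFFLOW_ENTRY = {
    "command": "proofflow-mcp",
    "env": {"PROOFFLOW_BASE_URL": "http://127.0.0.1:8787"},
}


def add_proofflow_server(config_path: Path) -> dict:
    """Add or replace the 'proofflow' entry, keeping any other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})["proofflow"] = PROOFFLOW_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_proofflow_server(Path(".mcp.json"))` from the project root creates the file if it is missing and leaves unrelated `mcpServers` entries untouched.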

Now your AI agent can keep an Agent Work Ledger, scan files, review code, triage issues, suggest actions, and export audit reports - all with enforced safety gates.

Codex Maintainer Plugin

ProofFlow also includes a repo-local Codex plugin at plugins/proofflow-maintainer. It provides starter prompts and a maintainer-focused skill for:

reviewing the current diff with ProofFlow,
creating a Proof Packet for a PR,
triaging issue text into a ProofFlow Case,
keeping an Agent Work Ledger for complex code tasks.

The plugin uses the same local proofflow-mcp server and keeps the backend trust boundary at http://127.0.0.1:8787. See the public-safe Agent Work Ledger example for the expected handoff shape.

Architecture

AI Agent (Claude Code / Codex / Custom)
    |
    | MCP Protocol (stdio)
    v
ProofFlow MCP Server (23 tools)
    |
    | HTTP REST API
    v
ProofFlow Backend (FastAPI + SQLite)
    |
    |--- Agent Work Ledger: Contract > Algorithm > Budget > Snapshot > Evidence > Claim > Evaluation > Packet
    |--- Evidence Graph: Cases > Artifacts > Claims > Evidence
    |--- Action Pipeline: Preview > Approve > Execute > Undo
    |--- Policy Gates: Risk classification > Owner decision
    |--- Proof Packets: Exportable markdown audit reports
    v
Local Filesystem (scanned files, git repos)

Core Capabilities

Agent Work Ledger

Records complex AI coding work as a first-class Case. The main flow is Work Contract -> Algorithm Decision -> Cost Budget -> Snapshot -> Evidence -> Claim -> Evaluation -> Proof Packet, so maintainers can see what the agent promised, what approach it chose, what cost limits it accepted, what changed, what evidence backs its claims, whether the done criteria were satisfied, and which Risk Hints deserve human review.
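The "what changed" part of this flow rests on the Snapshot step. A minimal sketch of the idea follows, assuming the diff hash is a SHA-256 digest of the unified diff text; the `Snapshot` class, `make_snapshot`, and the hashing choice are illustrative assumptions, not the backend's actual schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record mirroring the fields listed in this README."""
    head_sha: str
    base_ref: str
    changed_files: tuple[str, ...]
    diff_hash: str


def make_snapshot(diff_text: str, head_sha: str, base_ref: str) -> Snapshot:
    # Collect target paths from "+++ b/<path>" headers of a unified diff.
    # Deleted files ("+++ /dev/null") are not captured by this simple parse.
    changed = sorted({
        line[len("+++ b/"):]
        for line in diff_text.splitlines()
        if line.startswith("+++ b/")
    })
    # Hash the exact diff bytes so any later change to the diff is detectable.
    digest = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
    return Snapshot(head_sha, base_ref, tuple(changed), digest)
```

Two snapshots of the same diff produce the same `diff_hash`, so a reviewer can confirm that the evaluated code state is the one the claims refer to.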

Evidence-Backed Code Review (AgentGuard)

Analyzes git diffs, generates risk-scored claims, and links each claim to specific evidence (changed lines, test results). No claim exists without supporting evidence.

File Audit & Organization (LocalProof)

Scans directories, indexes files with SHA-256 hashes, extracts text for full-text search, and suggests organization actions — all tracked in an auditable Case.
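The hashing step can be sketched with the standard library alone. The function names `sha256_of_file` and `index_directory` are hypothetical, and LocalProof's own index format may differ; the sketch only shows streaming SHA-256 over every file under a root.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    # Read in chunks so large files are hashed without loading them fully.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_directory(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Comparing two such indexes taken before and after an organization action shows which files moved or changed content.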

Issue Triage

Captures issue text as a first-class Case with source Artifact, deterministic triage Claims, component inference, label suggestions, and Proof Packet export.

Policy Gate Enforcement

High-risk filesystem actions (moves to system paths, bulk operations) are automatically paused at pending_decision status. Requires explicit owner approval before execution.
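A gate of this kind might look like the sketch below. Only the `pending_decision` status comes from this README; the system path list, the bulk threshold, the other status names, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

SYSTEM_PREFIXES = ("/etc", "/usr", "/bin")  # assumed examples of system paths
BULK_THRESHOLD = 50                          # assumed cutoff for "bulk"


@dataclass
class MoveAction:
    sources: list[str]
    destination: str
    status: str = "previewed"  # assumed initial status


def is_high_risk(action: MoveAction) -> bool:
    dest = PurePosixPath(action.destination)
    hits_system_path = any(
        dest == PurePosixPath(prefix) or PurePosixPath(prefix) in dest.parents
        for prefix in SYSTEM_PREFIXES
    )
    return hits_system_path or len(action.sources) >= BULK_THRESHOLD


def gate(action: MoveAction) -> MoveAction:
    # High-risk actions pause for the owner; everything else proceeds.
    action.status = "pending_decision" if is_high_risk(action) else "approved"
    return action


def decide(action: MoveAction, owner_approves: bool) -> MoveAction:
    # Only a paused action can receive an owner decision.
    if action.status != "pending_decision":
        raise ValueError("no pending decision for this action")
    action.status = "approved" if owner_approves else "rejected"
    return action
```

The point of the design is that classification is automatic but release is not: nothing moves from `pending_decision` without an explicit `decide` call.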

Safety Invariants

No Contract, no Ledger - AI coding work starts with explicit scope and done criteria
No Algorithm Decision, no trusted strategy - important approaches must be recorded before implementation
No Cost Budget, no expensive workflow - costly operations need declared limits first
No Final Snapshot, no Finish - finished ledgers must prove the reviewed repo state
No Preview, no Action - destructive operations require two-phase confirmation
No Evidence, no Claim - every assertion links to verifiable data
No Ready Evaluation, no Quiet Success - failed ledgers finish as finished_with_risks
No Undo, no Destructive Action - executed actions carry rollback metadata
No Case, no Workflow - all work is tracked in auditable containers
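Two of these invariants, "No Preview, no Action" and "No Undo, no Destructive Action", can be sketched for a single file move. The class and function names are hypothetical and the real Action Pipeline tracks far more state; the sketch shows only that execution requires a preview object and returns the data needed to reverse it.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MovePreview:
    source: Path
    destination: Path


@dataclass(frozen=True)
class ExecutedMove:
    # Rollback metadata: where the file came from and where it is now.
    source: Path
    destination: Path


def preview_move(source: Path, destination: Path) -> MovePreview:
    # Phase one: validate without touching the filesystem.
    if not source.is_file():
        raise FileNotFoundError(source)
    if destination.exists():
        raise FileExistsError(destination)
    return MovePreview(source, destination)


def execute_move(preview: MovePreview) -> ExecutedMove:
    # Phase two: only a preview can be executed.
    preview.destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(preview.source), str(preview.destination))
    return ExecutedMove(preview.source, preview.destination)


def undo_move(record: ExecutedMove) -> None:
    # Reverse the move using the recorded rollback metadata.
    shutil.move(str(record.destination), str(record.source))
```

Because `execute_move` accepts only a `MovePreview`, there is no code path that moves a file without the validation step having run first.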

MCP Tool Suite (23 tools)

health · scan · suggest · review · triage_issue · start_work_contract · record_algorithm_decision · record_cost_budget · capture_snapshot · record_evidence · record_claim · evaluate_contract · finish_work_ledger · status · approve_execute · export_packet · search · list_cases · list_actions · undo · decide

explain_risk_hint records an evidence-backed Decision for a Ledger Risk Hint without suppressing the hint.

Technical Stack

| Layer | Technology | Tests |
|-------|-----------|-------|
| Backend | Python 3.12, FastAPI, SQLite | 311 |
| Frontend | React 19, TypeScript, Vite | 25 |
| MCP Server | Python, MCP SDK, httpx | 44 |
| CI | GitHub Actions (PR review + release gates) | Audit artifact + PR comment |

Security Features

Optional API key authentication (PROOFFLOW_API_KEY)
Rate limiting (PROOFFLOW_RATE_LIMIT)
AgentGuard test command execution is opt-in (PROOFFLOW_ENABLE_TEST_COMMANDS)
MCP concurrency guards (PROOFFLOW_MCP_MAX_CONCURRENT)
Filesystem action scope restrictions (allowed_roots)
CORS locked to localhost origins

Project Status

v0.1.0 — Stable release. All core workflows functional, tested, and documented.

| Milestone | Status |
|-----------|--------|
| Core evidence graph (Case/Artifact/Claim/Evidence) | Done |
| LocalProof file audit workflow | Done |
| AgentGuard code review workflow | Done |
| Issue triage workflow | Done |
| Policy gate enforcement | Done |
| MCP server for Claude Code/Codex | Done |
| Backup/restore with safety preview | Done |
| Docker deployment | Done |
| PyPI package (proofflow-mcp) | Done |

Roadmap

[ ] Multi-agent coordination (shared Cases across agents)
[ ] Vector RAG for semantic evidence retrieval
[x] GitHub Actions integration (CI-triggered reviews)
[x] VS Code extension (Marketplace)
[ ] Cloud sync option for team workflows
[ ] Webhook notifications for policy gate decisions

Development

Run from the repository root in a single PowerShell session. Each Push-Location / Pop-Location block restores the working directory back to the repository root, so the python scripts/... smoke tests and scripts/demo_workflow.py below can be pasted in the same session.

# Run all tests
Push-Location backend
python -m pytest          # 311 tests
Pop-Location

Push-Location frontend
npm run test              # 29 tests
Pop-Location

Push-Location mcp-server
pip install -e ".[dev]"
python -m pytest          # 44 tests
Pop-Location

# End-to-end smoke test (cwd: repository root)
python scripts/mcp_smoke.py --cleanup
python scripts/ledger_mcp_smoke.py --cleanup
python scripts/ledger_risk_hints_smoke.py --cleanup
python scripts/ledger_risk_hints_dogfood_matrix.py --cleanup

# Demo workflow (cwd: repository root)
python scripts/demo_workflow.py

Local backend data defaults to backend/data/. For dogfood runs that should not touch repository-local state, set PROOFFLOW_DB_PATH and PROOFFLOW_DATA_DIR to a temporary directory before starting the backend.
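One way to prepare such an isolated environment from Python is sketched below. The two variable names come from this README; the helper name `isolated_backend_env`, the directory prefix, and the `proofflow.db` file name are assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def isolated_backend_env(base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment pointing ProofFlow at a temp directory."""
    env = dict(os.environ if base_env is None else base_env)
    data_dir = Path(tempfile.mkdtemp(prefix="proofflow-dogfood-"))
    env["PROOFFLOW_DATA_DIR"] = str(data_dir)
    # The database file name is an assumption; any path inside data_dir works
    # for the purpose of keeping state out of backend/data/.
    env["PROOFFLOW_DB_PATH"] = str(data_dir / "proofflow.db")
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the uvicorn command from the Manual section, so the current shell's own environment stays unchanged.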

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT

---

Built by Hyperion-GPU — making AI agent workflows auditable, safe, and provable.
