Code Review
Reviews code changes using dynamically selected reviewer personas. Spawns parallel sub-agents that return structured JSON, then merges and deduplicates findings into a single report.
When to Use
- Before creating a PR
- After completing a task during iterative implementation
- When feedback is needed on any code changes
- Can be invoked standalone
- Can run inside larger workflows; use
mode:agentwhen the caller needs JSON instead of markdown tables
Argument Parsing
Parse $ARGUMENTS for optional tokens. Strip each recognized token before interpreting the remainder as a PR number, GitHub URL, or branch name.
| Token | Example | Effect |
|---|---|---|
mode:agent | mode:agent | Report-only: return JSON instead of markdown tables and skip the Stage 5c apply (the caller applies). Does not change reviewer selection, merge logic, or scope rules (see Output format) |
mode:headless | mode:headless | Deprecated alias for mode:agent |
mode:report-only | mode:report-only | Deprecated — ignored. Former no-artifacts mode; default behavior is review-only without checkout |
base:<sha-or-ref> | base:abc1234 or base:origin/main | Diff base on the current checkout (explicit; skips auto base detection) |
plan:<path> | plan:docs/plans/2026-03-25-001-feat-foo-plan.md | Plan file for requirements verification (explicit) |
grouping:auto | grouping:auto | Default — build thematic triage groups when findings span distinct concerns (Stage 5 step 9b) |
grouping:off | grouping:off | Suppress triage groups: no Triage Groups section, empty triage_groups in JSON |
grouping:always | grouping:always | Always build triage groups, even for small reviews |
Grouping is presentation, not a mode. The grouping: tokens change how the finding set is organized for triage — never reviewer selection, merge logic, scope rules, or the Stage 5c apply decision.
Mode alias: mode:headless normalizes to mode:agent. mode:agent + mode:headless is not a conflict.
Conflicting arguments: Stop without dispatching reviewers when:
- Multiple incompatible scope selectors appear together (e.g.
base:and a PR number/branch target —base:means "review the current checkout against this base") - Multiple distinct
mode:tokens other than themode:agent/mode:headlessalias pair - Multiple distinct
grouping:tokens (e.g.grouping:offandgrouping:always)
Deprecated mode:autofix is not a conflict — ignore the token and proceed with the normal flow (see below).
Emit a one-line failure reason. In mode:agent, return JSON: {"status":"failed","reason":"..."}.
Operating principles
Same pipeline for default and mode:agent:
- Apply locally; never push. Never push, open PRs, or file tickets in any mode — push is the outward step the user owns. In default (interactive) mode the review applies safe, verified fixes and commits them when the pre-review tree was clean (Stage 5c owns the full rule). In
mode:agentit never mutates the tree — it reports and the caller applies. - No blocking prompts. Never use
AskUserQuestion,request_user_input,ask_user, or other blocking question tools. Infer intent, plan, and scope from explicit tokens, git state, PR metadata, and conversation. Note uncertainty in Coverage or the verdict — do not stop to ask. - Explicit mutations only. Never run
gh pr checkout,git checkout,git switch, or similar branch-switch commands. Passing a PR number, URL, or branch name selects review scope, not permission to mutate the working tree. To review local uncommitted work on a feature branch, check out that branch yourself (or stay on it) and passbase:or no target. - Smart defaults. Untracked files: review tracked changes only and list excluded paths in Coverage. Plan: use
plan:when passed; otherwise discover conservatively from PR body or branch keywords. Weak advisory P2/P3 from testing/maintainability alone: demote totesting_gaps/residual_risksper Stage 5.
Output format
| Invocation | Deliverable |
|---|---|
| Default | Markdown report (pipe-delimited finding tables) + Actionable Findings summary |
mode:agent | One JSON object (see ### JSON output format below) + the same /tmp/.../ce-code-review/<run-id>/ artifacts |
mode:agent is report-only: it skips the Stage 5c apply (the caller applies) and serializes findings as JSON instead of markdown. It does not change reviewer selection, merge logic, or scope rules — the JSON is the deterministic contract for programmatic and cross-harness callers (Codex, Gemini, etc.). The default markdown is the human view; keep it ASCII-safe (pipe tables, -> not middot ·, no box-drawing) so it degrades gracefully across terminals.
Quick Review Short-Circuit
If $ARGUMENTS indicates the user wants a quick, fast, or light code review — and mode:agent is not active — do not dispatch the multi-agent flow.
Announce the chosen path before any other work (Quick review vs Multi-agent review). Skip this announcement when mode:agent is active.
Sequence:
- Run the harness's built-in code review. Forward any review target after stripping tokens. Then stop — do not dispatch the multi-agent pipeline.
- Exemption: If no built-in review exists, continue into the full multi-agent review.
mode:agentbypasses this short-circuit — always run the full multi-agent review and return JSON.
Deprecated: mode:autofix is no longer supported — there is no apply mode. If passed, ignore the token and proceed with the normal flow (default applies safe fixes via Stage 5c; mode:agent reports and the caller applies).
Severity Scale
All reviewers use P0-P3:
| Level | Meaning | Action |
|---|---|---|
| P0 | Critical breakage, exploitable vulnerability, data loss/corruption | Must fix before merge |
| P1 | High-impact defect likely hit in normal usage, breaking contract | Should fix |
| P2 | Moderate issue with meaningful downside (edge case, perf regression, maintainability trap) | Fix if straightforward |
| P3 | Low-impact, narrow scope, minor improvement | User's discretion |
Action Routing
Severity answers urgency. autofix_class and owner are signal describing follow-up shape for callers — not apply permission or an apply gate. The apply decision is judgment (Stage 5c), not a function of autofix_class: default mode applies; in mode:agent this skill does not mutate the checkout — the caller applies. See references/action-class-rubric.md for persona guidance.
autofix_class | Default owner | Meaning |
|---|---|---|
gated_auto | downstream-resolver or human | Concrete suggested_fix proposed; caller applies after judgment |
manual | downstream-resolver or human | Actionable work needing design input or handoff |
advisory | human or release | Report-only — learnings, rollout notes, residual risk |
Routing rules:
- Synthesis owns the final route. Persona-provided routing metadata is input, not the last word.
- Choose the more conservative route on disagreement. A merged finding may move from
gated_autotomanual, but never widen without stronger evidence. - Reject
safe_autoandreview-fixerif present — drop the finding or remap togated_auto/downstream-resolverduring synthesis. requires_verification: truemeans any caller-applied fix needs targeted tests or follow-up validation.
Reviewers
14 reviewer personas in layered conditionals, plus CE agents. Quick roster with one-line triggers below; the persona catalog included at the bottom has the full per-persona selection criteria and spawn gates.
Always-on (every review): ce-correctness-reviewer, ce-testing-reviewer, ce-maintainability-reviewer, ce-project-standards-reviewer, plus CE agents ce-agent-native-reviewer and ce-learnings-researcher.
Cross-cutting conditional (per diff):
ce-security-reviewer— auth, public endpoints, user input, permissionsce-performance-reviewer— DB queries, data transforms, caching, asyncce-api-contract-reviewer— routes, serializers, type signatures, versioningce-data-migration-reviewer— migration files / schema dumps / backfills (see spawn gate in Stage 3)ce-reliability-reviewer— error handling, retries, timeouts, background jobsce-adversarial-reviewer— >=50 changed code lines, or auth / payments / data mutations / external APIsce-previous-comments-reviewer— PR with existing review comments (PR-only, comment-gated)
Stack-specific conditional (per diff): ce-julik-frontend-races-reviewer (Stimulus/Turbo, DOM events, async UI) and ce-swift-ios-reviewer (Swift/SwiftUI/UIKit, entitlements, Core Data, .pbxproj).
CE conditional (migration-specific): ce-deployment-verification-agent — deployment checklist + rollback when the migration gate applies and the change is risky.
Review Scope
Every review spawns all 4 always-on personas plus the 2 CE always-on agents, then adds whichever cross-cutting and stack-specific conditionals fit the diff. The model naturally right-sizes: a small config change triggers 0 conditionals = 6 reviewers. A Rails auth feature might trigger security + reliability + adversarial = 9 reviewers.
Protected Artifacts
The following paths are compound-engineering pipeline artifacts and must never be flagged for deletion, removal, or gitignore by any reviewer:
docs/brainstorms/*-- requirements documents created by ce-brainstormdocs/plans/*.md-- plan files created by ce-plan (decision artifacts; execution progress is derived from git, not stored in plan bodies)docs/solutions/*.md-- solution documents created during the pipeline
If a reviewer flags any file in these directories for cleanup or removal, discard that finding during synthesis.
How to Run
Stage 1: Determine scope
Compute the diff range, file list, and diff. Minimize permission prompts by combining into as few commands as possible.
If base: argument is provided (fast path):
The caller already knows the diff base. Skip all base-branch detection, remote resolution, and merge-base computation. Use the provided value directly:
BASE_ARG="{base_arg}"
BASE=$(git merge-base HEAD "$BASE_ARG" 2>/dev/null) || BASE="$BASE_ARG"
Then produce the same output as the other paths:
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
This path works with any ref — a SHA, origin/main, a branch name. Callers reviewing the current checkout should pass explicit base: when auto-detection is unnecessary. Do not combine base: with a PR number or branch target. If both are present, stop with an error: "Cannot use base: with a PR number or branch target — base: implies the current checkout is already the correct branch. Pass base: alone, or pass the target alone and let scope detection resolve the base."
If a PR number or GitHub URL is provided as an argument:
Do not check out the PR branch. Scope comes from GitHub read APIs plus optional local alignment when HEAD already matches the PR head branch.
Skip-condition pre-check. Before scope detection, run a PR-state probe:
gh pr view <number-or-url> --json state,title,body,files
Apply skip rules in order:
stateisCLOSEDorMERGED-> stop with reasonPR is closed/merged; not reviewing.- Trivial-PR judgment: spawn a lightweight sub-agent (use
model: haikuin Claude Code; gpt-5.4-nano or equivalent in Codex) with the PR title, body, and changed file paths. The agent's task: "Is this an automated or trivial PR that does not warrant a code review? Consider: dependency lock-file or manifest-only bumps, automated release commits, chore version increments with no substantive code changes. When in doubt, answer no — false negatives (skipped reviews that should have run) are more costly than false positives (unnecessary reviews)." If the judgment returns yes: stop with reasonPR appears to be a trivial automated PR; not reviewing. Run without a PR argument to review the current branch, or pass base:<ref> if review is intended.
When any skip rule fires, stop without dispatching reviewers. Default mode: emit the reason as plain text. mode:agent: emit JSON only — {"status":"skipped","reason":"<same message>"} — so programmatic callers can parse the outcome. Standalone, base:, and branch-remote paths are unaffected. Draft PRs are reviewed normally.
If no skip rule fires, fetch PR metadata without checkout:
gh pr view <number-or-url> --json title,body,baseRefName,headRefName,headRefOid,isCrossRepository,url,files,reviews,comments --jq '{title, body, baseRefName, headRefName, headRefOid, isCrossRepository, url, files: [.files[].path], hasPriorComments: ((.reviews | map(select(.state != "APPROVED" or .body != "")) | length) > 0 or (.comments | length) > 0)}'
Set BASE: to pr:<number-or-url> (logical marker — not a git SHA). Set UNTRACKED: from git ls-files --others --exclude-standard on the current checkout (usually empty during PR-remote review).
PR scope mode. Classify as local-aligned only when all of these hold; otherwise use pr-remote. A matching branch name alone is not enough — a fork PR or a stale local branch can share a name with the PR head while pointing at unrelated code, and trusting the name would diff and inspect the wrong tree.
git rev-parse --abbrev-ref HEADequalsheadRefName.- The PR is not cross-repository (
isCrossRepositoryis false). - The PR head commit is contained in the local checkout:
git merge-base --is-ancestor <headRefOid> HEADexits 0. This confirms the working tree actually carries the PR head (allowing unpushed local fixes layered on top) rather than an unrelated same-named branch.
local-aligned— all three checks pass. Local Read/Grep/git blame against workspace files are valid for PR changed paths.pr-remote— any check fails. The working tree is not the PR head; workspace file contents for changed paths may be stale or unrelated.
Diff by scope mode (do not mix remote and local diffs — contradictory hunks cause false positives):
local-aligned: Resolve<resolved-base-ref>frombaseRefName(fetch if needed). ComputeBASE=$(git merge-base HEAD <resolved-base-ref>), then setFILES:fromgit diff --name-only $BASEandDIFF:fromgit diff -U10 $BASE(includes committed, staged, and unstaged changes on the PR branch). Do not callgh pr diffor append remote hunks — when unpushed fixes exist, the local tree is canonical. Note in Coverage:scope: local-aligned (PR; local tree diff).pr-remote: SetFILES:from the PRfilesarray. SetDIFF:fromgh pr diff <number-or-url> --color=never. Ifgh pr difffails, stop with an actionable error — do not fall back to checkout.
When pr-remote, before Stage 4:
- Best-effort fetch PR head without checkout:
git fetch --no-tags origin <headRefName>:refs/review/pr-<number>-head(substitute PR number from metadata). - When fetch succeeds, set
PR_HEAD_REF=refs/review/pr-<number>-headfor reviewers and validators. When fetch fails, omitPR_HEAD_REFand note in Coverage — reviewers must rely on diff hunks only. - Best-effort fetch the PR base without checkout:
git fetch --no-tags origin <baseRefName>. When it succeeds, resolve a concrete ref withgit rev-parse FETCH_HEADand setPR_BASE_REFto that SHA — a real git base ref reviewers and validators use for file-level git diffs (e.g.ce-data-migration-reviewerrunsgit diff <PR_BASE_REF> -- db/schema.rb/structure.sql). Thepr:<number-or-url>logical marker inBASE:stays the scope marker;PR_BASE_REFis the diffable base. When the fetch fails, omitPR_BASE_REFand note in Coverage — schema-drift and other git-diff checks fall back to diff hunks only and must not assumemain. - Include
<pr-scope-mode>pr-remote</pr-scope-mode>and, when set,<pr-head-ref>...</pr-head-ref>and<pr-base-ref>...</pr-base-ref>in the Stage 4 review context bundle.
Reviewers and Stage 5b validators in pr-remote mode must not Read/Grep workspace paths for files in FILES:. Inspect via git show <PR_HEAD_REF>:<path> when PR_HEAD_REF is set, otherwise use only the provided diff hunks. local-aligned uses normal workspace inspection.
If a branch name is provided as an argument:
Substitute the provided branch name as <branch>. Do not check out <branch>.
If git rev-parse --abbrev-ref HEAD equals <branch>, use the standalone (current branch) path below — same tree, explicit branch name; do not use remote-only diff.
Otherwise diff the remote/local ref without checkout:
- Try
gh pr view <branch> --json baseRefName,url,headRefName— if a PR exists, prefer the PR number/URL path above (same remote diff rules). - Else resolve
<branch>asorigin/<branch>or<branch>aftergit fetch --no-tags origin <branch>when needed. - Resolve default base branch (same logic as standalone). Compute
BASE=$(git merge-base <base-ref> <branch-ref>)andgit diff -U10 $BASE <branch-ref>. - If
<branch-ref>cannot be resolved locally, stop: "Cannot diff branch<branch>without checkout. Check out that branch, pass its open PR URL/number, or review the current branch withbase:."
On success for remote branch diff, set branch-remote scope. The working tree is not <branch>. Include <pr-scope-mode>branch-remote</pr-scope-mode> and <branch-head-ref><branch-ref></branch-head-ref> in the Stage 4 review context bundle. Reviewers and Stage 5b validators must not Read/Grep workspace paths for files in FILES:. Inspect via git show <branch-ref>:<path> or diff hunks only.
Produce:
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE <branch-ref> && echo "DIFF:" && git diff -U10 $BASE <branch-ref> && echo "UNTRACKED:" && git ls-files --others --exclude-standard
If no argument (standalone on current branch):
Apply the same base-detection logic as branch mode above, using the current branch (i.e., gh pr view --json baseRefName,url with no argument defaults to the current branch).
If no base can be resolved, stop. Do not fall back to git diff HEAD — a standalone review without the base would only show uncommitted changes and silently miss all committed work on the branch.
On success, produce the diff:
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
Using git diff $BASE (without ..HEAD) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together.
Untracked file handling: Always inspect UNTRACKED:. Untracked paths are out of scope unless staged. When non-empty, list excluded files in Coverage and continue on tracked changes only — never stop or prompt.
Stage 2: Intent discovery
Understand what the change is trying to accomplish. The source of intent depends on which Stage 1 path was taken:
PR/URL mode: Use the PR title, body, and linked issues from gh pr view metadata. Supplement with commit messages from the PR if the body is sparse.
Branch mode: Run git log --oneline ${BASE}..<branch-ref> using the resolved merge-base and resolved branch ref from Stage 1. Use <branch-ref> (the resolved origin/<branch> or fetched ref), not the raw <branch> argument — a remote-only branch has no matching local ref, so the raw name would fail or read a stale same-named local branch.
Standalone (current branch): Run:
echo "BRANCH:" && git rev-parse --abbrev-ref HEAD && echo "COMMITS:" && git log --oneline ${BASE}..HEAD
Combined with conversation context (plan section summary, PR description), write a 2-3 line intent summary:
Intent: Simplify tax calculation by replacing the multi-tier rate lookup
with a flat-rate computation. Must not regress edge cases in tax-exempt handling.
Pass this to every reviewer in their spawn prompt. Intent shapes how hard each reviewer looks, not which reviewers are selected.
When intent is ambiguous: Infer from branch name, commits, PR title/body, diff, plan:, and conversation. Write the best-effort intent summary and note uncertainty in Coverage — never block on a clarifying question.
Stage 2b: Plan discovery (requirements verification)
Locate the plan document so Stage 6 can verify requirements completeness. Check these sources in priority order — stop at the first hit:
plan:argument. If the caller passed a plan path, use it directly. Read the file to confirm it exists.- PR body. If PR metadata was fetched in Stage 1, scan the body for paths matching
docs/plans/*.md. If exactly one match is found and the file exists, use it asplan_source: explicit. If multiple plan paths appear, treat as ambiguous — demote toplan_source: inferredfor the most recent match that exists on disk, or skip if none exist or none clearly relate to the PR title/intent. Always verify the selected file exists before using it — stale or copied plan links in PR descriptions are common. - Auto-discover. Extract 2-3 keywords from the branch name (e.g.,
feat/onboarding-skill->onboarding,skill). Globdocs/plans/*and filter filenames containing those keywords. If exactly one match, use it. If multiple matches or the match looks ambiguous (e.g., generic keywords likereview,fix,updatethat could hit many plans), skip auto-discovery — a wrong plan is worse than no plan. If zero matches, skip.
Confidence tagging: Record how the plan was found:
plan:argument ->plan_source: explicit(high confidence)- Single unambiguous PR body match ->
plan_source: explicit(high confidence) - Multiple/ambiguous PR body matches ->
plan_source: inferred(lower confidence) - Auto-discover with single unambiguous match ->
plan_source: inferred(lower confidence)
If a plan is found, read its Requirements section — ## Requirements in current plans, ## Requirements Trace in legacy ones — and the R-IDs (R1, R2, etc.) listed there, plus Implementation Units (current numeric subsections such as ### U1., ### U2., or ### Unit 1: under ## Implementation Units; legacy bullet or checkbox unit entries under that section also count). Store the extracted requirements list and plan_source for Stage 6. Do not block the review if no plan is found — requirements verification is additive, not required.
Stage 3: Select reviewers
Read the diff and file list from Stage 1. The 4 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
File-type awareness for conditional selection: Instruction-prose files (Markdown skill definitions, JSON schemas, config files) are product code but do not benefit from runtime-focused reviewers. The adversarial reviewer's techniques (race conditions, cascade failures, abuse cases) target executable code behavior. For diffs that only change instruction-prose files, skip adversarial unless the prose describes auth, payment, or data-mutation behavior. Count only executable code lines toward line-count thresholds.
previous-comments is PR-only AND comment-gated. Only select this persona when both conditions hold:
- Stage 1 gathered PR metadata (PR number or URL was provided as an argument, or
gh pr viewreturned metadata for the current branch). hasPriorCommentsfrom Stage 1 is true (the PR has at least one review submission or issue comment).
Skip it for standalone branch reviews with no associated PR, and skip it for PRs with no prior feedback yet -- there is nothing for the persona to verify, and a spawned subagent that returns empty findings still costs the full subagent startup overhead (persona spec, diff, schema, plus its own gh calls).
Stack-specific personas are additive when runtime behavior warrants them. A Hotwire UI change may warrant julik-frontend-races; a TypeScript API diff may warrant api-contract and reliability.
data-migration spawn gate. Select ce-data-migration-reviewer only when the diff includes at least one migration or schema artifact: db/migrate/*, db/schema.rb, db/structure.sql, Alembic/Flyway/Liquibase migration paths, or explicit backfill/data-transform scripts (rake tasks, one-off data migration classes). Do not spawn for model-only changes, query-only refactors, serializers/controllers that reference columns without a migration or schema dump in the diff, or migration tests alone.
For ce-deployment-verification-agent, use the same migration-artifact gate when the change is risky (destructive DDL, backfills, NOT NULL without default, column renames/drops).
Announce the team before spawning:
Review team:
- correctness (always)
- testing (always)
- maintainability (always)
- project-standards (always)
- ce-agent-native-reviewer (always)
- ce-learnings-researcher (always)
- security -- new endpoint in routes.rb accepts user-provided redirect URL
- julik-frontend-races -- Stimulus controller with async DOM updates
- data-migration -- adds migration 20260303_add_index_to_orders
- ce-deployment-verification-agent -- destructive migration with backfill
This is progress reporting, not a blocking confirmation.
Stage 3b: Discover project standards paths
Before spawning sub-agents, find the file paths (not contents) of all relevant standards files for the project-standards persona. Use the native file-search/glob tool to locate:
- Use the native file-search tool (e.g., Glob in Claude Code) to find all
/CLAUDE.mdand/AGENTS.mdin the repo. - Filter to those whose directory is an ancestor of at least one changed file. A standards file governs all files below it (e.g.,
plugins/compound-engineering/AGENTS.mdapplies to everything underplugins/compound-engineering/).
Pass the resulting path list to the project-standards persona inside a <standards-paths> block in its review context (see Stage 4). The persona reads the files itself, targeting only the sections relevant to the changed file types. This keeps the orchestrator's work cheap (path discovery only) and avoids bloating the subagent prompt with content the reviewer may not fully need.
Stage 4: Spawn sub-agents
Model tiering
Three reviewers inherit the session model with no override: ce-correctness-reviewer, ce-security-reviewer, and ce-adversarial-reviewer. These perform the highest-stakes analysis — logic bugs, security vulnerabilities, adversarial failure scenarios — and should run at whatever capability level the user has configured. If the user is on Opus, these get Opus.
All other persona sub-agents and CE agents use the platform's mid-tier model to reduce cost and latency. See the Spawning subsection below for the exact dispatch-time override.
The orchestrator (this skill) also inherits the session model; it handles intent discovery, reviewer selection, finding merge/dedup, and synthesis.
Run ID
Generate a unique run identifier before dispatching any agents. This ID scopes all agent artifact files and the post-review run artifact to the same directory.
RUN_ID=$(date +%Y%m%d-%H%M%S)-$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' ')
mkdir -p "/tmp/compound-engineering/ce-code-review/$RUN_ID"
Pass {run_id} to every persona sub-agent so they can write their full analysis to /tmp/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json.
Large shared context — pass paths, not contents. The diff and file list go to every reviewer and validator. When inlining them into each subagent prompt would be wasteful (many files / a big diff), write them once into the run dir (e.g. full.diff, files.txt) and pass those paths in the diff / changed-files slots instead of inline content — the subagent and validator templates instruct the child to Read a staged path. Inline a small diff directly.
Spawning
Omit the mode parameter when dispatching sub-agents so the user's configured permission settings apply. Do not pass mode: "auto".
Model override at dispatch time. Pass the platform's mid-tier model on every dispatch except ce-correctness-reviewer, ce-security-reviewer, and ce-adversarial-reviewer, which inherit the session model (per the Model tiering subsection above). In Claude Code, add model: "sonnet" to the Agent tool call. In Codex, pass the equivalent mid-tier on spawn_agent (e.g., gpt-5.4-mini as of April 2026). In Pi, pass the equivalent on subagent via the pi-subagents extension. On platforms where the dispatch primitive has no model-override parameter or the available model names are unknown, omit the override — a working review on the parent model beats a broken dispatch on an unrecognized name. Check this on every Agent / spawn_agent / subagent call in the parallel dispatch; omitting it on Opus sessions silently 3-4x's the cost of a review.
Bounded parallel dispatch. Respect the current harness's active-subagent limit. Queue selected reviewers, dispatch only as many as the harness accepts, and fill freed slots as reviewers complete. Treat active-agent/thread/concurrency-limit spawn errors as backpressure, not reviewer failure: leave the reviewer queued and retry after a slot frees. Record a reviewer as failed only after a successful dispatch times out/fails, or when dispatch fails for a non-capacity reason.
Spawn each selected persona reviewer using the subagent template included below. Each persona sub-agent receives:
- Their persona file content (identity, failure modes, calibration, suppress conditions)
- Shared diff-scope rules from the diff-scope reference included below
- The JSON output contract from the findings schema included below
- PR metadata: title, body, and URL when reviewing a PR (empty string otherwise). Passed in a
<pr-context>block so reviewers can verify code against stated intent - Review context: intent summary, file list, diff, scope mode (
local-aligned|pr-remote|branch-remote), and remote head ref (PR_HEAD_REFor<branch-head-ref>) when set - Run ID and reviewer name for the artifact file path
- For
project-standardsonly: the standards file path list from Stage 3b, wrapped in a<standards-paths>block appended to the review context - For
data-migrationonly: the resolved review base ref from Stage 1 (BASE:marker), wrapped in<review-base>inside the review context so schema drift checks never assumemain
Persona sub-agents are read-only with respect to the project: they review and return structured JSON. They do not edit project files or propose refactors. The one permitted write is saving their full analysis to the run-artifact path specified in the output contract (under /tmp/compound-engineering/ce-code-review/<run-id>/).
Read-only here means non-mutating, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented git / gh usage such as git diff, git show, git blame, git log, and gh pr view. In pr-remote or branch-remote scope (see Stage 1), inspect changed files via git show <remote-head-ref>:<path> or diff hunks — do not Read/Grep workspace paths for files in scope. They must not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
Each persona sub-agent writes full JSON (all schema fields) to /tmp/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json and returns compact JSON with merge-tier fields only:
{
"reviewer": "security",
"findings": [
{
"title": "User-supplied ID in account lookup without ownership check",
"severity": "P0",
"file": "orders_controller.rb",
"line": 42,
"confidence": 100,
"autofix_class": "gated_auto",
"owner": "downstream-resolver",
"requires_verification": true,
"pre_existing": false,
"suggested_fix": "Add current_user.owns?(account) guard before lookup"
}
],
"residual_risks": [...],
"testing_gaps": [...]
}
The artifact file must carry the detail-tier fields (why_it_matters, evidence); the compact return omits them, but writing the compact shape to the artifact (a common reviewer slip) silently strips the detail Coverage and the keyed detail lines depend on. However review context is delivered — inlined, or staged to disk for a large diff — each reviewer still receives the full subagent-template output contract; staging context never licenses a thinner one. suggested_fix is optional in both tiers -- included in compact returns when present so callers can apply fixes after review. If the file write fails, the compact return still provides everything the merge needs.
CE always-on agents (ce-agent-native-reviewer, ce-learnings-researcher) are dispatched as standard Agent calls through the same bounded parallel scheduler as the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, BASE: marker, file list, diff, and UNTRACKED: scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
CE conditional agents (ce-deployment-verification-agent only) are dispatched as standard Agent calls through the same bounded parallel scheduler when the migration-artifact gate applies. Pass the same review context bundle plus the applicability reason (for example, which migration files triggered the agent). Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents. Schema drift is handled by the data-migration persona as structured findings — not here.
Stage 5: Merge findings
Convert multiple reviewer compact JSON returns into one deduplicated, confidence-gated finding set. The compact returns contain merge-tier fields (title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing) plus the optional suggested_fix. Detail-tier fields (why_it_matters, evidence) are on disk in the per-agent artifact files and are not loaded at this stage.
confidence is one of 5 discrete anchors (0, 25, 50, 75, 100) with behavioral definitions in the findings schema. Synthesis treats anchors as integers; do not coerce to floats.
- Validate. Check each compact return for required top-level and per-finding fields, plus value constraints. Drop malformed returns or findings. Record the drop count.
- Top-level required: reviewer (string), findings (array), residual_risks (array), testing_gaps (array). Drop the entire return if any are missing or wrong type.
- Per-finding required: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing
- Value constraints:
- severity: P0 | P1 | P2 | P3
- autofix_class: gated_auto | manual | advisory
- owner: downstream-resolver | human | release
- confidence: integer in {0, 25, 50, 75, 100}
- line: positive integer
- pre_existing, requires_verification: boolean
- Do not validate against the full schema here -- the full schema (including why_it_matters and evidence) applies to the artifact files on disk, not the compact returns.
- Deduplicate. Compute fingerprint:
normalize(file) + line_bucket(line, +/-3) + normalize(title). When fingerprints match, merge: keep highest severity, keep highest anchor, note which reviewers flagged it. Dedup runs over the full validated set (including anchor 50) so cross-reviewer promotion in step 3 can lift matching anchor-50 findings into the actionable tier. - Cross-reviewer agreement. When 2+ independent reviewers flag the same issue (same fingerprint), promote the merged finding by one anchor step:
50 -> 75,75 -> 100,100 -> 100. Note the agreement in the Reviewer column of the output (e.g., "security, correctness"). - Separate pre-existing. Pull out findings with
pre_existing: trueinto a separate list. - Resolve disagreements. When reviewers flag the same code region but disagree on severity, autofix_class, or owner, annotate the Reviewer column with the disagreement (e.g., "security (P0), correctness (P1) -- kept P0").
- Normalize routing. For each merged finding, set the final
autofix_class,owner, andrequires_verification. If reviewers disagree, keep the more conservative route. Remap any legacysafe_autoorreview-fixertogated_auto/downstream-resolver.
6b. Mode-aware demotion of weak general-quality findings. Some persona output is real signal but does not warrant primary-findings attention. Reroute it to the existing soft buckets so the primary findings table stays focused on actionable issues.
A finding qualifies for demotion when all of these hold:
- Severity is P2 or P3 (P0 and P1 always stay in primary findings)
autofix_classisadvisory(concrete-fix findings stay in primary)- All contributing reviewers are
testingormaintainability— if any other persona also flagged this finding, cross-reviewer corroboration is present and the finding stays in primary findings regardless of its severity or advisory status (expand the weak-signal list later only with evidence)
When a finding qualifies:
- Move demoted findings out of the primary set. If the contributing reviewer is
testing, append<file:line> -- <title>totesting_gaps. Ifmaintainability, append toresidual_risks. Use title-only lines (compact return omitswhy_it_matters). Record the demotion count for Coverage.
- Confidence gate. After dedup, promotion, and demotion have shaped the primary set, suppress remaining findings below anchor 75. Exception: P0 findings at anchor 50+ survive the gate -- critical-but-uncertain issues must not be silently dropped. Record the suppressed count by anchor (so Coverage can report "N findings suppressed at anchor 50, M at anchor 25"). The gate runs late deliberately: anchor-50 findings need a chance to be promoted by step 3 (cross-reviewer corroboration) or rerouted by step 6b (mode-aware demotion to soft buckets) before any drop decision.
- Partition the work. Build two sets:
- actionable queue:
gated_autoormanualfindings whose owner isdownstream-resolver(hand off to caller) - report-only queue:
advisoryfindings plus anything owned byhumanorrelease
- Sort and number. Order by severity (P0 first) -> anchor (descending) -> file path -> line number, then assign monotonically increasing
#values across the full primary finding set in that sorted order. Do not restart numbering inside each severity table, triage group, or autofix/routing bucket. If later sections repeat a finding (for example Actionable Findings), reuse the same stable#so users and downstream workflows can reference findings by#across the report and caller handoff.
9b. Build thematic triage groups. After stable # values exist, group related findings so the reader can triage themes instead of items. This is distinct from deduplication: dedupe answers "are these the same finding?", grouping answers "are these distinct findings that should be understood or resolved together?". Groups never merge findings into a synthetic finding and never change a finding's severity, confidence, route, owner, or stable #. Groups span the full primary finding set — both actionable and report-only findings — so they organize the whole report, not just the apply queue.
grouping:off: skip this step.grouping:auto(default): build groups when findings span distinct concerns — the trigger is distinct concerns, not item count (mirroring how plan Requirements group by capability). Skip only when all findings are genuinely about the same thing; prefer no groups over decorative single-item groups.grouping:always: always build groups; use single-finding groups only when no meaningful multi-finding grouping exists.- Grouping signals: shared root cause, affected subsystem, user-facing failure mode, overlapping fix path, dependency ordering, or repeated symptoms of one design choice.
- Group shape: short title, the included stable finding
#s, one-line context, preferred resolution, and why — when one fix path resolves several findings, name it and say which finding to handle first. - Ordering: order groups by the highest-severity finding they contain, then by lowest stable
#. A finding appears in at most one group; leave genuinely unrelated findings ungrouped.
- Collect coverage data. Union residual_risks and testing_gaps across reviewers.
- Preserve CE agent artifacts. Keep the learnings, agent-native, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema. Schema drift from
data-migrationis already in the merged finding set.
Stage 5b: Validation pass (optional quality gate)
Independent verification gate. Spawn one validator sub-agent per surviving finding using references/validator-template.md. Findings the validator rejects are dropped; confirmed findings flow through unchanged.
When this stage runs: After Stage 5 whenever at least one finding survives — skip only when zero survive. When more than 15 survive, do not skip the stage; validate per the budget cap in step 2. The default method is the per-finding validator wave (steps below); a surviving P2/P3 finding at anchor 100 may instead be validated by direct first-party verification (see below). Same rule for default and mode:agent.
Steps:
- Select findings to validate. All survivors of Stage 5.
- Apply dispatch budget cap. If the selected set exceeds 15 findings, validate the highest-severity 15 (P0 first, then P1, then P2, then P3, breaking ties by anchor descending), dropping only from the P2/P3 tail. Never drop a P0 or P1 from validation — if P0/P1 findings alone exceed 15, raise the cap to include all of them. Record the over-budget count (the dropped P2/P3 tail) for the Coverage section.
- Spawn validators with bounded parallelism. One sub-agent per finding, dispatched independently using the validator template and the same bounded scheduler from Stage 4. Each validator receives:
- The finding's title, severity, file, line, suggested_fix, original reviewer name, and confidence anchor
why_it_matterswhen available — loaded from the per-agent artifact file at/tmp/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json; omit when the file is absent or the artifact write failed. The validator proceeds without it, using the diff and cited code directly.- The full diff
- The scope mode and remote head ref, mirroring the Stage 4 reviewer bundle: inject
<pr-scope-mode>local-aligned | pr-remote | branch-remote</pr-scope-mode>and, when set,<pr-head-ref>...</pr-head-ref>or<branch-head-ref>...</branch-head-ref>. The validator template defaults to local-aligned workspace inspection when these are absent, so omitting them inpr-remote/branch-remotemakes validators verify findings against the stale working tree — dropping valid findings or confirming false ones on the wrong tree. - Inspection access scoped by mode: in
local-aligned, Read/Grep/git blame the cited code, callers, guards, framework defaults, and history; inpr-remote/branch-remote, inspect viagit show <remote-head-ref>:<path>or the provided diff hunks only — do not Read/Grep workspace paths for files in scope.
- Collect verdicts. Each validator returns
{ "validated": true | false, "reason": "<one sentence>" }.
validated: true-> finding survives unchanged into Stage 6validated: false-> finding is dropped; record the validator's reason in Coverage- Validator infrastructure failure (timeout, dispatch error, malformed JSON — not a
validated:falseverdict): for P2/P3, drop the finding with reason "validator failed" (conservative bias). For P0/P1, do not drop on infra failure — keep the finding and mark its validation degraded (note in Coverage). A transient validator failure must never silently remove a critical/high finding; a genuinevalidated:falserejection above still drops at any severity.
- Use mid-tier model for validators. Same model class (sonnet) the persona reviewers use. Validators are read-only — same constraints as persona reviewers. They may use non-mutating inspection commands (Read, Grep, Glob, git blame, gh).
- Record metrics for Coverage. Total dispatched, validated true count, validated false count (with reasons), infra failures (and any P0/P1 kept-on-failure as degraded), and over-budget drops.
- Prune triage groups after drops. When validation dropped any finding, rebuild or prune
triage_groupsfrom the validated set: a group must never reference a#that was rejected or dropped. Remove groups left with fewer than two findings undergrouping:auto; undergrouping:always, keep them as single-finding groups only when still meaningful.
Orchestrator direct verification. When a finding hinges on a fact the orchestrator can check cheaply and authoritatively — a pinned dependency's source, a wiring/config fact in this repo, a build tag — verify it directly with single-purpose native tools (Read/Grep/Glob, one git command at a time), never chained or error-suppressed shell. Fold confirmed facts into synthesis. Whether it can replace the independent validator turns on a single distinction: the orchestrator is not an independent second opinion (it synthesized these findings), so direct verification catches a wrong fact but not the orchestrator's own bias. Independence adds nothing to a mechanically-checkable fact and everything to a judgment call:
- P0/P1, any anchor: the per-finding validator wave is required; direct verification only complements it, never replaces it.
- P2/P3 at anchor 100 (verifiable from code alone — compile/type error, definitive logic bug, quotable standards violation, no interpretation): direct verification may stand in for the wave; note the method in Coverage.
- P2/P3 at anchor 75 (judgment call — "will affect users," not airtight): the independent wave is required — this is exactly where a fresh second opinion filters false positives, and the orchestrator cannot supply that for its own findings.
Why per-finding bounded dispatch (not batched): Independence is the point. A single batched validator looking at all findings together pattern-matches across them and recreates the persona-bias problem. Per-finding dispatch preserves fresh context while the scheduler respects harness limits.
Stage 5c: Act on findings (default mode only)
Skip entirely in mode:agent — that mode is a machine handoff and the caller owns apply. In default (interactive) mode the review is the top-level agent, so it applies the fixes it is confident in before presenting the report.
Act policy (bias to act). Default to applying every finding that is a clear improvement and a reversible edit, regardless of severity. The work is a tracked, visible diff that can be reverted — so leaving a clean fix unapplied "to be safe" is the failure mode, not the safe choice. Decide by judgment, not a safety checklist:
- Apply clear improvements — the common case (test hardening, dead-code removal, a localized fix with a concrete
suggested_fix). - Push back — do not apply — when the reviewer is wrong; keep the finding and state the disagreement with reasoning.
- Skip with judgment taste calls and conflicting suggestions, but surface what was skipped and why. Never silently drop.
Severity, confidence, and cross-reviewer agreement tell you what to do first and what to flag loudly — they do not gate the decision. There is no deny-list: downside is controlled after the fact (revert + visible diff + the commit checkpoint), not by a precondition.
Scope invariant. Apply only when the working tree is what was reviewed — local-aligned or standalone. In pr-remote / branch-remote the working tree is not the reviewed head; do not apply — report instead.
Verify, then keep. After applying, run the affected tests and lint (targeted by default; broaden when fixes span files). If they fail, revert that fix and report it as a finding instead — an unverified fix is not finished. Never leave the tree red.
Commit when the pre-review tree was clean. Before applying, note whether the working tree already had uncommitted changes (git status --porcelain). The permanence gate is the push, not the commit — a local commit is private and reversible (git reset --soft HEAD~1).
- Clean before the review: after applying and verifying, commit the fixes as one isolated, review-labeled fix commit —
fix(review): <summary>, or the repo's nearest convention ifreviewisn't an allowed scope. Labeled and reversible, returning the tree to a known state. - Dirty before the review: apply but do not commit — the fixes interleave with the user's in-flight work and ride along with the commit they were already going to make. The Applied section lists what changed.
- Never push, open a PR, or file tickets — that's the outward-facing step the user owns.
Surface green-but-unverifiable edits. When an applied fix touches auth/authz, a public or cross-service contract/schema, or concurrency/ordering, a passing test does not prove safety — flag it prominently in the Applied section so the diff reviewer's attention goes there.
Re-partition triage groups after apply. Triage groups describe the remaining work. After Stage 5c, prune applied findings out of triage_groups before Stage 6 rendering — a group must never tell the user to handle a finding that was already applied. When an applied fix resolved part of a theme, note that in the group's context line instead of keeping the applied # in the group. Re-apply the same minimum-size rule as Stage 5b step 7 (drop sub-two-finding groups under grouping:auto).
Stage 6: Synthesize and present
Assemble the final report. Default: pipe-delimited markdown tables for findings (mandatory — see review output template). mode:agent: skip markdown and emit JSON (see ### JSON output format). Other sections (Actionable Findings, Learnings, Coverage, etc.) use bullets and --- before the verdict in markdown mode only.
Before writing the report, load references/review-output-template.md and mirror it — that file is the canonical skeleton (full per-section structure). The block below is the always-loaded fallback so the shape survives a long session even if the template was not reloaded.
Findings table shape (default mode — load-bearing, do not improvise). Every finding is a row in a pipe-delimited table grouped by severity, with a terse Issue cell; depth goes in a keyed detail line under the table. Copy this shape; do not invent a layout:
| # | File | Issue | Reviewer | Confidence |
|---|---|---|---|---|
| 1 | path/to/file.go:42 | One terse line — the scannable index | correctness | 100 |
- #1 — full explanation here (why it matters + concrete fix direction), as a keyed detail line under the table.
Per-severity tables are 5 columns — Route is not shown here (it appears only in the Actionable Findings table and the JSON). Keep the Issue cell to one short clause (roughly 12 words or fewer, no second sentence, no because/so/which explanation) — it is the scannable index, not the explanation. The moment a cell wants a comma-plus-clause or a reason, move that depth into the keyed detail line (- #N — …) instead of packing it in — usually for P0/P1; P2/P3 are typically terse-only.
*Never produce these shapes (instant fail — applies to every tabular section, the Applied table included, not just the severity findings; if you catch one mid-draft, re-render before delivering):*
- Any row — a finding or an Applied entry — rendered as
Field:-prefixed blocks (#:/Sev:/File:/Issue:/Fix:/Route:lines) — depth goes in the keyed detail line, never a field block - Per-row separators made of horizontal rules or box-drawing characters (
────,———) - A table replaced by a plain bulleted/numbered list (the keyed
- #N —detail line under a table is a supplement, not a replacement — that is expected) - Unicode separators or arrows in cells (middot
·); use ASCII-> - Inconsistent treatment across severities or sections (e.g. P1 as blocks while P2/P3 are tables, or the Applied table as field-blocks while findings are tables) — every table uses the same pipe-delimited shape
- Header. Scope, intent, mode, reviewer team with per-conditional justifications.
- Applied (default mode only). When Stage 5c applied fixes, list them first — before the findings tables — in an Applied section (see review output template) as a pipe table
| # | File | Fix | Reviewer |— neverField:-blocks or────separators, same rules as the findings tables — then a one-line validation outcome (e.g. "pin tests 4 -> 6; suite 94 pass, lint clean") and commit status (committed on a clean tree asfix(review): …or the repo's nearest convention, or left uncommitted for the user on a dirty one). Flag green-but-unverifiable edits (auth/contract/concurrency) prominently. Omit this section inmode:agentand when nothing was applied. Applied findings appear here, not in the severity tables.
2b. Triage Groups. When finalized triage_groups exist (post-validation, post-apply — Stage 5b step 7 / Stage 5c), render a ### Triage Groups section before the severity tables as a pipe table | Group | Findings | Context | Preferred Resolution | Why |. The Findings cell references stable #s (e.g. #1, #3); verify every referenced # appears in the severity tables below. Groups supplement the severity tables, never replace them. Omit the section when grouping:off is active or no groups survived. In mode:agent this section is carried by the triage_groups JSON field instead.
- Findings. Pipe-delimited tables grouped by severity (
### P0 -- Critical,### P1 -- High,### P2 -- Moderate,### P3 -- Low), using the shape above — the same shape for every severity. Omit empty severity levels. Finding numbers come from the stable assignment in Stage 5 -- never re-derive them per severity table or triage group. - Requirements Completeness. Include only when a plan was found in Stage 2b. For each requirement (R1, R2, etc.) and implementation unit in the plan, report whether corresponding work appears in the diff. Use a simple checklist: met / not addressed / partially addressed. Routing depends on
plan_source:
explicit(caller-provided or PR body): Flag unaddressed requirements or implementation units as P1 findings withautofix_class: manual,owner: downstream-resolver. These enter the actionable queue.inferred(auto-discovered): Flag unaddressed requirements or implementation units as P3 findings withautofix_class: advisory,owner: human. These stay in the report only — no autonomous follow-up. An inferred plan match is a hint, not a contract.
Omit this section entirely when no plan was found — do not mention the absence of a plan.
- Actionable Findings. Include when the actionable queue is non-empty — findings the caller should address (
gated_auto/manualwithdownstream-resolver), plus anything Stage 5c chose not to apply. In default mode, findings already applied appear in the Applied section, not here. - Pre-existing. Separate section, does not count toward verdict.
- Learnings & Past Solutions. Surface ce-learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
- Agent-Native Gaps. Surface ce-agent-native-reviewer results. Omit section if no gaps found.
- Deployment Notes. If ce-deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage. Schema drift appears in the findings tables as
data-migrationP1 rows — do not add a separate Schema Drift section. - Coverage. Applied count (when Stage 5c ran), suppressed count by anchor (e.g., "N findings suppressed at anchor 50, M at anchor 25"), mode-aware demotion count, validator drop count and reasons (when Stage 5b ran), any P0/P1 with degraded validation (kept on validator infra failure), validator over-budget drops (when the 15-cap fired), residual risks, testing gaps, failed/timed-out reviewers, and inferred-intent uncertainty when applicable.
- Verdict. Ready to merge / Ready with fixes / Not ready. Fix order if applicable. When an
explicitplan has unaddressed requirements or implementation units, the verdict must reflect it — a PR that's code-clean but missing planned requirements is "Not ready" unless the omission is intentional. When aninferredplan has unaddressed requirements or implementation units, note it in the verdict reasoning but do not block on it alone.
Do not include time estimates.
Format verification (default only — last gate before delivering). Before delivering, scan every table — the Applied table, the Triage Groups table, and each severity findings table — for the forbidden shapes: Field:-prefixed blocks (#: / File: / Fix: / Issue:), box-drawing or horizontal-rule separators (────), middot ·, or a list replacing a table. The Applied table is the most common offender — check it explicitly. If any table hit one of these, STOP and re-render it as the same pipe-delimited shape before delivering. (The keyed - #N — detail line under a table is expected — not a failure.) Skip only when mode:agent is active.
JSON output format (mode:agent only)
Emit one raw JSON object as the primary response — a single bare JSON value, no markdown code fence. A leading ``` `json `` fence makes the response start with backticks and breaks naive JSON.parse consumers, so never wrap it. Also write review.json under /tmp/compound-engineering/ce-code-review/<run-id>/` with the same payload.
mode:agent does not apply fixes — the caller does — so there is no applied_fixes field; the handoff is actionable_findings. Applied work surfaces only in the default-mode markdown Applied section (Stage 5c/6).
Minimum shape:
{
"status": "complete",
"verdict": "Ready to merge | Ready with fixes | Not ready",
"scope": {
"base": "<merge-base sha, pr:NNN marker, or base: ref>",
"branch": "<current branch name>",
"head_sha": "<git rev-parse HEAD>",
"pr_url": "<url or null>",
"files_changed": 0
},
"intent": "<2-3 line summary>",
"intent_confidence": "explicit | inferred | uncertain",
"reviewers": ["correctness", "security"],
"findings": [],
"actionable_findings": [],
"triage_groups": [],
"pre_existing_findings": [],
"requirements_completeness": null,
"learnings": [],
"agent_native_gaps": [],
"deployment_notes": [],
"residual_risks": [],
"testing_gaps": [],
"coverage": {},
"artifact_path": "/tmp/compound-engineering/ce-code-review/<run-id>/",
"run_id": "<run-id>"
}
Each object in findings uses the merged finding fields: #, title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix, why_it_matters, evidence, reviewers.
actionable_findings lists the gated_auto / manual + downstream-resolver subset with the same fields plus stable #.
Each object in triage_groups carries { "title", "findings": [<stable #s>], "context", "preferred_resolution", "why" } — the finalized groups from Stage 5 step 9b after Stage 5b pruning. Every referenced # must exist in findings (the full set) — not necessarily in actionable_findings. Groups are a triage lens over all findings, not an apply queue: a group (and its preferred_resolution ordering) can reference advisory or human/release-owned findings that the caller must not apply. So a caller batching related fixes by theme must first intersect each group's findings with actionable_findings and act only on that subset — the apply handoff stays actionable_findings, never triage_groups. Empty array when grouping:off is active or no groups were built.
On failure before review completes, set "status": "failed" and "reason": "<one sentence>". When all reviewers fail, use "status": "degraded" with a reason. When a PR skip rule fires (closed/merged/trivial), use "status": "skipped" with the skip reason. Do not emit markdown tables when mode:agent is active.
Quality Gates
Before delivering the review, verify:
- Every finding is actionable. Re-read each finding. If it says "consider", "might want to", or "could be improved" without a concrete fix, rewrite it with a specific action. Vague findings waste engineering time.
- No false positives from skimming. For each finding, verify the surrounding code was actually read. Check that the "bug" isn't handled elsewhere in the same function, that the "unused import" isn't used in a type annotation, that the "missing null check" isn't guarded by the caller.
- Severity is calibrated. A style nit is never P0. A SQL injection is never P3. Re-check every severity assignment.
- Line numbers are accurate. Verify each cited line number against the file content. A finding pointing to the wrong line is worse than no finding.
- Protected artifacts are respected. Discard any findings that recommend deleting or gitignoring files in
docs/brainstorms/,docs/plans/, ordocs/solutions/. - Findings don't duplicate linter output. Don't flag things the project's linter/formatter would catch (missing semicolons, wrong indentation). Focus on semantic issues.
Language-Aware Conditionals
Stack-specific reviewers fire only when the diff touches runtime behavior they specialize in (async UI races, iOS/Swift lifecycle) — never mechanically from file extensions alone; the trigger is meaningful changed behavior in that stack's runtime domain. Structural quality (complexity deletion, 1k-line regressions, type-boundary leaks) lives in the always-on ce-maintainability-reviewer; do not spawn extra reviewers for language conventions, philosophy, or "strict bar" passes.
After Review
After Stage 6, stop. Never push, open PRs, or file tickets from this skill. In default (interactive) mode, Stage 5c has already applied and (on a clean pre-review tree) committed the safe fixes; in mode:agent the review mutates nothing — the caller (for example ce-work) and the user apply fixes, file tickets, or accept residual risk using the report and artifact.
Emit actionable findings summary (default mode only)
After Stage 6 in default mode, emit a compact Actionable Findings summary for callers:
- List each actionable finding (
gated_autoormanualwithdownstream-resolver) with stable#, severity, file:line, title,autofix_class, whethersuggested_fixis present, andconfidence. - Include the run-artifact path when one was written:
/tmp/compound-engineering/ce-code-review/<run-id>/ - When the actionable queue is empty, state
Actionable findings: none.explicitly.
In mode:agent do not emit this markdown summary — the actionable findings are carried solely by the actionable_findings field of the JSON object. Emit nothing after the JSON object, so the response stays a single parseable JSON value.
Do not run post-review triage (no per-finding walk-through, bulk ticket filing, or routing questions). The report and summary are the complete handoff.
Mode-specific completion
| Mode | After Stage 6 + actionable summary |
|---|---|
| Default | Markdown tables + Actionable Findings summary. |
mode:agent | JSON object + review.json in run artifact dir. |
Do not offer push/PR/create-branch next steps from this skill.
Run artifacts
Always write run artifacts under /tmp/compound-engineering/ce-code-review/<run-id>/:
- synthesized findings
- actionable findings list
- advisory outputs
- per-agent
{reviewer_name}.jsonfrom Stage 4 report.md— the rendered markdown report exactly as presented to the user (default mode only), so format and numbering stay auditable after the run
metadata.json minimum fields:
{
"run_id": "<run-id>",
"branch": "<git branch --show-current at dispatch time>",
"head_sha": "<git rev-parse HEAD at dispatch time>",
"verdict": "<Ready to merge | Ready with fixes | Not ready>",
"completed_at": "<ISO 8601 UTC timestamp>"
}
Capture branch and head_sha at dispatch time (no in-skill fixes will land afterward).
Fallback
If the platform doesn't support parallel sub-agents, run reviewers sequentially. If the platform supports sub-agents but caps active concurrency, use the bounded queueing rules in Stage 4 rather than treating cap-related spawn failures as reviewer failures. Everything else (stages, output format, merge pipeline) stays the same.
---
Included References
The files below are inlined at load time. The review output template is not inlined — Stage 6 loads it on demand (references/review-output-template.md).
Persona Catalog
@./references/persona-catalog.md
Subagent Template
@./references/subagent-template.md
Diff Scope Rules
@./references/diff-scope.md
Action class rubric
@./references/action-class-rubric.md
Findings Schema
@./references/findings-schema.json

