harness

harness-engineering

productivity · Claude Code · by suhanlee

Summary

Harness Engineering: a three-agent Planner/Generator/Evaluator pipeline that iterates until the mission is complete.

Harness Engineering

3-Agent Pipeline Pattern for Claude Code

An agent orchestration pattern where three specialized agents — Planner, Generator, and Evaluator — collaborate in a pipeline, iterating in rounds until the mission is accomplished.

Planner(opus) → Generator(sonnet) → Evaluator(opus)
    ↑                                      │
    └──────────── feedback ────────────────┘
                Round 1 → 2 → ... → N (until mission complete)

Why Harness Engineering?

When a single agent plans, executes, and verifies its own work, it introduces bias and blind spots.

| Problem | Cause | Harness Solution |
|---------|-------|------------------|
| Lenient self-evaluation | Executor = Verifier | Role separation (Generator ≠ Evaluator) |
| One-way progress without feedback | No feedback loop | Structured Evaluator → Planner feedback |
| Patching without plan revision | Implicit retry | Explicit re-planning by Planner based on feedback |
| Ambiguous completion criteria | No definition | Measurable acceptance criteria + scoring system |

Agent Roles

| Agent | Role | Model | Core Principle |
|-------|------|-------|----------------|
| Planner | Analyze mission, create execution plan, revise based on feedback | opus | Does NOT write code |
| Generator | Execute the plan (code, config, docs) | sonnet | Does NOT evaluate, follows plan only |
| Evaluator | Verify output, issue PASS/FAIL verdict, provide feedback | opus | Does NOT modify code |

Installation

Option 1: Claude Code Plugin Marketplace (Recommended)

Install directly from GitHub inside Claude Code:

# 1. Add marketplace
/plugin marketplace add suhanlee/harness

# 2. Install plugin
/plugin install harness@harness-engineering

# 3. Reload plugins
/reload-plugins

The /harness command is ready to use after installation.

Option 2: Install Script

curl -fsSL https://raw.githubusercontent.com/suhanlee/harness/main/install.sh | bash

Option 3: Manual File Copy

# From your project root
mkdir -p .claude/commands .claude/skills
cp -r harness/.claude/commands/harness.md .claude/commands/
cp -r harness/.claude/skills/harness-* .claude/skills/

Option 4: Git Submodule

git submodule add https://github.com/suhanlee/harness.git .harness
# Then copy or symlink files into .claude/

Usage

Autopilot Mode (Default)

Just provide the mission — Harness auto-analyzes the task and determines the optimal configuration:

/harness Implement a data collection pipeline for the budget-eats service

Autopilot will:

1. Analyze the mission scope (files affected, complexity, work areas)
2. Classify the size (S / M / L / XL)
3. Auto-configure rounds, pass threshold, and agent counts
4. Execute immediately without waiting for confirmation

## Autopilot Analysis

Mission: Implement data collection pipeline
Size: M (Medium) — ~8 files, 2 work areas
Estimated token budget: ~150K

Auto-configured:
- Max rounds: 3
- Pass threshold: 8/10
- Agents: 1 Planner, 1 Generator, 1 Evaluator

Starting Round 1...

Autopilot Size Classification

| Size | Files | Complexity | max_rounds | threshold | Planners | Generators | Evaluators |
|------|-------|------------|------------|-----------|----------|------------|------------|
| S | 1-3 | Single concern | 2 | 7 | 1 | 1 | 1 |
| M | 4-10 | Multiple concerns | 3 | 8 | 1 | 1 | 1 |
| L | 11-25 | Cross-cutting | 5 | 8 | 1 | 2 | 1 |
| XL | 25+ | System-wide | 7 | 8 | 1 | 3 | 2 |
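
The table can be read as a simple lookup. The sketch below is illustrative only and is not the plugin's actual logic: it classifies by file count alone, whereas autopilot also weighs complexity and work areas. `HarnessConfig` and `classify_by_file_count` are hypothetical names, and treating exactly 25 files as L (the table lists both "11-25" and "25+") is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessConfig:
    """One row of the size classification table (hypothetical type)."""
    size: str
    max_rounds: int
    threshold: int
    planners: int
    generators: int
    evaluators: int


# (inclusive upper bound on affected files, config), values copied from the table.
_BOUNDED_SIZES = [
    (3, HarnessConfig("S", 2, 7, 1, 1, 1)),
    (10, HarnessConfig("M", 3, 8, 1, 1, 1)),
    (25, HarnessConfig("L", 5, 8, 1, 2, 1)),  # assumption: 25 files counts as L
]
_XL = HarnessConfig("XL", 7, 8, 1, 3, 2)


def classify_by_file_count(files_affected: int) -> HarnessConfig:
    """Pick a configuration from the number of affected files only."""
    if files_affected < 1:
        raise ValueError("files_affected must be at least 1")
    for upper_bound, config in _BOUNDED_SIZES:
        if files_affected <= upper_bound:
            return config
    return _XL
```

With the example mission above (about 8 files), `classify_by_file_count(8)` selects the M row: 3 rounds, threshold 8, one agent per role.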

Manual Mode

Use --manual or specify --rounds/--threshold to override autopilot:

# Explicit manual mode
/harness --manual --rounds 3 Implement user authentication

# Specifying --rounds implies manual mode
/harness --rounds 7 --threshold 9 Build the payment API

# Override agent counts
/harness --generators 3 Refactor the auth module

# Custom model configuration
/harness --manual --rounds 3 --planner-model sonnet Fix the deployment pipeline

All Options

| Option | Description | Default |
|--------|-------------|---------|
| --manual | Use manual mode (skip auto-analysis) | off (autopilot) |
| --rounds N | Maximum iteration count (implies manual) | 5 |
| --threshold N | Minimum score for PASS, 1-10 (implies manual) | 8 |
| --planner-model MODEL | Model for Planner agent | opus |
| --generator-model MODEL | Model for Generator agent | sonnet |
| --evaluator-model MODEL | Model for Evaluator agent | opus |
| --planners N | Number of parallel Planner agents | 1 |
| --generators N | Number of parallel Generator agents | 1 |
| --evaluators N | Number of parallel Evaluator agents | 1 |
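
To make the "implies manual" rule concrete, here is a rough model of the option table using Python's standard `argparse`. The real `/harness` command is a Claude Code command file interpreted by the model, not a Python program, so `build_parser` and `resolve` are hypothetical helpers for illustration. The sketch assumes the defaults of 5 rounds and threshold 8 apply only in manual mode, and that autopilot picks its own values otherwise.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the option table; flags and defaults are copied from it."""
    parser = argparse.ArgumentParser(prog="/harness")
    parser.add_argument("--manual", action="store_true")
    # None means "not given", so we can tell whether the flag was used.
    parser.add_argument("--rounds", type=int, default=None, metavar="N")
    parser.add_argument("--threshold", type=int, default=None,
                        choices=range(1, 11), metavar="N")
    for role, model in (("planner", "opus"),
                        ("generator", "sonnet"),
                        ("evaluator", "opus")):
        parser.add_argument(f"--{role}-model", default=model, metavar="MODEL")
        parser.add_argument(f"--{role}s", type=int, default=1, metavar="N")
    parser.add_argument("mission", nargs="+")
    return parser


def resolve(args: argparse.Namespace) -> dict:
    """Apply the rule that --rounds or --threshold implies manual mode."""
    manual = (args.manual
              or args.rounds is not None
              or args.threshold is not None)
    rounds = threshold = None  # assumption: autopilot chooses these itself
    if manual:
        rounds = args.rounds if args.rounds is not None else 5
        threshold = args.threshold if args.threshold is not None else 8
    return {
        "mode": "manual" if manual else "autopilot",
        "rounds": rounds,
        "threshold": threshold,
        "mission": " ".join(args.mission),
    }
```

Under this reading, `--generators 3` on its own overrides the agent count but leaves the run in autopilot mode, because the table marks only `--rounds` and `--threshold` as implying manual.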

Parallel Agents

When multiple agents of the same role are configured:

  • Multiple Generators — Planner's task list is partitioned by work area; Generators run in parallel
  • Multiple Evaluators — Each reviews a different aspect (correctness vs. quality); scores are averaged
  • Multiple Planners — Each produces an independent plan; plans are merged into consensus
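
As a rough illustration of the first two bullets, the sketch below partitions a task list by work area and averages Evaluator scores. It is a minimal sketch under stated assumptions, not the plugin's implementation: the `(work_area, description)` task shape, the round-robin assignment of areas to Generators, and the function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def partition_by_work_area(tasks, n_generators):
    """Group tasks by work area, then deal whole areas out round-robin.

    tasks: list of (work_area, description) tuples (assumed shape).
    Returns one task list per Generator; an area is never split.
    """
    if n_generators < 1:
        raise ValueError("n_generators must be at least 1")
    by_area = defaultdict(list)
    for area, description in tasks:
        by_area[area].append(description)
    buckets = [[] for _ in range(n_generators)]
    for index, area in enumerate(sorted(by_area)):
        buckets[index % n_generators].extend(by_area[area])
    return buckets


def combined_verdict(scores, threshold=8):
    """Average the Evaluators' scores and compare against the threshold."""
    average = mean(scores)
    return ("PASS" if average >= threshold else "FAIL", average)
```

Keeping each work area with a single Generator is one way to reduce conflicting edits to the same files. Other partitioning strategies would fit the description equally well.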

Add to CLAUDE.md (Optional)

Add the following to your project's CLAUDE.md so Claude Code automatically recognizes the pattern:

## Agent Execution Pattern: Harness Engineering

For complex tasks, apply the **Harness Engineering** pattern.
Three specialized agents collaborate in a pipeline, iterating until mission complete.

Planner(opus) → Generator(sonnet) → Evaluator(opus) → [feedback] → Planner → ...

Termination: Evaluator issues PASS verdict or max rounds reached.
Default mode: autopilot (auto-determines rounds and agent counts based on task size).

Execution Flow

Round 1 (Initial)

Mission
  │
  ▼
┌──────────┐
│ Planner  │  Analyze mission → Break into tasks → Define acceptance criteria
└────┬─────┘
     │ Execution plan
     ▼
┌──────────┐
│Generator │  Write code/config/docs per plan
└────┬─────┘
     │ Output
     ▼
┌──────────┐
│Evaluator │  Verify against criteria → Score
└────┬─────┘
     │
     ▼
  PASS(8+) → Done
  FAIL(<8) → Round 2

Round 2+ (Iteration)

Evaluator feedback
  │
  ▼
┌──────────┐
│ Planner  │  Integrate feedback → Revise plan
└────┬─────┘
     │ Revised plan
     ▼
┌──────────┐
│Generator │  Execute improvements
└────┬─────┘
     │ Improved output
     ▼
┌──────────┐
│Evaluator │  Re-verify → Re-score
└────┬─────┘
     │
     ▼
  PASS → Done  /  FAIL → Round 3...
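
The two diagrams above reduce to a single loop. The following sketch shows that control flow with the three agents passed in as plain callables. `run_harness`, `Evaluation`, and the callable signatures are hypothetical, and the real orchestration lives in the `/harness` command rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    score: int      # 1-10, as in the pass threshold
    feedback: str   # handed to the Planner on the next round


def run_harness(
    mission: str,
    planner: Callable[[str, Optional[str]], str],
    generator: Callable[[str], str],
    evaluator: Callable[[str, str], Evaluation],
    max_rounds: int = 5,
    threshold: int = 8,
) -> Tuple[str, int, str]:
    """Iterate Planner -> Generator -> Evaluator until PASS or max_rounds.

    Returns (verdict, rounds_used, last_output).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None  # round 1 has no feedback yet
    output = ""
    for round_no in range(1, max_rounds + 1):
        plan = planner(mission, feedback)       # plans or re-plans, writes no code
        output = generator(plan)                # executes the plan only
        evaluation = evaluator(mission, output)  # scores, never edits output
        if evaluation.score >= threshold:
            return ("PASS", round_no, output)
        feedback = evaluation.feedback          # feedback goes to the Planner
    return ("FAIL", max_rounds, output)
```

Feedback flows to the Planner rather than directly to the Generator, which is what makes the re-planning explicit instead of an implicit retry.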

When to Use

Good Fit

  • Complex feature implementation (many files changed)
  • Quality-critical tasks (security, performance)
  • Ambiguous requirements needing iterative refinement
  • Tasks involving architectural changes

Not a Good Fit

  • Simple bug fixes (1-2 files)
  • Configuration value changes
  • Documentation edits
  • Small tasks with clear instructions

Comparison with Other Patterns

| Aspect | Single Agent | Ralph Loop | Pipeline | Harness |
|--------|--------------|------------|----------|---------|
| Agent count | 1 | 1 (+verifier) | N (chain) | 3 (fixed roles) |
| Feedback loop | None | Verifier→retry | None | Evaluator→Planner |
| Plan revision | None | Implicit | None | Explicit re-planning |
| Role separation | None | Partial | Per-stage | Strict separation |
| Quality convergence | Low | Medium | Low | High |
| Auto-scaling | No | No | No | Yes (autopilot) |

File Structure

harness/
├── README.md                              # This document
├── LICENSE                                # MIT License
├── install.sh                             # Installation script
├── .claude-plugin/
│   ├── marketplace.json                   # Claude Code marketplace manifest
│   └── plugin.json                        # Plugin metadata
├── .claude/
│   ├── commands/
│   │   └── harness.md                     # /harness command (orchestrator)
│   └── skills/
│       ├── harness-planner/SKILL.md       # Planner agent skill
│       ├── harness-generator/SKILL.md     # Generator agent skill
│       └── harness-evaluator/SKILL.md     # Evaluator agent skill
├── docs/
│   └── harness-engineering.md             # Detailed design document
└── examples/
    └── example-session.md                 # Example session walkthrough

License

MIT
