Remote OpenClaw

benchmark-e2e

vercel-labs/vercel-plugin
766 installs · 205 stars

Installation

npx skills add https://github.com/vercel-labs/vercel-plugin --skill benchmark-e2e

Summary

End-to-end benchmark suite for vercel-plugin. Runs realistic projects through skill injection, launches dev servers, verifies everything works, analyzes conversation logs, and produces an improvement report for overnight self-improvement loops.

SKILL.md

Benchmark E2E

Single-command pipeline that creates projects, exercises skill injection via claude --print, launches dev servers, verifies they work, analyzes conversation logs, and generates actionable improvement reports.

Quick Start

# Full suite (9 projects, ~2-3 hours)
bun run scripts/benchmark-e2e.ts

# Quick mode (first 3 projects, ~30-45 min)
bun run scripts/benchmark-e2e.ts --quick

Options:

| Flag | Description | Default |
| --- | --- | --- |
| --quick | Run only first 3 projects | false |
| --base <path> | Override base directory | ~/dev/vercel-plugin-testing |
| --timeout <ms> | Per-project timeout (forwarded to runner) | 900000 (15 min) |
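
As a rough illustration of how these flags and defaults fit together, here is a minimal parsing sketch. The names `BenchmarkOptions`, `DEFAULT_OPTIONS`, and `parseBenchmarkArgs` are hypothetical and are not taken from scripts/benchmark-e2e.ts; only the flag names and default values come from the table above.

```typescript
// Hypothetical option shape; the fields mirror the three documented flags.
interface BenchmarkOptions {
  quick: boolean;
  base: string;
  timeoutMs: number;
}

// Defaults copied from the options table.
const DEFAULT_OPTIONS: BenchmarkOptions = {
  quick: false,
  base: "~/dev/vercel-plugin-testing",
  timeoutMs: 900_000, // 15 minutes
};

// Parse argv-style tokens (for example process.argv.slice(2)) into options.
function parseBenchmarkArgs(argv: string[]): BenchmarkOptions {
  const options: BenchmarkOptions = { ...DEFAULT_OPTIONS };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--quick") {
      options.quick = true;
    } else if (flag === "--base" || flag === "--timeout") {
      // Both of these flags consume the next token as their value.
      const value = argv[++i];
      if (value === undefined) throw new Error(`${flag} requires a value`);
      if (flag === "--base") {
        options.base = value;
      } else {
        const ms = Number(value);
        if (!Number.isFinite(ms) || ms <= 0) {
          throw new Error(`--timeout expects a positive number of milliseconds, got "${value}"`);
        }
        options.timeoutMs = ms;
      }
    } else {
      throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return options;
}
```

Rejecting unknown flags is a design choice of this sketch, not documented behavior of the real script.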

Pipeline Stages

The orchestrator chains four stages sequentially, aborting on failure:

  1. runner — Creates test dirs, installs plugin, runs claude --print with VERCEL_PLUGIN_LOG_LEVEL=trace
  2. verify — Detects package manager, launches dev server, polls for 200 with non-empty HTML
  3. analyze — Matches JSONL sessions to projects via run-manifest.json, extracts metrics
  4. report — Generates report.md and report.json with scorecards and recommendations
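
The "sequential, abort on failure" behavior can be sketched as follows. This is a simplified model, assuming each stage can be represented as a function that returns a process-style exit code; the real orchestrator presumably launches scripts, and `StageFn`, `PipelineResult`, and `runPipeline` are hypothetical names.

```typescript
type StageName = "runner" | "verify" | "analyze" | "report";

// Assumption: a stage is modelled as a function returning an exit code (0 = success).
type StageFn = () => number;

interface PipelineResult {
  completed: StageName[];
  failedStage: StageName | null;
  exitCode: number;
}

// Stage order as listed above.
const STAGE_ORDER: StageName[] = ["runner", "verify", "analyze", "report"];

function runPipeline(stages: Record<StageName, StageFn>): PipelineResult {
  const completed: StageName[] = [];
  for (const name of STAGE_ORDER) {
    const exitCode = stages[name]();
    if (exitCode !== 0) {
      // Abort: stages after the failing one are never started.
      return { completed, failedStage: name, exitCode };
    }
    completed.push(name);
  }
  return { completed, failedStage: null, exitCode: 0 };
}
```

With a failing verify stage, this model never invokes analyze or report.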

Contracts

run-manifest.json

Written by the runner at <base>/results/run-manifest.json. Links all downstream stages to the same run.

interface BenchmarkRunManifest {
  runId: string;           // UUID for this pipeline run
  timestamp: string;       // ISO 8601
  baseDir: string;         // Absolute path to base directory
  projects: Array<{
    slug: string;          // e.g. "01-recipe-platform"
    cwd: string;           // Absolute path to project dir
    promptHash: string;    // SHA hash of the prompt text
    expectedSkills: string[];
  }>;
}

The analyzer and verifier read this manifest to correlate sessions precisely instead of guessing from directory listings.
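
One way a downstream stage might use the manifest is sketched below. It assumes a session is correlated with a project by comparing working directories; the source does not say which field the analyzer actually matches on, and `findProjectForSession` is a hypothetical helper.

```typescript
interface ManifestProject {
  slug: string;
  cwd: string;
  promptHash: string;
  expectedSkills: string[];
}

interface BenchmarkRunManifest {
  runId: string;
  timestamp: string;
  baseDir: string;
  projects: ManifestProject[];
}

// Hypothetical helper: find the project whose cwd equals the session's
// working directory, ignoring trailing slashes.
function findProjectForSession(
  manifest: BenchmarkRunManifest,
  sessionCwd: string,
): ManifestProject | undefined {
  const normalize = (p: string): string => p.replace(/\/+$/, "");
  const target = normalize(sessionCwd);
  return manifest.projects.find((project) => normalize(project.cwd) === target);
}
```

A lookup like this returns undefined for sessions outside the run, so unrelated sessions can be skipped rather than guessed at.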

events.jsonl

The orchestrator writes NDJSON events to <base>/results/events.jsonl tracking pipeline lifecycle:

// Each line is one JSON object:
{ "stage": "pipeline", "event": "start", "timestamp": "...", "data": { "baseDir": "...", "quick": false } }
{ "stage": "runner",   "event": "start", "timestamp": "...", "data": { "script": "...", "args": [...] } }
{ "stage": "runner",   "event": "complete", "timestamp": "...", "data": { "exitCode": 0, "durationMs": 120000 } }
// On failure:
{ "stage": "verify",   "event": "error", "timestamp": "...", "data": { "exitCode": 1, "durationMs": 5000, "slug": "04-conference-tickets" } }
{ "stage": "pipeline", "event": "abort", "timestamp": "...", "data": { "failedStage": "verify", "exitCode": 1, "slug": "04-conference-tickets" } }

report.json

Machine-readable report at <base>/results/report.json for programmatic consumption:

interface ReportJson {
  runId: string | null;
  timestamp: string;
  verdict: "pass" | "partial" | "fail";
  gaps: Array<{
    slug: string;
    expected: string[];
    actual: string[];
    missing: string[];
  }>;
  recommendations: string[];
  suggestedPatterns: Array<{
    skill: string;   // Skill that was expected but not injected
    glob: string;    // Suggested pathPattern glob
    tool: string;    // Tool name that should trigger injection
  }>;
}
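
To make the gaps entries concrete, the sketch below assumes that missing is the set of expected skills absent from actual, which the interface suggests but does not state. `computeMissing` and `listMissingBySlug` are hypothetical helpers, not part of the pipeline.

```typescript
interface Gap {
  slug: string;
  expected: string[];
  actual: string[];
  missing: string[];
}

// Assumption: missing = expected skills that were never injected.
function computeMissing(expected: string[], actual: string[]): string[] {
  const injected = new Set(actual);
  return expected.filter((skill) => !injected.has(skill));
}

// One summary line per project that still has missing skills.
function listMissingBySlug(gaps: Gap[]): string[] {
  return gaps
    .filter((gap) => gap.missing.length > 0)
    .map((gap) => `${gap.slug}: ${gap.missing.join(", ")}`);
}
```

For example, with expected skills payments, email, and auth and only auth injected, `computeMissing` yields payments and email.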

Overnight Automation Loop

Run the pipeline repeatedly with a cooldown between iterations:

while true; do
  bun run scripts/benchmark-e2e.ts
  sleep 3600
done

Each run produces timestamped report.json and report.md files. Compare across runs to track improvement.

Self-Improvement Cycle

The pipeline enables a closed feedback loop:

  1. Run — bun run scripts/benchmark-e2e.ts exercises the plugin against realistic projects
  2. Read gaps — report.json lists which skills were expected but never injected, with exact slugs
  3. Apply fixes — Use suggestedPatterns entries (copy-pasteable YAML) to add missing frontmatter patterns; use recommendations to fix hook logic
  4. Re-run — Execute the pipeline again to verify the gaps are closed
  5. Compare — Diff report.json across runs: verdict should trend from "fail" → "partial" → "pass"

For overnight automation, combine with the loop above. Wake up to reports showing exactly what improved and what still needs work.
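
The Compare step could be sketched as a small diff over two reports. This assumes copies of report.json from earlier runs have been kept somewhere, and it uses only the verdict and gaps fields; `ReportSummary`, `flattenMissing`, and `compareReports` are hypothetical names.

```typescript
type Verdict = "pass" | "partial" | "fail";

// Only the fields this comparison needs.
interface ReportSummary {
  verdict: Verdict;
  gaps: Array<{ slug: string; missing: string[] }>;
}

// Ordering used to tell whether the verdict improved.
const VERDICT_RANK: Record<Verdict, number> = { fail: 0, partial: 1, pass: 2 };

// Flatten gaps into "slug/skill" keys so two runs can be compared as sets.
function flattenMissing(report: ReportSummary): Set<string> {
  const keys = new Set<string>();
  for (const gap of report.gaps) {
    for (const skill of gap.missing) keys.add(`${gap.slug}/${skill}`);
  }
  return keys;
}

function compareReports(previous: ReportSummary, current: ReportSummary) {
  const before = flattenMissing(previous);
  const after = flattenMissing(current);
  return {
    // 1 = improved, 0 = unchanged, -1 = regressed
    verdictChange: Math.sign(VERDICT_RANK[current.verdict] - VERDICT_RANK[previous.verdict]),
    closed: Array.from(before).filter((key) => !after.has(key)).sort(),
    opened: Array.from(after).filter((key) => !before.has(key)).sort(),
  };
}
```

The `closed` list shows which fixes took effect, while a non-empty `opened` list flags regressions introduced between runs.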

Prompt Table

Prompts never name specific technologies — they describe the product and features, letting the plugin infer which skills to inject.

| # | Slug | Expected Skills |
| --- | --- | --- |
| 01 | recipe-platform | auth, vercel-storage, nextjs |
| 02 | trivia-game | vercel-storage, nextjs |
| 03 | code-review-bot | ai-sdk, nextjs |
| 04 | conference-tickets | payments, email, auth |
| 05 | content-aggregator | cron-jobs, ai-sdk |
| 06 | finance-tracker | cron-jobs, email |
| 07 | multi-tenant-blog | routing-middleware, cms, auth |
| 08 | status-page | cron-jobs, vercel-storage, observability |
| 09 | dog-walking-saas | payments, auth, vercel-storage, env-vars |

Cleanup

rm -rf ~/dev/vercel-plugin-testing

© 2026 Remote OpenClaw