Remote OpenClaw

vercel-plugin-eval

vercel-labs/vercel-plugin
767 installs · 205 stars

Installation

npx skills add https://github.com/vercel-labs/vercel-plugin --skill vercel-plugin-eval

Summary

Run live eval sessions against the vercel-plugin to verify hook behavior, skill injection, dedup correctness, and coverage. Launches real Claude Code sessions via WezTerm, monitors debug logs, and produces a structured coverage report.

SKILL.md

Plugin Eval

Launch real Claude Code sessions with the plugin installed, monitor debug logs in real time, and verify that every hook fires correctly with proper dedup.

DO NOT (Hard Rules)

  • DO NOT use claude --print or -p — hooks don't fire, no files created
  • DO NOT use --dangerously-skip-permissions
  • DO NOT create projects in /tmp/ — always use ~/dev/vercel-plugin-testing/
  • DO NOT manually wire hooks or create settings.local.json — use npx add-plugin
  • DO NOT set CLAUDE_PLUGIN_ROOT manually
  • DO NOT use bash -c in WezTerm — use /bin/zsh -ic
  • DO NOT use full path to claude — use the x alias
  • DO NOT write eval scripts — do everything as Bash tool calls in the conversation

Copy the exact commands below. Do not improvise.

Quick Start

Always append a timestamp to directory names so reruns don't overwrite old projects:

# 1. Create test dir & install plugin (with timestamp)
TS=$(date +%Y%m%d-%H%M)
SLUG="my-eval-$TS"
mkdir -p ~/dev/vercel-plugin-testing/$SLUG
cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y

# 2. Launch session via WezTerm
wezterm cli spawn --cwd "$HOME/dev/vercel-plugin-testing/$SLUG" -- /bin/zsh -ic \
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '<PROMPT>' --settings .claude/settings.json; exec zsh"

# 3. Find debug log (wait ~25s for session start)
find ~/.claude/debug -name "*.txt" -mmin -2 -exec grep -l "$SLUG" {} +
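
The fixed wait in step 3 can be replaced by polling. The following is a minimal sketch, assuming the debug log mentions the slug (as the grep in step 3 already relies on). `wait_for_debug_log` and its arguments are hypothetical names for illustration, not part of the plugin or of Claude Code.

```shell
# Hypothetical helper: poll a debug directory until a recent log mentions the slug.
# Usage: wait_for_debug_log <slug> [debug_dir] [timeout_seconds]
wait_for_debug_log() {
  local slug="$1"
  local dir="${2:-$HOME/.claude/debug}"
  local timeout="${3:-60}"
  local waited=0 match
  while [ "$waited" -le "$timeout" ]; do
    # Same search as step 3; head -n 1 keeps only the first matching file.
    match=$(find "$dir" -name "*.txt" -mmin -2 -exec grep -l -- "$slug" {} + 2>/dev/null | head -n 1)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no debug log mentioning $slug after ${timeout}s" >&2
  return 1
}
```

A call such as `LOG=$(wait_for_debug_log "$SLUG")` would then give the path used in the monitoring commands.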

What to Monitor

Hook firing (all 8 registered hooks)

LOG=~/.claude/debug/<session-id>.txt

# SessionStart (3 hooks)
grep "SessionStart.*success" "$LOG"

# PreToolUse skill injection
grep -c "executePreToolHooks" "$LOG"        # total calls
grep -c "provided additionalContext" "$LOG"  # injections

# UserPromptSubmit
grep "UserPromptSubmit.*success" "$LOG"

# PostToolUse validate + shadcn font-fix
grep "posttooluse-validate.*provided" "$LOG"
grep "PostToolUse:Bash.*success" "$LOG"

# SessionEnd cleanup
grep "SessionEnd" "$LOG"
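
The individual greps above can be rolled into one pass/fail summary. This is a sketch only: `check_hooks` is a hypothetical helper, and it reuses exactly the patterns shown above without assuming anything more about the log format. It checks for presence, not counts, so it does not confirm that all three SessionStart hooks fired, and the SessionEnd line can only pass once the session has ended.

```shell
# Hypothetical helper: report PASS/FAIL per hook pattern, reusing the grep patterns above.
# Usage: check_hooks <debug-log>; returns non-zero if any pattern is missing.
check_hooks() {
  local log="$1" failed=0 pattern
  for pattern in \
    "SessionStart.*success" \
    "executePreToolHooks" \
    "provided additionalContext" \
    "UserPromptSubmit.*success" \
    "posttooluse-validate.*provided" \
    "PostToolUse:Bash.*success" \
    "SessionEnd"
  do
    if grep -q -- "$pattern" "$log"; then
      echo "PASS  $pattern"
    else
      echo "FAIL  $pattern"
      failed=1
    fi
  done
  return "$failed"
}
```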

Dedup correctness (the key metric)

# Use a separate variable so the shell's own TMPDIR is not overwritten
NODE_TMP=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module)
CLAIMDIR="$NODE_TMP/vercel-plugin-<session-id>-seen-skills.d"

# Claim files = one per skill, atomic O_EXCL
ls "$CLAIMDIR"

# Compare: injections should equal claims
inject_meta=$(grep -c "skillInjection:" "$LOG")
claims=$(ls "$CLAIMDIR" 2>/dev/null | wc -l | tr -d ' ')
echo "Injections: $((inject_meta / 3)) | Claims: $claims"

Each actual injection produces three skillInjection: lines in the debug log (initial check, parsed, success), so divide the raw count by 3.
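
For repeated runs, the comparison can be wrapped so that a mismatch shows up as a non-zero exit status. This is a sketch under the same assumptions as the commands above (three `skillInjection:` lines per injection, one claim file per skill); `dedup_check` is a hypothetical helper name.

```shell
# Hypothetical helper: compare injections (from the log) against claim files.
# Usage: dedup_check <debug-log> <claim-dir>; returns 0 only when the counts match.
dedup_check() {
  local log="$1" claimdir="$2" raw injections claims
  raw=$(grep -c "skillInjection:" "$log" 2>/dev/null)
  raw=${raw:-0}                      # missing log counts as zero lines
  injections=$((raw / 3))            # 3 log lines per actual injection
  claims=$(ls "$claimdir" 2>/dev/null | wc -l | tr -d ' ')
  echo "Injections: $injections | Claims: $claims"
  if [ $((raw % 3)) -ne 0 ]; then
    echo "WARN: $raw skillInjection lines is not a multiple of 3" >&2
  fi
  [ "$injections" -eq "$claims" ]
}
```

The warning covers the case where the log was read mid-injection, when the raw count may not yet be divisible by 3.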

PostToolUse validate quality

Look for real catches — API key bypass, outdated models, wrong patterns:

grep "VALIDATION" "$LOG" | head -10

Scenario Design

Describe products and features; never name specific technologies, except for the hard-to-trigger skills listed below. Let the plugin infer which skills to inject. Always end prompts with: "Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."
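
To keep the required closing sentences identical across scenarios, the prompt can be assembled from a description plus a fixed suffix. This is a minimal sketch; `build_prompt` and `SCENARIO_SUFFIX` are hypothetical names, and the example description is illustrative only. Because step 2 of the Quick Start wraps the prompt in single quotes, keep apostrophes out of the description.

```shell
# Required closing instructions, copied from the rule above.
SCENARIO_SUFFIX="Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."

# Hypothetical helper: append the closing instructions to a scenario description.
build_prompt() {
  printf '%s %s\n' "$1" "$SCENARIO_SUFFIX"
}

# Illustrative scenario: describes the product and names no technology.
build_prompt "Build a customer support assistant that answers questions in a chat window and remembers past conversations."
```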

Coverage targets by scenario type

| Scenario Type | Skills Exercised |
| --- | --- |
| AI chat app | ai-sdk, ai-gateway, nextjs, ai-elements |
| Durable workflow | workflow, ai-sdk, vercel-queues |
| Monorepo | turborepo, turbopack, nextjs |
| Edge auth + routing | routing-middleware, auth, sign-in-with-vercel |
| Chat bot (multi-platform) | chat-sdk, ai-sdk, vercel-storage |
| Feature flags + CRM | vercel-flags, vercel-queues, ai-sdk |
| Email pipeline | email, satori, ai-sdk, vercel-storage |
| Marketplace/payments | payments, marketplace, cms |
| Kitchen sink | micro, ncc, all niche skills |

Hard-to-trigger skills (8 of 44)

These need explicit technology references in the prompt because agents don't naturally reach for them (only six of the eight are listed here):

  • ai-elements — say "use the AI Elements component registry"
  • v0-dev — say "generate components with v0"
  • vercel-firewall — say "use Vercel Firewall for rate limiting"
  • marketplace — say "publish to the Vercel Marketplace"
  • geist — say "install the geist font package"
  • json-render — name files components/chat-*.tsx

Coverage Report

Write results to .notes/COVERAGE.md with:

  1. Session index — slug, session ID, unique skills, dedup status
  2. Hook coverage matrix — which hooks fired in which sessions
  3. Skill injection table — which of the 44 skills triggered
  4. Dedup stats — injections vs claims per session
  5. Issues found — bugs, pattern gaps, validation findings
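
A skeleton for the report can be generated up front so every run fills in the same five sections. This is a sketch: `init_coverage_report` is a hypothetical helper, and the Markdown headings and table columns are one possible layout derived from the list above, not a format the plugin prescribes.

```shell
# Hypothetical helper: create a COVERAGE.md skeleton with the five sections listed above.
# Usage: init_coverage_report [path]  (defaults to .notes/COVERAGE.md)
init_coverage_report() {
  local out="${1:-.notes/COVERAGE.md}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
# Coverage Report

## Session index
| Slug | Session ID | Unique skills | Dedup status |
| --- | --- | --- | --- |

## Hook coverage matrix

## Skill injection table

## Dedup stats
| Session | Injections | Claims |
| --- | --- | --- |

## Issues found
EOF
  echo "$out"
}
```

Note that calling it again overwrites the existing file at that path.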

Cleanup

rm -rf ~/dev/vercel-plugin-testing
