Skill Evaluator
Discover all skills in the project, score them across 6 criteria, and infer the appropriate `effort` level based on content analysis.
When to Use
- New project: run once to establish baseline quality
- Before committing a skill to a team repo
- After bulk-importing skills from another project
- When adding `effort` fields for the first time (v2.1.80+)
What Gets Audited
All `SKILL.md` files and flat `.md` files found in:
- `.claude/skills/**`
- `~/.claude/skills/**` (if requested)
- Any path passed as argument: `/eval-skills ./my-skills-dir`
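As a rough illustration of the discovery scope above, the following Python sketch collects every `.md` file (which includes `SKILL.md`) under a list of roots. The function name `discover_skills` is hypothetical and not part of the skill itself; the sketch assumes a missing root should simply be skipped.

```python
from pathlib import Path


def discover_skills(roots):
    """Return sorted paths of every .md file found under the given roots.

    Hypothetical helper: SKILL.md and flat .md files both match "*.md",
    so one recursive glob covers the two cases listed above.
    """
    found = set()
    for root in roots:
        base = Path(root).expanduser()  # allows "~/.claude/skills"
        if not base.is_dir():
            continue  # assumption: silently skip roots that do not exist
        found.update(p for p in base.rglob("*.md") if p.is_file())
    return sorted(found)
```

A call such as `discover_skills([".claude/skills", "./my-skills-dir"])` would mirror the default scope plus one path passed as an argument.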
---
Scoring Criteria (14 pts per skill: 12 base + 2 bonus)
| # | Criterion | Max | What is checked |
|---|-----------|-----|-----------------|
| 1 | **name** | 1 | Present, lowercase, hyphens only, matches directory name |
| 2 | **description** | 2 | Present + has "Use when" / "when to" / trigger phrasing |
| 3 | **allowed-tools** | 2 | Present + not overly broad (Bash without scoping when read-only) |
| 4 | **effort** | 3 | Present (1pt) + appropriate for content (2pt based on inference) |
| 5 | **content structure** | 4 | Has Purpose/When section (1), has examples/usage (1), has clear workflow (1), no placeholder text (1) |
| 6 | **bonus** | +2 | argument-hint present (1), version/author metadata (1) |
> **Note**: `tags` is NOT an officially supported frontmatter field in Claude Code. It is ignored by the runtime. Do not include it or score it as a quality criterion.
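Criteria 1 and 2 are mechanical enough to sketch in code. The Python below is one possible reading, not the skill's actual implementation: the helper names are hypothetical, digits are assumed to be allowed in names, the 2 description points are assumed to split 1 + 1, and only the two literal phrases "Use when" and "when to" are treated as trigger phrasing.

```python
import re

# Assumption: lowercase letters and digits, separated by single hyphens.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Assumption: only these two literal phrases count as trigger phrasing.
TRIGGER_RE = re.compile(r"\buse when\b|\bwhen to\b", re.IGNORECASE)


def score_name(name, dir_name):
    """Criterion 1 (max 1): present, lowercase, hyphens only, matches directory."""
    if not name:
        return 0
    return int(bool(NAME_RE.match(name)) and name == dir_name)


def score_description(description):
    """Criterion 2 (max 2): 1 pt if present, +1 pt for trigger phrasing."""
    if not description or not description.strip():
        return 0
    return 1 + int(bool(TRIGGER_RE.search(description)))
```

For instance, `score_description("Use when auditing skills")` earns both points, while a description without trigger phrasing earns one.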
**Thresholds:**
- ✅ Good: ≥11/14 (≥79%)
- ⚠️ Needs work: 8–10/14 (57–71%)
- ❌ Fix: <8/14 (<57%)
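The point thresholds above map directly onto a small classifier. This is a minimal Python sketch; the function name `classify` and the returned labels are illustrative choices rather than anything the skill defines.

```python
def classify(score):
    """Map a 0-14 skill score onto the three threshold bands above."""
    if score >= 11:
        return "good"
    if score >= 8:
        return "needs work"
    return "fix"
```

Working in points rather than percentages avoids rounding questions at the band edges.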
---
Effort Level Inference Engine
For each skill, analyze description + content and classify using these signals:
`low` — Mechanical execution, no design decisions
Signals:
- Verbs: commit, push, sync, scaffold, generate (template-based), format, rename, bump, wrap, convert
- No reasoning required: sequential steps, template instantiation, data fetching
- allowed-tools: Bash only, or Read-only
- No sub-agents spawned
- Short workflow (<5 steps)
Examples: `/commit`, `/release-notes`, `/scaffold`, `/sync`, `/format`
`medium` — Analysis with bounded scope, categorization
Signals:
- Verbs: review, triage, analyze, categorize, suggest, evaluate (single file or bounded scope)
- Requires pattern recognition but not architectural reasoning
- allowed-tools: Read + Grep + Bash combination
- May spawn 1-2 sub-agents but with predefined scope
- Produces structured output (tables, categorized lists)
Examples: `/code-review` (single PR), `/issue-triage`, `/dependency-audit`, `/test-coverage`
`high` — Design decisions, adversarial reasoning, cross-system analysis
Signals:
- Verbs: architect, redesign, threat-model, audit (security), orchestrate (multi-agent), score, assess trade-offs
- Requires reasoning about edge cases, attack vectors, or system-wide implications
- allowed-tools: broad access (Read + Write + Bash + external tools)
- Spawns multiple sub-agents or uses parallel execution
- Produces analysis with explicit uncertainty or trade-off sections
- Keywords in content: "security", "architecture", "adversarial", "pipeline", "threat", "design decision"
Examples: `/security-audit`, `/architecture-review`, `/cyber-defense`, `/eval-agents`
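The signal lists above could be approximated with simple keyword counting. The Python sketch below is a deliberately naive heuristic under several assumptions: the function name `infer_effort` is hypothetical, qualifiers such as "(template-based)" or "(security)" are ignored, ties go to the higher level, text with no signals defaults to `low`, and three or more sub-agents is taken to mean "multiple". Real inference would also weigh `allowed-tools` and workflow length.

```python
import re

LEVELS = ["low", "medium", "high"]

# Terms copied from the signal lists above (parenthetical qualifiers dropped).
SIGNALS = {
    "low": ["commit", "push", "sync", "scaffold", "generate", "format",
            "rename", "bump", "wrap", "convert"],
    "medium": ["review", "triage", "analyze", "categorize", "suggest",
               "evaluate"],
    "high": ["architect", "redesign", "threat-model", "audit", "orchestrate",
             "score", "assess", "security", "architecture", "adversarial",
             "pipeline", "threat", "design decision"],
}


def infer_effort(text, sub_agents=0):
    """Guess an effort level from keyword hits and the sub-agent count."""
    lowered = text.lower()
    counts = {
        level: sum(len(re.findall(rf"\b{re.escape(term)}\b", lowered))
                   for term in terms)
        for level, terms in SIGNALS.items()
    }
    # Highest hit count wins; ties go to the higher level (assumed tie-break).
    best = max(LEVELS, key=lambda lvl: (counts[lvl], LEVELS.index(lvl)))
    if counts[best] == 0:
        best = "low"  # assumption: no signals at all means mechanical work
    # Sub-agents set a floor: 1-2 implies at least medium, 3+ implies high.
    floor = "high" if sub_agents >= 3 else "medium" if sub_agents >= 1 else "low"
    return max(best, floor, key=LEVELS.index)
```

Keyword counting misreads context (for example, `audit` in `/dependency-audit` counts toward `high`), so the result is best treated as a suggestion to be reviewed rather than a verdict.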
Mismatch flag
If a skill has `effort:` already set but the inferred level differs, flag it:
> ⚠️ Effort mismatch: declared `low`, inferred `high` — skill spawns 4 sub-agents and performs security analysis
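The mismatch rule reduces to a comparison plus a formatted message. The sketch below assumes a hypothetical helper `effort_mismatch` that returns `None` when there is nothing to flag (including when no `effort:` is declared) and otherwise reproduces the flag format shown above.

```python
DASH = "\u2014"  # the dash character used in the flag format above


def effort_mismatch(declared, inferred, reason):
    """Return a Markdown blockquote flag, or None when the levels agree."""
    if declared is None or declared == inferred:
        return None  # assumption: a missing effort field is scored elsewhere
    return (f"> ⚠️ Effort mismatch: declared `{declared}`, "
            f"inferred `{inferred}` {DASH} {reason}")
```

Calling `effort_mismatch("low", "high", "skill spawns 4 sub-agents and performs security analysis")` yields the example flag from the text.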
---
Execution Instructions
Step 1 — Discovery
# Find all SKILL.md files
find .claude/skills -name "SKIL
<!-- truncated -->
