agent-eval

colbymchenry/codegraph

741 installs, 49K stars


Installation

npx skills add https://github.com/colbymchenry/codegraph --skill agent-eval

Summary

Benchmark CodeGraph retrieval quality on a real codebase by comparing agent behavior with vs without CodeGraph. Use when the user runs /agent-eval or asks to test, benchmark, audit, or validate a codegraph version (the local dev build or a published npm version) against a language's repo.

SKILL.md

CodeGraph Quality Audit

Measures how much CodeGraph helps an agent versus plain grep/read, for a chosen codegraph version on a chosen real-world repo. Drives the harness in scripts/agent-eval/.

Prerequisites

tmux 3+, a logged-in claude CLI, node, git (macOS/Linux).
Run from the codegraph repo root.
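Before starting, you can check the prerequisites with a short script. The Python sketch below is illustrative only. The function names are hypothetical, and it assumes `tmux -V` prints something like `tmux 3.3a`. It cannot tell whether the claude CLI is actually logged in.

```python
import re
import shutil
import subprocess
from typing import List, Optional

# Binaries named in the prerequisites; all must be on PATH.
REQUIRED_BINARIES = ["tmux", "claude", "node", "git"]


def parse_tmux_major(version_output: str) -> Optional[int]:
    """Return the major version from `tmux -V` output (e.g. 'tmux 3.3a' -> 3).

    Also accepts development builds printed as 'tmux next-3.4'; returns None
    if the text does not look like a tmux version string.
    """
    match = re.search(r"tmux\s+(?:next-)?(\d+)", version_output)
    return int(match.group(1)) if match else None


def missing_binaries(names: List[str]) -> List[str]:
    """Names from `names` that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


def prerequisite_problems() -> List[str]:
    """Human-readable problems; an empty list means the basic tools are present.

    Does not check that the claude CLI is logged in.
    """
    problems = [f"{name} not found on PATH" for name in missing_binaries(REQUIRED_BINARIES)]
    if shutil.which("tmux") is not None:
        out = subprocess.run(["tmux", "-V"], capture_output=True, text=True).stdout
        major = parse_tmux_major(out)
        if major is None or major < 3:
            problems.append(f"tmux 3+ required, found {out.strip()!r}")
    return problems
```

Run `prerequisite_problems()` from the codegraph repo root and print each entry to get a quick pass/fail before Step 1.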

Workflow

Copy this checklist:

- [ ] 1. Pick version (local or npm)
- [ ] 2. Pick language
- [ ] 3. Pick repo by size
- [ ] 4. Pick harness (headless / tmux / both)
- [ ] 5. Run audit.sh in the background
- [ ] 6. Report results

Step 1 — version. Use AskUserQuestion to ask which codegraph version to test. Offer "Local dev build" and "Latest published"; the free-text "Other" option lets the user type a specific version (e.g. 0.7.10). Map the answer to a VERSION token:

"Local dev build" → local
"Latest published" → latest
a typed version → that string (e.g. 0.7.10)
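The mapping above is a small lookup with a pass-through for typed versions. This is a minimal Python sketch. The semver-style check on the free-text answer is a hypothetical guard added for illustration; the skill does not define one.

```python
import re

# Fixed Step 1 choices mapped to the VERSION token audit.sh expects.
FIXED_VERSIONS = {"Local dev build": "local", "Latest published": "latest"}

# Hypothetical check for the free-text "Other" answer: a plain x.y.z version,
# optionally with a prerelease suffix. Not a rule stated by the skill.
SEMVER_RE = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?")


def version_token(answer: str) -> str:
    """Return 'local', 'latest', or the typed version string unchanged."""
    answer = answer.strip()
    if answer in FIXED_VERSIONS:
        return FIXED_VERSIONS[answer]
    if SEMVER_RE.fullmatch(answer):
        return answer  # a typed version such as 0.7.10 is passed through
    raise ValueError(f"not a recognized version: {answer!r}")
```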

Step 2 — language. Read .claude/skills/agent-eval/corpus.json. Ask with AskUserQuestion which language to test, listing the languages that have entries.
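Reading the corpus and listing the languages could look like this. The skill does not spell out the JSON shape. This sketch assumes a top-level object that maps each language name to a list of repo entries, so adjust it to the real file.

```python
import json
from pathlib import Path
from typing import Dict, List

# Location given in Step 2. The shape {"<language>": [entry, ...]} is an assumption.
CORPUS_PATH = Path(".claude/skills/agent-eval/corpus.json")

Corpus = Dict[str, List[dict]]


def load_corpus(path: Path = CORPUS_PATH) -> Corpus:
    """Parse corpus.json, relative to the codegraph repo root."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def languages_with_entries(corpus: Corpus) -> List[str]:
    """Languages to offer in AskUserQuestion: only those with at least one repo."""
    return sorted(lang for lang, entries in corpus.items() if entries)
```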

Step 3 — repo. From the chosen language's entries, ask which repo. Label each option with its size and file count, e.g. excalidraw — Medium (~600 files). Each entry carries the repo URL and a representative question.
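Each option label can be built from the entry's fields. This sketch makes two assumptions. It uses the field names listed in the Notes (name, repo, size, files, question), and it treats `size` as a label such as "Medium" and `files` as an approximate count. `RepoChoice` is a hypothetical helper type.

```python
from typing import List, NamedTuple


class RepoChoice(NamedTuple):
    label: str     # text shown as the AskUserQuestion option
    name: str      # <repo-name> for audit.sh
    repo: str      # <repo-url> for audit.sh
    question: str  # representative question asked in every arm


def repo_choices(entries: List[dict]) -> List[RepoChoice]:
    """Turn corpus entries into labelled options in the format shown in Step 3."""
    choices = []
    for entry in entries:
        # \u2014 is the dash used in the Step 3 example label.
        label = f"{entry['name']} \u2014 {entry['size']} (~{entry['files']} files)"
        choices.append(RepoChoice(label, entry["name"], entry["repo"], entry["question"]))
    return choices
```

Keep the chosen `RepoChoice`, because its `name`, `repo`, and `question` are exactly the arguments Step 5 needs.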

Step 4 — harness. Ask with AskUserQuestion which harness to run, and map the answer to a MODE token:

"Headless" → headless: claude -p with stream-json, giving exact tokens/cost and a clean tool sequence (2 runs, fast, no TTY).

"Interactive (tmux)" → tmux: drives the real Claude TUI in tmux, giving faithful Explore-subagent behavior with metrics from session logs (2 runs, slower).

"Both" → all: headless + interactive (4 runs).

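The Step 4 mapping is a fixed lookup. This is a minimal Python version, with the run counts taken from the list above. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Harness answer -> (MODE token for audit.sh, number of runs it performs).
HARNESS_MODES: Dict[str, Tuple[str, int]] = {
    "Headless": ("headless", 2),
    "Interactive (tmux)": ("tmux", 2),
    "Both": ("all", 4),
}


def mode_token(answer: str) -> str:
    """Return the MODE token for an AskUserQuestion answer, or raise ValueError."""
    try:
        return HARNESS_MODES[answer.strip()][0]
    except KeyError:
        raise ValueError(f"unknown harness choice: {answer!r}") from None


def expected_runs(mode: str) -> int:
    """Number of agent runs a MODE token implies."""
    for token, runs in HARNESS_MODES.values():
        if token == mode:
            return runs
    raise ValueError(f"unknown MODE token: {mode!r}")
```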
Step 5 — run. Launch audit.sh in the background. It sets the version, clones the repo if missing, wipes and re-indexes, then runs the chosen arms; this takes several minutes:

scripts/agent-eval/audit.sh <VERSION> <repo-name> <repo-url> "<question>" <MODE>
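If you script Step 5 rather than type it, pass the arguments as a list so the question needs no quoting. This Python sketch is an assumption about how one might launch it. The log path is hypothetical, and audit.sh itself decides what it writes.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

AUDIT_SCRIPT = "scripts/agent-eval/audit.sh"
VALID_MODES = {"headless", "tmux", "all"}


def audit_argv(version: str, name: str, url: str, question: str, mode: str) -> List[str]:
    """Arguments in the order audit.sh takes them: VERSION, repo-name, repo-url, question, MODE."""
    if mode not in VALID_MODES:
        raise ValueError(f"MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # The question stays a single argv entry, so spaces and quotes need no escaping.
    return [AUDIT_SCRIPT, version, name, url, question, mode]


def display_command(argv: List[str]) -> str:
    """Shell-quoted form, handy for echoing the exact command into the report."""
    return shlex.join(argv)


def launch_in_background(argv: List[str], log_path: Path) -> subprocess.Popen:
    """Start audit.sh without waiting; stdout and stderr both go to log_path.

    The child keeps its own copy of the log file descriptor, so the parent's
    handle can be closed as soon as Popen returns.
    """
    with log_path.open("wb") as log:
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
```

Poll the returned process with `proc.poll()` until it gives a non-None exit code; that tells you when the log is complete and Step 6 can start.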

Step 6 — report. When the job finishes, read the log and report per arm:

Headless (parse-run.mjs): total tool calls, file Reads, Grep/Bash calls, codegraph-tool calls, duration, total cost.

Interactive (parse-session.mjs): the `VERDICT: codegraph_explore used Nx | Read N | Grep/Bash N` and `TOKENS:` lines.
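To cross-check what parse-run.mjs and parse-session.mjs report, you can pull the same counts out in Python. This is a sketch, not the harness's code, and it rests on several assumptions:

- The headless log is `claude -p --output-format stream-json` output.
- Assistant events carry `tool_use` blocks, and the final `result` event has `duration_ms` and `total_cost_usd`.
- Any tool whose name contains "codegraph" counts as a codegraph call.
- The VERDICT line uses exactly the spacing shown above.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class HeadlessSummary:
    tool_calls: int
    reads: int
    grep_bash: int
    codegraph_calls: int
    duration_ms: Optional[int]
    cost_usd: Optional[float]


def summarize_stream_json(lines: Iterable[str]) -> HeadlessSummary:
    """Tally tool_use blocks and the final result event from stream-json lines."""
    names: Counter = Counter()
    duration_ms: Optional[int] = None
    cost_usd: Optional[float] = None
    for raw in lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the audit log may interleave non-JSON lines
        if not isinstance(event, dict):
            continue
        if event.get("type") == "assistant":
            message = event.get("message") or {}
            content = message.get("content", [])
            if isinstance(content, list):
                for block in content:
                    if isinstance(block, dict) and block.get("type") == "tool_use":
                        names[block.get("name", "")] += 1
        elif event.get("type") == "result":
            duration_ms = event.get("duration_ms")
            cost_usd = event.get("total_cost_usd")
    return HeadlessSummary(
        tool_calls=sum(names.values()),
        reads=names["Read"],
        grep_bash=names["Grep"] + names["Bash"],
        codegraph_calls=sum(n for name, n in names.items() if "codegraph" in name),
        duration_ms=duration_ms,
        cost_usd=cost_usd,
    )


# Assumed exact format of the interactive VERDICT line.
VERDICT_RE = re.compile(
    r"VERDICT: codegraph_explore used (\d+)x \| Read (\d+) \| Grep/Bash (\d+)"
)


def parse_verdict(text: str) -> Optional[Dict[str, int]]:
    """Extract the three counts from a VERDICT line, or None if absent."""
    match = VERDICT_RE.search(text)
    if match is None:
        return None
    explore, reads, grep_bash = (int(g) for g in match.groups())
    return {"codegraph_explore": explore, "reads": reads, "grep_bash": grep_bash}
```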

Lead with cost and tool/Read counts. These are the reliable signals, whereas raw input/output token counts are confounded by subagent delegation and prompt caching. State whether codegraph reduced effort and whether both arms reached a correct answer.
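The comparison itself is simple arithmetic on the two arms. In this hypothetical sketch, "reduced effort" means lower cost and no more tool calls than the baseline. That threshold is an assumption for illustration, not the skill's definition. You still have to read each arm's answer to judge whether it was correct.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArmResult:
    cost_usd: float
    tool_calls: int
    reads: int


def pct_change(baseline: float, treatment: float) -> Optional[float]:
    """Percent change relative to the without-codegraph baseline (negative = less)."""
    if baseline == 0:
        return None  # undefined against a zero baseline
    return (treatment - baseline) / baseline * 100.0


def effort_report(without: ArmResult, with_codegraph: ArmResult) -> List[str]:
    """Report lines that lead with cost and tool/Read counts."""
    lines = []
    for label, base, treat in [
        ("cost (USD)", without.cost_usd, with_codegraph.cost_usd),
        ("tool calls", without.tool_calls, with_codegraph.tool_calls),
        ("Reads", without.reads, with_codegraph.reads),
    ]:
        change = pct_change(base, treat)
        delta = "n/a" if change is None else f"{change:+.0f}%"
        lines.append(f"{label}: {base} -> {treat} ({delta})")
    # Hypothetical threshold: cheaper and no more tool calls than the baseline.
    reduced = (
        with_codegraph.cost_usd < without.cost_usd
        and with_codegraph.tool_calls <= without.tool_calls
    )
    lines.append("codegraph reduced effort" if reduced else "no clear reduction in effort")
    return lines
```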

Notes

The index is rebuilt on every run (audit.sh wipes .codegraph). Different versions extract differently, so an index must be served by the same binary that built it.

audit.sh temporarily mutates the global codegraph install for the test, then restores your dev link via local-install.sh.
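The ordering these notes describe fits naturally into try/finally: swap the install, rebuild the index, run the arms, then always restore the dev link. The sketch below is not audit.sh. Its four callables are hypothetical stand-ins for the script's shell steps, such as running local-install.sh for the restore.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_with_version(
    repo_dir: Path,
    install_version: Callable[[], None],
    build_index: Callable[[], None],
    run_arms: Callable[[], None],
    restore_dev_link: Callable[[], None],
) -> None:
    """Swap the global install, rebuild the index, run, and always restore."""
    try:
        install_version()  # mutates the global codegraph install
        # Drop any old index so it is rebuilt by the same binary that will serve it.
        shutil.rmtree(repo_dir / ".codegraph", ignore_errors=True)
        build_index()
        run_arms()
    finally:
        restore_dev_link()  # runs even if install, indexing, or an arm fails
```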

Corpus repos are cloned to /tmp/codegraph-corpus (reused if already present).

Add or edit repos in corpus.json (fields: name, repo, size, files, question).
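You can sanity-check a corpus entry and clone it on demand like this. Two details are assumptions about what audit.sh does: the per-repo directory name (`/tmp/codegraph-corpus/<name>`) and the use of a `.git` directory as the "already present" test.

```python
import subprocess
from pathlib import Path
from typing import List

CORPUS_DIR = Path("/tmp/codegraph-corpus")
REQUIRED_FIELDS = ("name", "repo", "size", "files", "question")


def missing_fields(entry: dict) -> List[str]:
    """Fields from the notes that a corpus.json entry lacks."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


def ensure_clone(entry: dict, corpus_dir: Path = CORPUS_DIR) -> Path:
    """Clone entry['repo'] into corpus_dir/<name> unless a checkout already exists."""
    target = corpus_dir / entry["name"]
    if not (target / ".git").is_dir():
        corpus_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", entry["repo"], str(target)], check=True)
    return target
```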

Score

65/100 (0–100 scale)

Popularity: 17/30

741 installs, growing adoption. Source repo has 49,116 GitHub stars.

Completeness: 27/30

Documented: full SKILL.md body, description, one-line install. Missing: category/license metadata.

Trust: 15/25

Community skill with a public GitHub source repository you can review.

Freshness: 6/15

No update timestamp is tracked for this skill in our catalog.

Scored automatically from popularity, completeness, trust, and freshness — computed only from data in our catalog, never fabricated.
