Academic Research Skills for Claude Code
   
A comprehensive suite of Claude Code skills for academic research, covering the full pipeline from research to publication.
Install in 30 seconds (Claude Code CLI / VS Code / JetBrains, v3.7.0+):
/plugin marketplace add Imbad0202/academic-research-skills
/plugin install academic-research-skills
Then try /ars-plan to walk through your paper structure via Socratic dialogue, or jump to Quick install for prerequisites and the traditional symlink flow.
> AI is your copilot, not the pilot. This tool won't write your paper for you. It handles the grunt work (hunting down references, formatting citations, verifying data, checking logical consistency) so you can focus on the parts that actually require your brain: defining the question, choosing the method, interpreting what the data means, and writing the sentence after "I argue that."
>
> Unlike a humanizer, this tool doesn't help you hide the fact that you used AI. It helps you write better. Style Calibration learns your voice from past work. Writing Quality Check catches the patterns that make prose feel machine-generated. The goal is quality, not cheating.
Why human-in-the-loop, not full automation?
Lu et al. (2026, Nature 651:914-919) built The AI Scientist — the first fully autonomous AI research system to publish a paper through blind peer review at a top-tier ML venue (ICLR 2025 workshop, score 6.33/10 vs workshop average 4.87). Their Limitations section enumerates the failure modes that any fully-autonomous AI research pipeline inherits: implementation bugs, hallucinated results, shortcut reliance, bug-as-insight reframing, methodology fabrication, frame-lock, citation hallucinations.
ARS is built on the premise that a human researcher augmented by AI avoids these failure modes better than either alone. Stage 2.5 and Stage 4.5 integrity gates run a 7-mode blocking checklist (see academic-pipeline/references/ai_research_failure_modes.md); the reviewer offers an opt-in calibration mode that measures its own FNR/FPR against a user-supplied gold set.
Zhao et al. (2026-05) audited 111M references across 2.5M papers on arXiv, bioRxiv, SSRN, and PMC. Their conservative estimate is 146,932 hallucinated citations for 2025 alone, with an observed mid-2024 inflection; for the bioRxiv-to-PMC pairing they report 85.3% preprint-to-published persistence. The paper describes "real citations deployed to support claims the cited references do not actually make" as an open challenge. ARS v3.7.1 added trust-chain frontmatter for source provenance; v3.7.3 added locator infrastructure (three-layer citation anchors) for future claim-level audits and surfaces advisory risk signals at cite time (ARS labels the claim-faithfulness gap internally as "L3"; this is ARS terminology, not the paper's). v3.7.x is motivated by Zhao et al.'s corpus-scale findings; corpus-scale evaluation of ARS itself remains future work.
v3.8 closes the second half of the L3 gap. v3.7.3 made every citation carry a locator anchor; v3.8 adds an opt-in audit pass (ARS_CLAIM_AUDIT=1) that fetches the cited source against each anchor and judges whether the claim is actually supported. Five new HIGH-WARN classes (claim-not-supported, negative-constraint-violation, fabricated-reference, anchorless, constraint-violation-uncited) gate-refuse output through the formatter terminal hard gate. Calibration is shipped as a 20-tuple gold set with FNR<0.15 + FPR<0.10 acceptance thresholds; ramp-on plan is deferred to post-calibration evidence per v3.8 spec §5.
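The acceptance thresholds above can be illustrated with a minimal sketch. The pair shape `(gold_is_violation, predicted_violation)` and the function names are assumptions made for illustration; the actual gold-set schema and calibration code are not shown in this README.

```python
from typing import Iterable, Tuple

FNR_MAX = 0.15  # from the text: FNR < 0.15
FPR_MAX = 0.10  # from the text: FPR < 0.10


def error_rates(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (FNR, FPR) for (gold_is_violation, predicted_violation) pairs.

    The pair layout is a hypothetical stand-in for a gold-set tuple.
    """
    tp = fn = fp = tn = 0
    for gold, pred in pairs:
        if gold and pred:
            tp += 1
        elif gold and not pred:
            fn += 1
        elif not gold and pred:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the sketch never divides by zero.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


def passes_calibration(pairs: Iterable[Tuple[bool, bool]]) -> bool:
    """True only when both strict thresholds are met."""
    fnr, fpr = error_rates(list(pairs))
    return fnr < FNR_MAX and fpr < FPR_MAX
```

Because both comparisons are strict, a small gold set leaves little slack: if a 20-tuple set happened to contain 10 non-violations, a single false positive would already give FPR = 0.10 and fail the check.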
v3.3 was inspired by PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google): Semantic Scholar API verification, anti-leakage protocol, VLM figure verification, and score trajectory tracking.
---
Architecture & pipeline
👉 docs/ARCHITECTURE.md — the full pipeline view: flow diagram, stage-by-stage matrix, data-access flow, skill dependency graph, quality gates, and mode list.
The architecture doc supersedes the sprawling pipeline description that used to live here. Everything about what runs in which stage now lives in one place.
Quick install
Prerequisites
- Claude Code (latest; plugin packaging requires recent versions)
- ANTHROPIC_API_KEY exported, or set on first claude run
- Optional: Pandoc for DOCX, tectonic + Source Han Serif TC for APA 7.0 PDF (Markdown output works without either)
- Optional (real Python): The core skills (research / write / review) need no Python; they are prompt-driven. A real Python interpreter is needed only for the PreToolUse write-scope guard (optional subagent hardening; if no real Python is found it cleanly no-ops and the guard is simply inactive, and core skills are unaffected), plus a few opt-in features that shell out to Python (revision-patch mode, the submission-package verifier, and the /ars-cache-invalidate, /ars-mark-read, and /ars-unmark-read commands). On Windows, note that python3 is often a non-functional Microsoft Store placeholder rather than real Python; install Python from python.org (or via winget) so the launcher can find a real interpreter. The guard launcher is a POSIX shell script and hooks.json invokes it through bash, so on Windows it needs Git Bash (bundled with Git for Windows). With Git Bash present, a missing real Python degrades cleanly (the guard no-ops, silently). Without Git Bash, Claude Code falls back to PowerShell, which cannot run the .sh launcher at all: the guard is inactive and the PreToolUse hook will log an error per call rather than no-op quietly. This is an accepted degradation: the guard is optional and never blocks your writes, but the hook noise is the trade-off until Git Bash is installed.
Plugin install (v3.7.0+, recommended):
/plugin marketplace add Imbad0202/academic-research-skills
/plugin install academic-research-skills
Verify it works: run /ars-plan and describe a paper you're working on — ARS will start a Socratic dialogue to map out chapter structure. For a single-shot test instead, try /ars-lit-review "your topic".
👉 docs/SETUP.md — full guide: install Claude Code, set up API keys, optional Pandoc/tectonic for DOCX/PDF, cross-model verification (ARS_CROSS_MODEL), and five installation methods (Plugin, project skills, global skills, claude.ai Project, repo-cloned).
Using Codex CLI? Install the sibling distribution instead: Imbad0202/academic-research-skills-codex — same workflow content, Codex-native packaging as a single $academic-research-suite skill with ars-* aliases.
Performance & cost
👉 docs/PERFORMANCE.md — per-mode token budgets, full-pipeline estimate (~$4–6 for a 15k-word paper), and recommended Claude Code settings (Auto mode; Agent Team optional).
Guides & articles
- Academic Writing Shouldn't Be a Solo Act — full pipeline walkthrough (English)
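- 學術寫作不該是一個人的事:一套開源 AI 協作工具如何改變研究者的工作流 (Academic Writing Shouldn't Be a Solo Act: How an Open-Source AI Collaboration Toolkit Changes Researchers' Workflows), full usage guide in Traditional Chinese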
---
Features at a glance
- Deep Research: 13-agent research team with Socratic guided mode, PRISMA systematic review, intent detection, dialogue health monitoring, optional cross-model DA, Semantic Scholar API verification.
- Academic Paper: 12-agent paper writing with Style Calibration, Writing Quality Check, LaTeX hardening, visualization, revision coaching, citation conversion, anti-leakage protocol, and VLM figure verification.
- Academic Paper Reviewer: 7-agent multi-perspective peer review with 0–100 quality rubrics (EIC + 3 dynamic reviewers + Devil's Advocate), concession threshold protocol, attack intensity preservation, optional cross-model DA critique / calibration, R&R traceability matrix, read-only constraint.
- Academic Pipeline: 10-stage pipeline orchestrator with adaptive checkpoints, claim verification, Material Passport, optional repro_lock, optional cross-model integrity verification, mid-conversation reinforcement, and score trajectory tracking.
- Data Access Level Metadata (v3.3.2+): every skill declares data_access_level (raw / redacted / verified_only); enforced by scripts/check_data_access_level.py. Pattern adapted from Anthropic's automated-w2s-researcher (2026). See shared/ground_truth_isolation_pattern.md.
- Task Type Annotation (v3.3.2+): every skill declares task_type (open-ended or outcome-gradable). All current ARS skills are open-ended.
- Benchmark Report Schema (v3.3.5+): JSON Schema + lint for honest benchmark comparisons. See shared/benchmark_report_pattern.md.
- Artifact Reproducibility Lockfile (v3.3.5+): optional repro_lock sub-block on Material Passport. Configuration documentation, not a replay guarantee; LLM outputs are not byte-reproducible. See shared/artifact_reproducibility_pattern.md.
- Experiment Provenance Intake (#260): optional experiment_provenance[] on the Material Passport records experiments the scholar ran externally (ARS never runs experiments), and manuscript claims join to them via claim_intent_manifest.planned_experiment_ids[]. The integrity gate (Stage 2.5/4.5) audits each experiment-backed claim against declared provenance (ALIGNED / OVERSTATED / NOT_SUPPORTED_BY_PROVENANCE / PROVENANCE_INSUFFICIENT) without judging whether the experiment itself was correct. A fail-closed experiment_intake_declaration makes "did you run experiments?" an explicit Stage 1 decision (even literature-only runs declare no_experiments_declared). See shared/handoff_schemas.md §"Experiment Provenance Intake (#260)".
---
Showcase: real pipeline output
See the complete artifacts from a real 10-stage pipeline run — peer review reports, integrity verification reports, and the final paper:
Browse all pipeline artifacts →
| Artifact | Description |
|---|---|
| Final Paper (EN) | APA 7.0 formatted, LaTeX-compiled |
| Final Paper (ZH) | Chinese version, APA 7.0 |
| Integrity Report: Pre-Review | Stage 2.5: caught 15 fabricated refs + 3 statistical errors |
| Integrity Report: Final | Stage 4.5: zero regressions confirmed |
| Peer Review Round 1 | EIC + 3 Reviewers + Devil's Advocate |
| Re-Review | Verification after revisions |
| Peer Review Round 2 | Follow-up review |
| Response to Reviewers | Point-by-point author response |
| Post-Publication Audit Report | Independent full-reference audit: found 21/68 issues missed by 3 rounds of integrity checks |
---
Companion: Experiment Agent
If your research involves running experiments (code or human studies) before writing, the Experiment Agent skill fills the gap between ARS Stage 1 (RESEARCH) and Stage 2 (WRITE).
ARS Stage 1 RESEARCH → RQ Brief + Methodology Blueprint
↓
experiment-agent → run/manage experiments → validate results
↓
ARS Stage 2 WRITE → write paper with verified experiment results
What it does: executes code experiments (Python, R, etc.) with real-time monitoring, manages human study protocols with IRB ethics checklist, interprets statistics with 11-type fallacy detection, and verifies reproducibility.
How to use together: pause the ARS pipeline after Stage 1, run experiments in a separate experiment-agent session, then bring the results (with Material Passport) back to ARS Stage 2. ARS requires zero modification. See the experiment-agent README for setup instructions.
Stage 1 intake declaration (#260): at Stage 1, ARS detects whether the run will carry experiment-backed claims and sets a fail-closed experiment_intake_declaration on the Material Passport. If experiments were run externally, the scholar enters one experiment_provenance[] entry per experiment (experiment_id, nested repro_lock, planned_vs_executed[], negative_results[], known_limitations[]) and the declaration is set to experiments_declared; otherwise it is set to no_experiments_declared. The declaration is required on every post-#260 passport: a run that touches no experiments still declares no_experiments_declared, so the integrity gate can never be silently bypassed by a forgotten provenance block. The experiment_id values are frozen at this intake point; the writers later reference them via planned_experiment_ids[].
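The fail-closed idea behind the declaration can be sketched as follows. This is an illustration only: the passport is modelled as a plain dict, `intake_problems` is a hypothetical helper, and the rule that `experiments_declared` needs at least one provenance entry is an assumption inferred from the description, not the shipped schema check.

```python
from typing import Any, Dict, List

# The two declaration values named in the text.
VALID_DECLARATIONS = {"experiments_declared", "no_experiments_declared"}


def intake_problems(passport: Dict[str, Any]) -> List[str]:
    """Return blocking problems; an empty list means the gate may proceed.

    Fail-closed: a missing or unknown declaration blocks, rather than
    being treated as "no experiments".
    """
    declaration = passport.get("experiment_intake_declaration")
    if declaration not in VALID_DECLARATIONS:
        return ["missing or invalid experiment_intake_declaration"]

    problems: List[str] = []
    entries = passport.get("experiment_provenance") or []
    if declaration == "experiments_declared":
        # Assumption: a declared run must carry at least one entry.
        if not entries:
            problems.append("experiments_declared but experiment_provenance is empty")
        for index, entry in enumerate(entries):
            if not entry.get("experiment_id"):
                problems.append(f"experiment_provenance[{index}] lacks experiment_id")
    return problems
```

The point of the design is visible in the first branch: forgetting the block entirely produces a problem, so a literature-only run has to say `no_experiments_declared` explicitly.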
Teaching-side companion: Teaching Skills applies the ARS architecture (skill ensembles, shared contracts, staged gates, a Course Passport) to the teaching side of academic life — course design → lessons → assessment → delivery → reflection; its sotl mode hands classroom-inquiry projects off to ARS deep-research / academic-paper for the publication phase.
---
Usage
Quick Start
# Start a full research pipeline
You: "I want to write a research paper on AI's impact on higher education QA"
# Start with Socratic guidance
You: "Guide my research on AI in educational evaluation"
# Write a paper with guided planning
You: "Guide me through writing a paper on demographic decline"
# Review an existing paper
You: "Review this paper" (then provide the paper)
# Check pipeline status
You: "status"
Individual Skills
Deep Research (8 modes)
"Research the impact of AI on higher education" → full mode
"Give me a quick brief on X" → quick mode
"Do a systematic review on X with PRISMA" → systematic-review mode
"Guide my research on X" → socratic mode (guided)
"Fact-check these claims" → fact-check mode
"Do a literature review on X" → lit-review mode
"Compare these papers in WHY/HOW/WHAT format" → three-way-scan mode
"Review this paper's research quality" → review mode
Academic Paper (11 modes)
"Write a paper on X" → full mode
"Guide me through writing a paper" → plan mode (guided)
"Build a paper outline" → outline-only mode
"I have a draft, here are reviewer comments" → revision mode
"Parse these reviewer comments into a roadmap" → revision-coach mode
"Write an abstract for this paper" → abstract-only mode
"Turn this into a literature review paper" → lit-review mode
"Convert to LaTeX" / "Convert citations to IEEE" → format-convert mode
"Check citations" → citation-check mode
"Generate an AI disclosure statement for NeurIPS" → disclosure mode
"Audit my rebuttal draft against the reviews" → rebuttal-audit mode
Academic Paper Reviewer (6 modes)
"Review this paper" → full mode (EIC + R1/R2/R3 + Devil's Advocate)
"Quick assessment of this paper" → quick mode
"Guide me to improve this paper" → guided mode
"Check the methodology" → methodology-focus mode
"Verify the revisions" → re-review mode
"Calibrate this reviewer against my gold set" → calibration mode
Academic Pipeline (Orchestrator)
"I want to write a complete research paper" → full pipeline from Stage 1
"I already have a paper, review it" → mid-entry at Stage 2.5 (integrity first)
"I received reviewer comments" → mid-entry at Stage 4
> Pipeline ends with Stage 6: Process Summary — auto-generates a paper creation process record with 6-dimension Collaboration Quality Evaluation (1–100 scoring).
Supported Languages
- Traditional Chinese (繁體中文) — default when user writes in Chinese
- English — default when user writes in English
- Bilingual abstracts (Chinese + English) for academic papers
> Using a different language? Socratic mode (deep-research) and Plan mode (academic-paper) use intent-based activation: they detect the meaning of your request, not specific keywords. This means they work in any language without modification.
>
> However, the general Trigger Keywords section (which determines whether the skill is activated at all) still lists English and Traditional Chinese keywords. If you find the skill isn't activating reliably in your language, you can add your language's keywords to the ### Trigger Keywords section in each SKILL.md file to improve matching confidence.
Supported Citation Formats
- APA 7.0 (default, including Chinese citation rules)
- Chicago (Notes & Author-Date)
- MLA
- IEEE
- Vancouver
Supported Paper Structures
- IMRaD (empirical research)
- Thematic Literature Review
- Theoretical Analysis
- Case Study
- Policy Brief
- Conference Paper
---
Skill Details
Per-agent responsibilities and per-stage artifacts now live in docs/ARCHITECTURE.md. Version numbers are anchored here so release metadata stays in one place.
Deep Research (v2.11.0)
13-agent research team. Modes: full, quick, review, lit-review, three-way-scan, fact-check, socratic, systematic-review. Full agent roster and artifacts: see ARCHITECTURE.md §3.
Academic Paper (v3.2.0)
12-agent paper writing pipeline. Modes: full, plan, outline-only, revision, revision-coach, abstract-only, lit-review, format-convert, citation-check, disclosure, rebuttal-audit. Output: MD + DOCX (via Pandoc when available) + LaTeX (APA 7.0 apa7 class / IEEE / Chicago) → PDF via tectonic. Full agent roster and per-phase responsibilities: see ARCHITECTURE.md §3.
Academic Paper Reviewer (v1.10.0)
7-agent multi-perspective review with 0-100 quality rubrics. Modes: full, re-review, quick, methodology-focus, guided, calibration. Decision mapping: ≥80 Accept, 65-79 Minor Revision, 50-64 Major Revision, <50 Reject. First-round review team vs. narrow re-review team boundary: see ARCHITECTURE.md §3 Stage 3 / Stage 3'.
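The decision mapping can be written as a small lookup. This is a sketch, not the reviewer's own code: `decision_for` is a hypothetical name, and treating fractional scores between bands (for example 79.5) as belonging to the lower band is an assumption, since the text lists integer ranges only.

```python
def decision_for(score: float) -> str:
    """Map a 0-100 rubric score to the editorial decision bands in the text."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 80:
        return "Accept"
    if score >= 65:
        return "Minor Revision"
    if score >= 50:
        return "Major Revision"
    return "Reject"


if __name__ == "__main__":
    for sample in (92, 79, 65, 50, 49):
        print(sample, decision_for(sample))
```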
Academic Pipeline (v3.13.0)
10-stage orchestrator with integrity verification, two-stage review, Socratic coaching, and collaboration evaluation. Pipeline guarantees: every stage requires a user confirmation checkpoint; integrity verification (Stage 2.5 + 4.5) cannot be skipped; the R&R Traceability Matrix (Schema 11) independently verifies author revision claims. v3.4 added the Compliance Agent (PRISMA-trAIce + RAISE) at Stage 2.5 / 4.5. v3.5 added the Collaboration Depth Observer (collaboration_depth_agent, advisory only; it never blocks) at every FULL/SLIM checkpoint and at pipeline completion. The MANDATORY integrity gates (2.5 / 4.5) explicitly skip the observer so compliance checks are not diluted. Based on Wang & Zhang (2026), IJETHE 23:11. Stage-by-stage matrix with agents, artifacts, and gates: see ARCHITECTURE.md §3.
---
v3.0 Optimizations: What We Discovered About AI's Structural Limits
What happened
While using ARS to write a reflection article about AI in higher education, I ran into three structural problems that no amount of prompt engineering could fix:
1. Frame-lock: I asked the AI to run a devil's advocate debate against its own thesis. It did — four rounds, each more refined than the last. But every round stayed inside the frame I'd set. The DA attacked arguments, never premises. It never asked "are we even discussing the right question?" This is the same pattern that caused the 31% citation error rate in v2.7's stress test: the verifying AI and the generating AI share the same cognitive frame.
2. Sycophancy under pushback: Every time I challenged the DA's attacks, it conceded too quickly. It retracted findings faster than it launched them. The model's training rewards conversational harmony — so "the user pushed back" was treated as evidence that the attack was wrong, when often it just meant the user was persistent.
3. Intent misdetection: The Socratic Mentor kept trying to converge and produce deliverables ("Want me to write this up?") when I was still exploring. It couldn't distinguish "the user wants a deep philosophical discussion" from "the user wants an RQ brief." Both look like engagement, but they need opposite AI behaviors.
What we changed (v3.0)
Devil's Advocate — Concession Threshold Protocol (deep-research + academic-paper-reviewer)
- DA must now score every rebuttal on a 1-5 scale before responding
- Concession only allowed at score ≥4 (rebuttal directly addresses core attack with evidence)
- Score ≤3: hold position and restate the original attack
- Anti-sycophancy rules: no consecutive concessions, concession rate tracking, frame-lock detection after each checkpoint
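The concession rules above can be sketched as a tiny state machine. In ARS these are prompt-level rules, so the class below is purely illustrative: `DevilsAdvocateState` and its method names are hypothetical, and reading "no consecutive concessions" as "hold on the turn right after a concession, even if the rebuttal scores 4 or 5" is one possible interpretation.

```python
from dataclasses import dataclass, field
from typing import List

CONCESSION_MIN_SCORE = 4  # from the text: concede only at score >= 4


@dataclass
class DevilsAdvocateState:
    # One entry per rebuttal handled; True means the DA conceded.
    history: List[bool] = field(default_factory=list)

    def respond(self, rebuttal_score: int) -> str:
        """Return "concede" or "hold" for a rebuttal scored on the 1-5 scale."""
        if rebuttal_score not in range(1, 6):
            raise ValueError("rebuttal score must be an integer from 1 to 5")
        last_was_concession = bool(self.history) and self.history[-1]
        # Assumed reading of "no consecutive concessions".
        concede = rebuttal_score >= CONCESSION_MIN_SCORE and not last_was_concession
        self.history.append(concede)
        return "concede" if concede else "hold"

    @property
    def concession_rate(self) -> float:
        """Fraction of rebuttals that ended in a concession."""
        return sum(self.history) / len(self.history) if self.history else 0.0
```

Tracking the rate alongside the per-turn decision is what lets a later checkpoint notice a run that is drifting toward agreement.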
Socratic Mentor — Intent Detection Layer (deep-research)
- Classifies user intent as exploratory vs. goal-oriented at dialogue start and every 3 turns
- Exploratory mode: disables auto-convergence, raises max rounds to 60, prohibits "want me to summarize?" prompts
- Goal-oriented mode: standard convergence behavior
- Anti-premature-closure rules: in exploratory mode, the user decides when to stop
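As a rough illustration of the intent-detection layer, the sketch below encodes the reclassification schedule and the two behaviour profiles. All names are hypothetical, turns are assumed to be 0-based with turn 0 as dialogue start, and the goal-oriented round limit is left as `None` because the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional

RECLASSIFY_EVERY = 3          # from the text: at start and every 3 turns
EXPLORATORY_MAX_ROUNDS = 60   # from the text


@dataclass(frozen=True)
class MentorPolicy:
    auto_convergence: bool
    max_rounds: Optional[int]   # None = standard default, not stated in the text
    may_offer_summary: bool


def should_reclassify(turn: int) -> bool:
    """True at dialogue start (turn 0) and on every third turn after it."""
    return turn % RECLASSIFY_EVERY == 0


def policy_for(intent: str) -> MentorPolicy:
    """Return the behaviour profile for a classified intent."""
    if intent == "exploratory":
        return MentorPolicy(auto_convergence=False,
                            max_rounds=EXPLORATORY_MAX_ROUNDS,
                            may_offer_summary=False)
    if intent == "goal-oriented":
        return MentorPolicy(auto_convergence=True,
                            max_rounds=None,
                            may_offer_summary=True)
    raise ValueError(f"unknown intent: {intent!r}")
```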
Socratic Mentor — Dialogue Health Indicator (deep-research)
- Silent self-assessment every 5 turns on three dimensions: persistent agreement, conflict avoidance, premature convergence
- Auto-injects challenging questions when agreement pattern detected
- Invisible to user (to prevent gaming), but log available for post-session review
Why this matters
These optimizations don't solve AI's structural limits — they make the limits visible and manageable. The DA will still eventually concede if pushed hard enough. The Socratic Mentor will still have some convergence bias. But now there are explicit checkpoints that slow down the sycophancy, force the DA to justify concessions, and prevent the Mentor from wrapping up before the user is ready.
The deeper lesson: AI literacy isn't about learning to use AI as a tool, following ethics rules, or fearing AI risks. It's about engaging AI deeply enough to discover its structural limits yourself — and your own thinking limits in the process.
---
License
This work is licensed under CC-BY-NC 4.0.
You are free to:
- Share — copy and redistribute the material
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit
- NonCommercial — You may not use the material for commercial purposes
Attribution format:
Based on Academic Research Skills by Cheng-I Wu
https://github.com/Imbad0202/academic-research-skills
---
Contributors
Cheng-I Wu (吳政宜) — Author and maintainer
aspi6246 — Contributor. The v3.1 optimization was inspired by patterns from Claude-Code-Skills-for-Academics: read-only constraint pattern, anti-pattern codification as first-class design, cognitive framework approach (teaching "how to think" not just procedures), and lean skill size philosophy.
mchesbro1 — Contributor. Originally proposed and drafted the IS Basket of 8 journals for academic-paper-reviewer/references/top_journals_by_field.md (Issue #5).
cloudenochcsis — Contributor. Extended the IS section from the Basket of 8 to the full Senior Scholars' Basket of 11 — adding Decision Support Systems, Information & Management, and Information and Organization (Issue #7, PR #8). Sourced from the AIS Senior Scholars' List of Premier Journals.
eltociear (Ikko Eltociear Ashimine) — Contributor. Translated the Japanese README (README.ja-JP.md) (PR #161).
xpfo-go (xpfo) — Contributor. Translated the Simplified Chinese README (README.zh-CN.md) (PR #181).
devCharlotte — Contributor. Translated the Korean README (README.ko-KR.md) (PR #469).
Yaobin29 — Contributor. Proposed reviewer-response tooling in PR #433; the deep-research three-way-scan mode and the academic-paper rebuttal-audit mode (rescued from the PR's audit concept) were integrated from that contribution in v3.12.1.
---
Changelog
v3.13.0 (2026-06-18) — Hook portability, provider-agnostic verification, guard correctness
> A minor release hardening the install/runtime surface and extending cross-model reach. Fixes: the write-scope guard no longer false-denies a user's own CLAUDE.md under the git-clone + symlink install layout (#459, closing the residual half of #448/#449 — CLAUDE.md is documentation, not a load-bearing enforcement file, so it leaves the infra-protected list while every load-bearing file stays protected); Windows Python hook portability + graceful no-Python degradation via a cross-platform hooks/run_guard.sh launcher that rejects the 0-byte Microsoft Store python3 stub and never spams the hook log (#454); draft_writer dual-phase static union documented + POSIX-safe Windows path matching (#451). Added: provider-agnostic cross-model verification accepting OpenAI-compatible endpoints (MiMo, DeepSeek, self-hosted) alongside grounded first-party OpenAI, which is never silently downgraded (#455); an opt-in Socratic adjacent-framing probe (STORM-borrowed perspective expansion, ARS_SOCRATIC_ADJACENT_PROBE=1, default OFF, prose-layer only — deep-research 2.10.0 → 2.11.0) (#461). academic-pipeline tracks the suite at v3.13.0; academic-paper and academic-paper-reviewer are unchanged. See CHANGELOG.md for the per-issue detail.
v3.12.1 (2026-06-15) — Reviewer-response triage modes (PR #433 integration)
> A patch release folding the genuinely-novel parts of an external contribution into existing skills as modes, per ARS's mode-based architecture. New modes: deep-research three-way-scan — a lightweight WHY/HOW/WHAT paper-comparison triage between quick and lit-review, with per-paper shortlists + a cross-paper synthesis (deep-research 2.9.4 → 2.10.0); academic-paper rebuttal-audit — standalone advisory QA of an author's existing rebuttal/response draft against the reviewer comments (per-comment coverage table + gap list + tone/evidence/misread risk flags), which generates nothing and explicitly suppresses Schema 11 / Material Passport writes / ready_to_submit when run standalone (enforced by a check_rebuttal_audit_guard() lint with mutation coverage); plus a revision-coach scope extension to pushback/disagreement posture and non-journal scopes, and /ars-3w + /ars-rebuttal-audit slash commands. Routed by input shape: reviewer comments AND a draft → rebuttal-audit; comments only → revision-coach. Integrated from @Yaobin29's PR #433. Suite mode count 25 → 27 (still 4 skills). See CHANGELOG.md for the per-issue detail.
v3.12.0 (2026-06-08) — Kong auto-research feature track: experiment provenance, figure fidelity, cross-paper contradiction, partial-evidence decomposition
> A minor release shipping the Kong et al. (2026, arXiv:2605.18661) auto-research feature track plus the partial-evidence-trap decomposition work, each reviewed and merged independently. New features: Experiment Provenance Intake + claim→experiment alignment — a schema-first evidence-ledger layer for experiment-backed claims, intake-and-alignment only (the scholar runs experiments externally; ARS never executes them) (#260); a Figure/Table Fidelity Gate that checks whether a caption's interpretation follows from the data and whether the manuscript cites the artifact for a claim it supports (#261); a structured Cross-Paper Contradiction inventory making assessed paper-pairs enumerable for scholar confirmation (#262); and sub-claim decomposition before judgment in both the citation judge (#213) and the editorial synthesizer (#214), closing the §F.3.2 partial-evidence trap on both layers. Guidance + interpretive layer: concise-output + pressure-stable boundary reinforcement across the report-producing reviewers (#274); a same-family / rubric-aware calibration epistemic note (#273); the retrieved-content instruction/data boundary stated as a standing principle (#367). Negative scope: the Kong META (#255) closed with a "Rejected mechanisms" section in POSITIONING.md enumerating the five autonomous mechanisms ARS does not do, plus two Tier D design-lesson docs. Release-discipline lint: version-consistency invariants 5–7 (#357) and ARCHITECTURE component-version policing (#345). Plus correctness fixes across the cross-model grounding guards (#346 / #349 / #351), the citation-gate cache key and rationale bounding (#359 / #360 / #361), the eval gold set (#250), and ACL/EMNLP disclosure regrounding (#242). The new schemas, manifest field, and all invariants are additive and backward-compatible. academic-pipeline tracks the suite at v3.12.0; the other three skill versions are unchanged. See CHANGELOG.md for the per-issue detail.
v3.11.1 (2026-06-06) — Post-ship correctness, hardening & provenance rollup
> A patch release rolling up the post-ship fixes surfaced after v3.11.0, each reviewed and merged independently: a cross-model consent-gate extension to the integrity-verification + collaboration-depth paths (#322), a per-entry OpenAlex + Crossref backfill parallelization (#138), and seven correctness/hardening fixes across the citation-existence gate, the v3.10 policy layer, the eval harness, the domain evidence profiles, and the #310 security-boundary edge cases (#323 / #327 / #328 / #329 / #331 / #332 / #333) — including two P1 fixes (#327 domain-profile activation on the no-handoff path, #328 the eval-harness per-class threshold gate). No new features and no breaking schema changes. See CHANGELOG.md for the per-issue detail.
v3.11.0 (2026-06-04) — Deterministic citation verification gate (#182)
> Adds a deterministic citation-existence verification gate that runs independently of LLM peer review. Every cited reference is cross-checked against up to four bibliographic indexes — Semantic Scholar + OpenAlex + Crossref + the new arXiv resolver (scripts/arxiv_client.py, no API key needed) — and a per-citation lookup_verified status ({true, false, unresolvable}) is written to a unified summary, so a fabricated citation with a provably-bogus DOI/arXiv ID is caught by lookup rather than by hoping a reviewer agent notices. The gate inherits the v3.10 terminal_policies opt-in model: detection always runs, but a lookup_verified == false row is terminal only when a user opts into terminal_policies.citation_existence == strict — default behavior is advisory and /ars-mark-read-acknowledgeable. false is narrowed to ID-keyed unmatched (an exact DOI/arXiv lookup that provably fails), so legitimately-unindexed humanities / non-English / regional citations stay unresolvable and never block (a documented precision-over-recall tradeoff). Ships a persistent SQLite verification cache (~/.cache/ars/verification.db, 90-day TTL) with an /ars-cache-invalidate command, a standalone verification_gate API + verify_passport.py CLI, and a four-index extension (k=0..4) of the v3.9.0 contamination triangulation matrix (all advisory). academic-pipeline tracks the suite at v3.11.0; the other three skill versions are unchanged. Spec: docs/design/2026-05-21-v3.10-182-promote-citation-gate-spec.md (§0 amendment + C-V6).
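The opt-in terminal behaviour described for v3.11.0 can be sketched as a small decision function. This is not the `verification_gate` API: `gate_outcome` and the outcome labels `"pass"`, `"advisory"`, and `"blocked"` are hypothetical, and the policy is modelled as a plain dict.

```python
from typing import Dict, Iterable, Union

# Per-citation lookup_verified status as described in the text:
# True, False, or the string "unresolvable".
Status = Union[bool, str]


def gate_outcome(statuses: Iterable[Status],
                 terminal_policies: Dict[str, str]) -> str:
    """Combine per-citation statuses with the opt-in strict policy."""
    strict = terminal_policies.get("citation_existence") == "strict"
    # Only a provable miss (False) can ever block; "unresolvable" never does.
    has_provable_miss = any(status is False for status in statuses)
    if has_provable_miss and strict:
        return "blocked"
    if has_provable_miss:
        return "advisory"
    return "pass"
```

Detection runs in every case; the policy only decides whether a provable miss is terminal. That separation is what keeps unindexed but legitimate citations from blocking a run.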
v3.10.0 (2026-06-01) — Triangulation policy layer, Kong survey adoptions, eval harness, scoped-write guard
> Minor release bundling: the opt-in contamination-triangulation terminal policy layer (#127 — default citation behavior byte-equivalent to v3.9.0); Kong et al. 2026 survey adoptions — the Rebuttal Commitment Ledger (#256/#266/#268/#269) and discipline-relative domain evidence profiles (#259); the v3.10 measurement infrastructure — a generalized eval gold set + ranking-lift CI gate (#184); the scoped-write guard MVP (#134) — a deterministic PreToolUse hook that fences the 23 single-phase agents to their own phase directory and denies them Bash (they use the Grep/Glob and structured editing tools instead); the /ars-mark-read plugin commands (#190) plus a broken-on-arrival fix (#195); a Simplified-Chinese README (#185); and CI hardening (#156/#155). academic-paper → v3.2.0 and academic-paper-reviewer → v1.10.0 for the Commitment-Ledger and domain-profile additions; academic-pipeline tracks the suite at v3.10.0. Default skill behavior is unchanged unless a strict policy mode is opted into; the one default-on change is the #134 guard, which constrains the fenced subagents, not user-facing outputs.
v3.9.4.2 (2026-05-19) — post-ship hotfix for PR #149 CI discipline gates (codex post-ship)
> Codex post-ship review of PR #149 (7 CI discipline gates) surfaced 4 P2 findings; v3.9.4.2 hardens 3 of 4. F1: harness-retirement-monthly.yml adds GH_REPO so scheduled runs have repo context for gh issue create. F2: release-cooldown.yml filters PREV_TAG lookup to v* tags so non-release tags cannot bypass cooldown. F3: release-cooldown.yml also reads annotated tag subject + accepts hot-fix spelling (v3.9.2 was previously a false-negative hotfix). PR #157 follow-up: [skip-cooldown] override now read from both commit message AND annotated tag message (self-bootstrapping fix — this tag's cooldown bypass demonstrates F2+F3 work end-to-end). F4 (test-count-monotonic harden) reverted because it surfaced pre-existing scripts/ package issue, tracked as #154 (since fixed by PR #158) + re-attempt #155. Closes #152. Follow-ups: #155, #156.
v3.9.4.1 (2026-05-19) — post-ship hotfix for v3.9.4 temporal verification (#135 codex post-ship)
> Codex post-ship review of v3.9.4 caught 4 real bugs that per-task subagent reviewers missed. Hotfix patches all 4: (1) audit() now wires citation_provenance through to P2 and P4 — when a ref slug has confidence: low or conflict, the verifier emits TEMPORAL-METADATA-MISSING instead of using timeline dates as ground truth (spec §3.4 first-party safety check was broken). (2) _date_to_interval parses all schema-valid date shapes including YYYY-MM (Crossref month precision) and YYYY-MM-DD..YYYY-MM-DD (interval); v3.9.4 silently ValueError'd on these and skipped the check. (3) P4 now binds direct date captures when ref markers are absent — sentences like "The 2026 policy enabled the 2020 rollout" actually trigger now. (4) citation_provenance.schema.json confidence:high allOf now requires presence (then.required) in addition to non-null, closing the absent-property bypass. 1561 passed (+12 new tests vs v3.9.4 baseline, 0 regression). ARCHITECTURE.md aligned to current state (was stale at v3.8.0).
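To make fix (2) concrete, here is a minimal sketch of parsing the date shapes named above into a closed interval. It is not the project's `_date_to_interval`: the function name, the decision to accept a bare `YYYY`, and the return type are assumptions for illustration.

```python
import calendar
import re
from datetime import date
from typing import Tuple

# YYYY, YYYY-MM, or YYYY-MM-DD (a day is only allowed when a month is present).
_SINGLE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _single_to_interval(text: str) -> Tuple[date, date]:
    match = _SINGLE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported date shape: {text!r}")
    year = int(match.group(1))
    if match.group(2) is None:
        return date(year, 1, 1), date(year, 12, 31)
    month = int(match.group(2))
    if match.group(3) is None:
        # Month precision: span the whole month, leap years included.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, int(match.group(3)))
    return day, day


def date_to_interval(text: str) -> Tuple[date, date]:
    """Parse a date or a START..END interval into (earliest, latest)."""
    if ".." in text:
        left, right = text.split("..", 1)
        start, _ = _single_to_interval(left)
        _, end = _single_to_interval(right)
        if start > end:
            raise ValueError(f"interval ends before it starts: {text!r}")
        return start, end
    return _single_to_interval(text)
```

Raising on an unknown shape, instead of skipping the check, keeps a parsing gap from silently disabling the verifier.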
v3.9.4 (2026-05-18) — #135 temporal verification layer (advisory)
> Deterministic advisory verifier at the Phase 4 → 5 boundary covering 5 temporal failure modes (P1 retrospective arithmetic, P2 anachronistic citation, P3 comparator unmaterialized, P4 causal inversion, P5 deictic present). New Phase 2 sibling timeline_extraction_agent owns phase2_investigation/timeline.yaml + phase2_investigation/citation_provenance.yaml. Verifier script scripts/temporal_integrity_audit.py runs 5 passes deterministically. M3 Temporal Integrity Iron Rule added to report_compiler_agent + draft_writer_agent. M6-minimal: Crossref issued + pdftotext cover first-party verification. M7-minimal: date provenance + comparator materialization. M5-stub: user-declared version_family_id only. Zero modification to literature_corpus_entry, claim_audit_result, claim_intent_manifest. bibliography_agent unmodified (F2 invariant). 3 new sidecar schemas. Coverage estimate: 55-70% baseline / 65-75% with M7 minimal. 1549 passed (+44 new, 0 regression).
v3.9.3 (2026-05-18) — #128 housekeeping (shared client utilities + dedup resolvers)
> Pure refactor + one latent-bug fix from the v3.9.0 /simplify review backlog. Extracts scripts/_text_similarity.py (3-way client dedup: normalize / similarity / threshold / retry constants) + scripts/_passport_yaml.py (2-way migration tool dedup: ruamel.yaml round-trip config) + private _resolve_by_doi_then_title helper (2-way resolver body dedup, §3.4 / §3.5 API surface preserved). Standardizes throttle measurement on time.monotonic across OpenAlex + Crossref (was time.time, NTP-unsafe), aligning with Semantic Scholar. Dual-path import infrastructure on all 5 module-level cross-imports (sibling-first, namespace-package fallback) preserves class identity for SemanticScholarUnavailable and bonus-fixes 2 latent-broken import scripts.X paths. 1505 passed (+23 new, 0 regression). #128 §4 (parallelize OA + CR per-entry) carried to #138.
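The switch from `time.time` to `time.monotonic` matters because a wall-clock adjustment (for example an NTP step) can make the measured gap between two requests negative or far too large. A minimal sketch of a monotonic throttle follows; the `Throttle` class and its parameters are hypothetical and are not the clients' actual code.

```python
import time


class Throttle:
    """Hypothetical minimum-interval throttle; not the project's actual class."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # monotonic by default, so clock steps cannot skew it
        self._sleep = sleep
        self._last = None        # timestamp of the previous request, if any

    def wait(self) -> float:
        """Block until min_interval has passed since the last call; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


throttle = Throttle(min_interval=0.05)
throttle.wait()          # first call never sleeps
print(throttle.wait())   # second call sleeps for roughly the remaining interval
```

Injecting `clock` and `sleep` is a design choice made here so the throttle can be tested without real waiting.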
v3.9.2 (2026-05-18) — #133 phase boundary hot-fix
> #133 closure (hot-fix layer). Long-term architectural fix tracked as v3.10 active conductor in #134. Adds: routing clarification gate in CLAUDE.md (cross-phase materials → clarify with a-d options, not silent dispatch), 22 single-phase agents get prompt hard fence (## Phase Boundary (v3.9.2)), 16 multi-phase / phase-orthogonal / cross-phase-meta agents intentionally NOT fenced (honest framing — prose placebo creates false-enforcement illusion), advisory verifier scripts/check_pipeline_integrity.py detects #133 pattern post-hoc. Behavioral smoke tests with cross-model spot-check (100% Opus 4.7, ≥75% Sonnet + GPT-5.5).
v3.9.1 (2026-05-18) — #129 + #130 client hardening
> v3.9.0 hot-fix. Wraps OpenAlex / Crossref response-read failures as *Unavailable (#129); guards check_claim_audit_consistency against non-string manifest_id (#130). No spec change.
v3.9.0 (2026-05-17) — #102 cross-index triangulation measurement
> #102 closure. v3.7.3 shipped single-index (Semantic Scholar) contamination detection; v3.9.0 extends to three-index triangulation (S2 + OpenAlex + Crossref) as advisory evidence only. Two new optional booleans (openalex_unmatched, crossref_unmatched) on contamination_signals; manual-entry not-rule extended symmetrically. Finalizer adds a 4-tier advisory matrix (k=0/1/2/3 over present *_unmatched fields) with v3.7.3 legacy CONTAMINATED-UNMATCHED preserved for the k=1/k_max=1 S2-only case. Formatter pass-through allowlist extends 3 → 9 suffixes; refusal rules 1-10 unchanged per R-L3-2-E. The policy layer (strict modes, hard-block tier, venue_type / triangulation_policy) is deferred to v3.10 per spec §2.3. k=3 marker is CONTAMINATED-TRIANGULATION-UNMATCHED (describes observable, not inferred cause). 3 new firm rules: R-L3-2-C (k computed over present fields), R-L3-2-D (no API-inferred classification), R-L3-2-E (refusal list unchanged; pass-through allowlist extends).
Migration: v3.7.3 corpora — run python scripts/migrate_literature_corpus_to_v3_9_0.py PATH to backfill the two new fields. Pre-v3.7.3 corpora — run migrate_literature_corpus_to_v3_7_3.py FIRST, then v3.9.0 migration (daisy-chained per spec §3.7; the v3.9.0 tool only acts on entries that already carry contamination_signals.semantic_scholar_unmatched).
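Rule R-L3-2-C (k computed over present fields) can be sketched as follows. This is an illustration under assumptions, not the finalizer's code: the function names are hypothetical, and "present" is taken to mean the key exists with a non-null value. Only the two marker strings named in the v3.9.0 notes are returned; markers for the other tiers are not stated in this changelog, so the sketch returns `None` for them.

```python
FIELDS = (
    "semantic_scholar_unmatched",
    "openalex_unmatched",
    "crossref_unmatched",
)


def triangulation_counts(signals: dict) -> tuple[int, int]:
    """Return (k, k_max): unmatched count and number of fields actually present."""
    present = [signals[name] for name in FIELDS if signals.get(name) is not None]
    k = sum(1 for flag in present if flag is True)
    return k, len(present)


def known_marker(signals: dict):
    """Return a marker only for the two cases the changelog names explicitly."""
    k, k_max = triangulation_counts(signals)
    if k == 3:
        return "CONTAMINATED-TRIANGULATION-UNMATCHED"
    if k == 1 and k_max == 1 and signals.get("semantic_scholar_unmatched") is True:
        return "CONTAMINATED-UNMATCHED"  # v3.7.3 legacy, S2-only corpus entry
    return None  # other tiers: marker text not given in this changelog


legacy_entry = {"semantic_scholar_unmatched": True}
print(triangulation_counts(legacy_entry), known_marker(legacy_entry))
```

Counting only present fields means an entry that has not been migrated is judged on the one index it actually carries, rather than being treated as matched in the two it lacks.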
v3.8.2 (2026-05-17) — #118 uncited audit_tool_failure surface
> #118 closure. The ARS_CLAIM_AUDIT=1 uncited constraint-judging path used to silently substitute {"judgment": "NOT_VIOLATED"} on JudgeInvocationError, suppressing HIGH-WARN constraint checks on transient judge outage. v3.8.2 routes those failures through a dedicated uncited_audit_failures[] aggregate at MED-WARN advisory tier, mirroring the cited path INV-14 row but using a dedicated schema because claim_audit_result.ref_slug is required and the uncited path has no ref to bind. The four option-1..4 trade-offs from the #118 issue body landed on option 2 (new aggregate) — option 4 (re-raise and abort) was rejected for the audit-coverage hit on flaky judge endpoints.
- New uncited_audit_failure.schema.json aggregate (spec §3.6). One entry per uncited sentence × manifest pair where the constraint judge raised JudgeInvocationError. Same fault-class enum as cited-path INV-14 (judge_timeout/judge_api_error/judge_parse_error/cache_corruption/retrieval_api_error/retrieval_timeout/retrieval_network_error). rule_version: D4-c-v1-uaf-v1.
- UAF-INV-1..UAF-INV-6 lint (spec §6 rule 4d). finding_id uniqueness, scoped_manifest_id cross-array integrity, (M, C) pair integrity when manifest_claim_id non-null, per-(sentence, manifest) dedup, rationale fault_class prefix, cross-aggregate exclusivity vs constraint_violations[].
- Finalizer §5 MED-WARN advisory row: annotation [CLAIM-AUDIT-TOOL-FAILURE-UNCITED — <fault-class>], gate passes (retry-next-pass remediation). Formatter REFUSE list unchanged — UAF is advisory.
- Pipeline integration (scripts/claim_audit_pipeline.py): swallow site at line 1211-1224 removed; JudgeInvocationError now emits a UAF row + continues to the next (sentence, manifest) pair. No fake NOT_VIOLATED reaches constraint_violations[].
- Tests: 18 new (15 schema/lint TSUAFUncitedAuditFailureInvariants + 3 pipeline integration TP23UncitedJudgeOutageEmitsUAF). Baseline 694 → 712 tests, 0 regression.
- Agent doc (academic-pipeline/agents/claim_ref_alignment_audit_agent.md): Output emission table grows seventh row; Error handling table grows from 3 surfaces to 4 surfaces with the uncited-path UAF row.
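The behavioural change in v3.8.2 (record the judge failure and move on, instead of substituting a fake NOT_VIOLATED) can be sketched roughly as below. This is an assumption-laden illustration, not the code in scripts/claim_audit_pipeline.py: the `JudgeInvocationError` class here is a local stand-in, and `audit_uncited`, the `fault_class` attribute, the row field names, and the `"VIOLATED"` judgment value are all hypothetical.

```python
class JudgeInvocationError(Exception):
    """Local stand-in for the project's exception; fault_class attribute is assumed."""

    def __init__(self, fault_class: str):
        super().__init__(fault_class)
        self.fault_class = fault_class


def audit_uncited(pairs, judge):
    """Judge each (sentence, manifest) pair; record failures instead of faking a pass."""
    constraint_violations = []
    uncited_audit_failures = []
    for sentence_id, manifest_id in pairs:
        try:
            verdict = judge(sentence_id, manifest_id)
        except JudgeInvocationError as err:
            # Advisory row: the check did not run, so nothing is claimed about the sentence.
            uncited_audit_failures.append(
                {"sentence_id": sentence_id, "manifest_id": manifest_id,
                 "fault_class": err.fault_class}
            )
            continue
        if verdict["judgment"] == "VIOLATED":  # value name assumed for illustration
            constraint_violations.append(
                {"sentence_id": sentence_id, "manifest_id": manifest_id}
            )
    return constraint_violations, uncited_audit_failures


def demo_judge(sentence_id, manifest_id):
    """Toy judge that times out on one sentence."""
    if sentence_id == "s2":
        raise JudgeInvocationError("judge_timeout")
    return {"judgment": "NOT_VIOLATED"}


print(audit_uncited([("s1", "m1"), ("s2", "m1")], demo_judge))
```

Keeping failures in their own list means a transient judge outage shows up as missing coverage rather than as a clean result.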
v3.8.0 (2026-05-16) — L3 Claim-Faithfulness Locator + Audit (paired milestone)
> v3.7.3 + v3.8 close the L3 (claim-faithfulness) gap end-to-end. v3.7.3 ships the locator infrastructure — every citation carries a three-layer anchor so future audits can fetch the cited passage. v3.8 ships the audit pass that consumes those anchors, judges whether the cited source supports the claim, and gate-refuses HIGH-WARN violations at the formatter terminal hard gate. The release also bundles 5 audit-trail-shipped feature PRs accumulated since v3.7.0 (#104 / #105 / #108 / #111 / #115).
- #103 — claim_ref_alignment_audit_agent (v3.8 PR #121). Opt-in (ARS_CLAIM_AUDIT=1, default OFF) Stage 4→5 audit agent. Judges every sampled citation against retrieved excerpt; emits claim_audit_results[] + claim_intent_manifests[] + claim_drifts[] + uncited_assertions[] + constraint_violations[] aggregates. 8-row finalizer matrix routes HIGH-WARN classes (CLAIM-NOT-SUPPORTED / NEGATIVE-CONSTRAINT-VIOLATION / FABRICATED-REFERENCE / ANCHORLESS / CONSTRAINT-VIOLATION-UNCITED) through the formatter REFUSE rules 6-10. Calibration runner ships with 20-tuple gold set (T-C1 FNR<0.15 + FPR<0.10, T-C2 per-class, T-C3 shape integrity). 8 rounds of dual-track review (R1 codex + Gemini-3.1-pro-preview, R2-R8 codex-only after Gemini quota exhausted); trajectory R1 4P1+2P2 → R8 0P1+4P2 ship gate.
- v3.7.3 — Three-Layer Citation Emission + contamination signals (PR #98). synthesis_agent / draft_writer_agent / report_compiler_agent gain ## Three-Layer Citation Emission (v3.7.3) H2. Every <!--ref:slug--> carries <!--anchor:<kind>:<value>--> with <kind> ∈ {quote, page, section, paragraph, none} (quote anchors capped at 25 words, URL-encoded). pipeline_orchestrator_agent finalizer becomes 5-cell with precedence-zero NO-LOCATOR check. formatter_agent adds explicit hard-gate refusal for [UNVERIFIED CITATION — NO QUOTE OR PAGE LOCATOR]. literature_corpus_entry.schema.json adds optional contamination_signals: { preprint_post_llm_inflection, semantic_scholar_unmatched } object. bibliography_agent computes both signals at ingest. 11-round review trajectory (Codex×10 + Gemini cross-model×1) closed 22 findings. Spec: docs/design/2026-05-12-ars-v3.7.3-claim-faithfulness-and-contaminated-source-spec.md. External motivation: Zhao et al. arXiv:2605.07723 (2026-05).
- #108 — AI disclosure policy-anchor renderer (audit-trail-shipped 2026-05-14). Adds PRISMA-trAIce / ICMJE / Nature / IEEE policy-anchor disclosure paths alongside the existing venue-track renderer.
- #111 — slr_lineage emission on systematic-review → academic-paper handoff (2026-05-15). Schema 9 optional boolean slr_lineage field; producer pipeline_orchestrator_agent writes at every handoff transition; consumer disclosure mode dispatches --policy-anchor=prisma-trAIce per the §4.3 G2 invariant track gate.
- #104 — README motivation: Zhao et al. corpus-scale evidence anchor (2026-05-15). README + README.zh-TW.md motivation section frames the v3.7.x line against Zhao et al.'s 146,932 hallucinated-citation finding.
- #105 — v3.7.3 contamination_signals backfill migration tool (2026-05-15). scripts/migrate_literature_corpus_to_v3_7_3.py retro-computes both contamination signals across pre-v3.7.3 passports.
- #115 — Semantic Scholar client maturity (2026-05-15). scripts/semantic_scholar_client.py adds 1-req/s throttle (drops to 0.1s when S2_API_KEY detected), outage latch on URLError, and reset_outage_latch() for long-running cross-passport batches.
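The three-layer citation marker described in the v3.7.3 item above can be sketched as a small formatter. This is a guess at the shape, not the agents' actual emission logic: the helper name `citation_markers` is hypothetical, rejecting an over-long quote (rather than truncating it) is an assumption, and so is emitting an empty value for the `none` kind. The anchor kinds, the 25-word cap, and URL-encoding of quote anchors come from the changelog text.

```python
from urllib.parse import quote

ANCHOR_KINDS = {"quote", "page", "section", "paragraph", "none"}
MAX_QUOTE_WORDS = 25


def citation_markers(slug: str, kind: str, value: str = "") -> str:
    """Hypothetical helper: build the ref marker followed by its anchor marker."""
    if kind not in ANCHOR_KINDS:
        raise ValueError(f"unknown anchor kind: {kind!r}")
    if kind == "quote":
        words = value.split()
        if len(words) > MAX_QUOTE_WORDS:
            # Assumption: over-long quotes are rejected; the real agents may truncate.
            raise ValueError("quote anchor exceeds 25 words")
        # Percent-encode everything outside the unreserved set, including spaces.
        value = quote(" ".join(words), safe="")
    return f"<!--ref:{slug}--><!--anchor:{kind}:{value}-->"


print(citation_markers("smith2020", "page", "12"))
print(citation_markers("smith2020", "quote", "the effect was large"))
```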
v3.7.0 (2026-05-05) — Claude Code Plugin Packaging
> Plugin packaging upgrade: ARS now installs in one line on Claude Code CLI / VS Code / JetBrains via /plugin marketplace add Imbad0202/academic-research-skills + /plugin install academic-research-skills. The traditional git clone + symlink to ~/.claude/skills/ flow continues to work — both tracks are first-class.
- Plugin manifest + marketplace metadata (Phase 1, PR #68). .claude-plugin/plugin.json declares the suite (4 skills auto-discovered from skills/ directory via relative symlinks). .claude-plugin/marketplace.json registers the plugin so a single GitHub-hosted endpoint serves both the marketplace listing and the plugin source. README + README.zh-TW.md + docs/SETUP.md carry dual-track install instructions.
- 10 slash commands at commands/ars-*.md (Phase 2.1, PR #69) mapping MODE_REGISTRY.md entries to /ars-<mode> triggers. Model routing is pinned in each command's frontmatter — opus for full and revision-coach (architectural / review-interpretation depth), sonnet for the other 8. No Haiku per project policy.
- 3 plugin-shipped agents at agents/*_agent.md (Phase 2.1, PR #69) as relative symlinks to the v3.6.7-hardened downstream agents in deep-research/agents/: synthesis_agent, research_architect_agent, report_compiler_agent. Underscore filenames preserved to keep scripts/check_v3_6_7_pattern_protection.py hard-pinned paths and INV-3 manifest-confined Clause 1 invariant intact. Symlinks (not copies) preserve a single source of truth and prevent the Pattern C3 attack surface that v3.6.7 §6 inversion sweep + INV-1/2/3 lint closes. (Materialized to real byte-identical copies in #413 — relative symlinks break Windows checkouts without core.symlinks and zip-download installs; the single-source guarantee moved to the scripts/check_agents_mirror_sync.py byte-equality CI lint.)
- model: inherit added to those three source agent frontmatters. Inherit chosen over pinning sonnet so an opus session running ARS full pipeline keeps opus agents (instead of being capped). The user's ~/.claude/hooks/warn-agent-no-model.sh PreToolUse hook gates Haiku at the dispatching boundary, so inherit resolves through an already-Haiku-free model.
- SessionStart announce hook at hooks/hooks.json + scripts/announce-ars-loaded.sh (Phase 2.2, PR #70). When the plugin loads, the hook injects an additionalContext listing the 10 slash commands, the 3 plugin agents, and a token-budget pointer into the LLM's first turn. startup and clear source values get the full announce; resume and compact get a one-line ack to avoid burning context. Bash 3.2 compatible — runs on macOS stock /bin/bash with no brew install bash requirement.
- Phase 2.2 scope reduction: a SubagentStop → run_codex_audit.sh codex audit hook was scoped out for v3.7.0 due to a contract gap (the SubagentStop payload carries no stage/deliverable info, so the wrapper would have to half-infer required arguments) and an invoker-class boundary (run_codex_audit.sh lines 4–7 forbid same-session in-LLM invocation; PostToolUse fires inside the producing session). Real audit-hook integration deferred to a future release when ARS gains a stage/deliverable propagation contract. See docs/design/2026-04-30-ars-v3.7.0-plugin-packaging-roadmap.md Update note 2026-05-05 (Phase 2.2 scope reduction).
- docs/PERFORMANCE.md + .zh-TW.md gain a "v3.7.0 Plugin agents and model routing" subsection explaining the inherit semantics and current 3-agent scope boundary.
- Codex review chain across the three PRs: 8 inline iterative rounds + 3 fresh PR-level rounds, all converging to 0 P0/P1/P2 findings before merge. The Phase 2.2 fresh PR review caught one P2 (unquoted ${CLAUDE_PLUGIN_ROOT} breaking install paths with spaces) that the inline rounds missed — confirms the value of separating implementation review (inline) from contract review (fresh).
- What did NOT change: the four skill directories, all 25 modes, agent prompts, schema files, and lint contracts. Plugin packaging only adds new top-level surface (commands/, agents/, hooks/, .claude-plugin/, skills/ symlink dir, three plugin-agent model: inherit frontmatter additions). Existing 4.3k clone-install users see no breaking change.
v3.6.8 (2026-05-03) — Generator-Evaluator Contract Gate (v3.6.6 spec ship)
> Naming note: this release ships the v3.6.6 generator-evaluator contract spec and implementation. The v3.6.6 work landed after v3.6.7 due to project sequencing; the design doc retains the v3.6.6 internal naming for the contract gate version, while the suite release is tagged v3.6.8 to keep the CHANGELOG monotonic.
- Schema 13.1 (shared/sprint_contract.schema.json) extends Schema 13 with two new mode enum values (writer_full + evaluator_full), two new optional top-level fields (pre_commitment_artifacts writer-only, disagreement_handling evaluator-only), and 12 allOf branches enforcing reviewer- / writer- / evaluator-conditional gates. Existing reviewer contracts validate byte-equivalent under Schema 13.1 (§3.6 zero-touch promise).
- Two new shipped contract templates under shared/contracts/writer/full.json (D1–D7, F1/F4/F2/F3/F0) and shared/contracts/evaluator/full.json (D1–D5, F1/F2/F3/F6/F4/F5/F0). Promoted from design-time artefacts on the spec branch to live shipped status atomically with the Schema 13.1 upgrade.
- Two-phase orchestration inside academic-paper full: Phase 4 splits into Phase 4a (writer paper-blind pre-commitment) + Phase 4b (writer paper-visible drafting + self-scoring); Phase 6 splits into Phase 6a (evaluator paper-blind pre-commitment) + Phase 6b (evaluator paper-visible scoring + decision). Phase-numbered <phase4a_output> / <phase6a_output> data delimiters mirror the v3.6.2 reviewer pattern. Lint count summary: writer 3+4 / evaluator 5+5 / reviewer 5+6 (reviewer remains zero-touch).
- academic-paper SKILL + agent files gain a verbatim ## v3.6.6 Generator-Evaluator Contract Protocol block (101 lines in SKILL.md plus 47 lines in draft_writer_agent.md + 57 lines in peer_reviewer_agent.md). SKILL.md also adds a new ## Known limitations section carrying graceful-degradation + cross-session resume forward notes for v3.6.7+.
- Validator extensions: scripts/check_sprint_contract.py SC-* mode-gating audit (SC-5 + SC-11 reviewer-only; SC-9 extended across all three mode families). 17 new tests bring the validator unit-test count from 54 to 71 (positive + 5 schema-branch negative + 2 §3.6 reviewer regression + 6 mode-gating tests).
- Manifest CI lint: scripts/check_v3_6_6_ab_manifest.py enforces §6.2 manifest schema + §6.5 git-tracked invariants on tests/fixtures/v3.6.6-ab/manifest.yaml. .github/workflows/spec-consistency.yml extends the sprint contract validation loop to iterate writer + evaluator template directories alongside the existing reviewer loop, plus runs the new manifest CI lint.
- A/B evidence fixture stub at tests/fixtures/v3.6.6-ab/ (30 files): manifest + README + 6 paper-A inputs/baseline + 1 paper-C inputs/baseline + Stage 3 reviewer excerpt + 6 codex-judge baseline placeholders. Real fixture data populates in follow-up commits before the implementation work fully completes.
v3.6.7 (2026-04-30) — Downstream-Agent Pattern Protection (Step 1+2)
- Three downstream agents hardened against 13 of 17 documented hallucination/drift patterns: synthesis_agent (A1–A5 narrative-side), the survey-designer mode of research_architect_agent (B1–B5 instrument-side), and the abstract-only mode of report_compiler_agent (C1–C3 publication-side). Each agent prompt now carries a PATTERN PROTECTION (v3.6.7) block.
- Four reference files in shared/references/: irb_terminology_glossary.md, psychometric_terminology_glossary.md, protected_hedging_phrases.md, word_count_conventions.md. The reference files carry operational contracts that the agent prompts cite by path.
- Cross-model audit prompt template at shared/templates/codex_audit_multifile_template.md with seven audit dimensions and a mandatory three-part Section 4(f) check for report_compiler_agent bundles. Failure of any sub-check is a P1 finding.
- Static lint + 29-test mutation suite: scripts/check_v3_6_7_pattern_protection.py enforces protection-clause presence and obligation-phrase shape; scripts/test_check_v3_6_7_pattern_protection.py preserves codex review evidence so future checker regressions surface in CI. Both are wired into .github/workflows/spec-consistency.yml.
- Codex review history: seven rounds of gpt-5.5 + xhigh cross-model review reached SHIP-OK with zero P1+P2 findings. Step 6 (orchestrator runtime hooks) and Step 8 (synthetic eval case) ship in a follow-up PR.
v3.6.5 (2026-04-27) — Material Passport literature_corpus[] Consumer Integration
- Two Phase 1 literature consumers wired: deep-research/agents/bibliography_agent.md and academic-paper/agents/literature_strategist_agent.md. Both follow the same five-step corpus-first, search-fills-gap flow when the passport carries a non-empty literature_corpus[] and the same four Iron Rules (Same criteria / No silent skip / No corpus mutation / Graceful fallback on parse failure).
- PRE-SCREENED reproducibility block in Search Strategy reports: enumerates included / excluded / skipped corpus entries, with F3 zero-hit note and F4a–F4f provenance reporting that compose around partial declaration of obtained_via / obtained_at. final_included = pre_screened_included[] ∪ external_included[] stays neutral — no provenance tags on bibliography entries or literature matrix rows.
- Consumer protocol reference at academic-pipeline/references/literature_corpus_consumers.md with the canonical PRE-SCREENED template, BAD/GOOD examples, four Iron Rules, and per-consumer reading instructions.
- CI lint scripts/check_corpus_consumer_protocol.py enforcing nine protocol invariants with manifest-driven consumer list (scripts/corpus_consumer_manifest.json).
- Schema 9 caveat retired: shared/handoff_schemas.md retired the v3.6.4 "Consumer-side integration deferred to v3.6.5+" caveat; replaced with backpointer to the consumer protocol.
- Presence-based, no schema change, no new env flag. Parse failures fall back to external-DB-only flow with a [CORPUS PARSE FAILURE] surface. citation_compliance_agent corpus integration deferred (target version TBD post-v3.8).
- No breaking changes. Existing user adapters work without modification.
v3.6.4 (2026-04-25) — Material Passport literature_corpus[] Input Port
- literature_corpus[] field added to Schema 9 as an optional input port for user-owned literature. Each entry conforms to shared/contracts/passport/literature_corpus_entry.schema.json (CSL-JSON authors, year, title, source_pointer + private optional abstract / user_notes).
- Language-neutral adapter contract at academic-pipeline/references/adapters/overview.md: any program (any language) reading a user corpus source can produce conformant passport.yaml + rejection_log.yaml. Fail-soft entry-level errors, fail-loud adapter-level errors, deterministic ordering.
- Three reference Python adapters under scripts/adapters/: folder_scan.py (filesystem of PDFs), zotero.py (Better BibTeX JSON export), obsidian.py (vault frontmatter). Starting points only; users are expected to write their own adapters for non-reference sources.
- Rejection log contract at shared/contracts/passport/rejection_log.schema.json with closed enum of categorical reason values; always emitted (empty when no rejections).
- CI gates: scripts/check_literature_corpus_schema.py validates schemas + adapter examples; scripts/sync_adapter_docs.py --check prevents schema→docs drift; new pytest.yml workflow runs scripts/adapters/tests/ on path-filtered triggers.
- Input-port-only at v3.6.4: v3.6.4 shipped the schema and adapter contract without consumer integration. bibliography_agent and literature_strategist_agent were wired in v3.6.5.
- No breaking changes.
v3.6.3 (2026-04-23) — Opt-in Passport Reset Boundary
- Opt-in passport reset boundary (ARS_PASSPORT_RESET=1).

