Epistemic Self-Monitoring (ESM)

A model-agnostic inference-time layer that reads transformer hidden states to classify epistemic state before tokens are committed.

from esm import EpistemicProbe

# `model` is a loaded causal LM and `ids` its tokenized input (see Quick Start).
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)  # one call, hooks into the forward pass

output = model(input_ids=ids, use_cache=True)
print(probe.last)
# Example values of probe.last, one per kind of query:
# EpistemicState(PARAMETRIC, conf=0.97)         → model knows this
# EpistemicState(CONTEXT_DEPENDENT, conf=0.89)  → model is using context
# EpistemicState(CONFABULATION_RISK, conf=0.74) → model is making this up
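
How `probe.register(model)` attaches to the forward pass is not shown here. One common PyTorch mechanism for this kind of attachment is a forward hook, sketched below on a toy network. The `HiddenStateTap` class and the toy model are hypothetical and only illustrate the hooking pattern; they are not the `esm` implementation.

```python
import torch
from torch import nn


class HiddenStateTap:
    """Hypothetical helper: records the output of one layer on every forward pass."""

    def __init__(self):
        self.last = None      # most recent captured activation
        self._handle = None   # handle returned by PyTorch, used to detach the hook

    def register(self, layer: nn.Module) -> None:
        # register_forward_hook calls self._hook after layer.forward() runs.
        self._handle = layer.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Transformer blocks often return tuples; the hidden state is usually first.
        hidden = output[0] if isinstance(output, tuple) else output
        self.last = hidden.detach()
        # Returning None leaves the layer's output unchanged.

    def remove(self) -> None:
        if self._handle is not None:
            self._handle.remove()
            self._handle = None


# Toy stand-in for a model: the tap reads the activation after the Tanh layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
tap = HiddenStateTap()
tap.register(model[1])

with torch.no_grad():
    out = model(torch.zeros(3, 4))

print(tap.last.shape)  # shape of the captured intermediate activation
tap.remove()
```

Because the hook only reads the activation and returns `None`, the model's output is unchanged; the tap is purely observational.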

The Problem

LLMs do not signal when they are hallucinating. Judged by its output alone, a model generating a confident, fluent, wrong answer looks identical to a model generating a confident, fluent, correct answer. Every downstream system (agents, tools, citation engines, medical assistants) inherits this epistemic blindness.

The Signal

Transformer hidden states at late-middle layers encode a discriminant signal that distinguishes three cases: what the model knows from training, what it is assembling from context, and what it is pattern-completing from nothing.

Validated on one model so far: Fisher LDA on layer-27 hidden states reaches AUROC 0.9944 (Qwen2.5-7B, TriviaQA, n=100); cross-model validation is in progress. The discriminant is latent in the residual stream. It's not in the logits. It's not in the attention weights. It's in the hidden-state geometry, and it fires before the token is committed.
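
To make the measurement concrete, here is a minimal sketch of a two-class Fisher discriminant scored by AUROC. It runs on synthetic Gaussian vectors standing in for hidden states, so the number it prints says nothing about any real model. The function names, the ridge term, and the fit/held-out split are assumptions made for illustration.

```python
import numpy as np


def fisher_direction(x_pos, x_neg, ridge=1e-3):
    """Fisher discriminant direction w = Sw^-1 (mu_pos - mu_neg), unit-normalized."""
    mu_pos = x_pos.mean(axis=0)
    mu_neg = x_neg.mean(axis=0)
    # Within-class scatter: sum of the two class covariance matrices.
    s_w = np.cov(x_pos, rowvar=False) + np.cov(x_neg, rowvar=False)
    # Small ridge keeps the solve stable when samples are few relative to dimensions.
    s_w += ridge * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, mu_pos - mu_neg)
    return w / np.linalg.norm(w)


def auroc(scores_pos, scores_neg):
    """Probability that a positive outscores a negative (ties count as half)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Synthetic stand-ins for hidden states: two Gaussian clouds in 16 dimensions.
rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[0] = 3.0  # the classes differ along one axis only
x_known = rng.normal(size=(200, dim)) + shift
x_unknown = rng.normal(size=(200, dim))

# Fit on the first half, score on the held-out second half.
w = fisher_direction(x_known[:100], x_unknown[:100])
score = auroc(x_known[100:] @ w, x_unknown[100:] @ w)
print(f"held-out AUROC on synthetic data: {score:.4f}")
```

Scoring on examples the direction was not fitted to matters most when the sample is small relative to the hidden dimension, since a linear discriminant can separate its own training points almost perfectly in that regime.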

The Architecture

Three layers of epistemic monitoring:

| Layer | Signal | Status |
|---|---|---|
| Geometric (ESM probe) | Fisher LDA on hidden states | AUROC 0.9944, validated |
| Positional (K-norm) | Mean key-norm per context position | rho +0.794, validated |
| Symbolic (Credence) | Claim-level constraint tracking | FCR study validated |
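
The positional signal in the table is described as the mean key-norm per context position. Below is a minimal sketch of that reduction, assuming the cached attention keys have been stacked into one array of shape (layers, heads, seq_len, head_dim). The array layout and the function name are assumptions; a real KV cache would need to be stacked into this form first.

```python
import numpy as np


def mean_key_norm_per_position(keys):
    """Average L2 norm of key vectors at each context position.

    keys: array of shape (layers, heads, seq_len, head_dim)  [assumed layout]
    returns: array of shape (seq_len,)
    """
    norms = np.linalg.norm(keys, axis=-1)  # (layers, heads, seq_len)
    return norms.mean(axis=(0, 1))         # average over layers and heads


# Toy cache: 2 layers, 4 heads, 5 positions, head_dim 8.
keys = np.ones((2, 4, 5, 8))
keys[:, :, 3, :] *= 2.0  # make position 3 stand out

k_norm = mean_key_norm_per_position(keys)
print(k_norm)
print("position with largest mean key-norm:", int(k_norm.argmax()))
```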

Installation

pip install -e .

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from esm import EpistemicProbe
import torch

# Load a Hugging Face causal LM and its tokenizer.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load the trained probe head and attach it to the model's forward pass.
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)

# Run a single forward pass without tracking gradients.
ids = tokenizer("What is the melting point of osmium?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=ids, use_cache=True)

# Epistemic state recorded by the probe for that forward pass.
print(probe.last)

Status

[x] Fisher geometry validated (AUROC 0.9944, Qwen2.5-7B)
[x] K-norm signal validated (rho +0.794)
[x] EQL head trained (AUROC 0.736, undertrained)
[x] Credence symbolic layer working (FCR validated)
[ ] Cross-model validation (Week 1 — in progress)
[ ] Hallucination-labeled training data (Week 3)
[ ] Production demo (Week 5)

Repository Structure

esm/           — Epistemic Self-Monitoring package (Layer 1 + 2)
credence/      — Symbolic constraint tracking (Layer 3)
evals/
  cross_model/ — Cross-model Fisher probe validation (WEEK 1)
  t4_results/  — Validated T4 experimental results
checkpoints/   — Trained model artifacts
archive/       — Prior experimental work (CAMS eviction v1, Kaggle P1-P9)

The Demo

Three questions. Live terminal. Any transformer.

Known fact → PARAMETRIC 0.97 ✓
Document Q&A → CONTEXT_DEPENDENT 0.89 ✓
Plausible fabrication → CONFABULATION_RISK 0.74 flagged before output
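
A downstream system could act on these states with a simple gate before emitting output. The sketch below is one hedged way to do that: the `Label`, `State`, and `should_flag` names and the 0.5 threshold are hypothetical, chosen only to mirror the three demo cases above, and are not part of the `esm` package.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    PARAMETRIC = "PARAMETRIC"
    CONTEXT_DEPENDENT = "CONTEXT_DEPENDENT"
    CONFABULATION_RISK = "CONFABULATION_RISK"


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for the probe's reported state."""
    label: Label
    conf: float


def should_flag(state: State, threshold: float = 0.5) -> bool:
    """Flag only confabulation-risk states at or above the threshold (assumed policy)."""
    return state.label is Label.CONFABULATION_RISK and state.conf >= threshold


# The three demo cases, using the confidences quoted above.
demo = [
    State(Label.PARAMETRIC, 0.97),
    State(Label.CONTEXT_DEPENDENT, 0.89),
    State(Label.CONFABULATION_RISK, 0.74),
]
flags = [should_flag(s) for s in demo]

for state, flagged in zip(demo, flags):
    action = "flag before output" if flagged else "pass through"
    print(f"{state.label.value} {state.conf:.2f}: {action}")
```

The threshold trades missed fabrications against false alarms, so in practice it would be tuned on labeled data rather than fixed at a default.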

Credence — Epistemic Guard