Epistemic Self-Monitoring (ESM)
A model-agnostic inference-time layer that reads transformer hidden states to classify epistemic state before tokens are committed.
from esm import EpistemicProbe
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model) # one call — hooks into forward pass
output = model(input_ids=ids, use_cache=True)
print(probe.last)
# EpistemicState(PARAMETRIC, conf=0.97) → model knows this
# EpistemicState(CONTEXT_DEPENDENT, conf=0.89) → model using context
# EpistemicState(CONFABULATION_RISK, conf=0.74) → model making this up
The Problem
LLMs cannot tell when they are hallucinating. A model generating a confident, fluent, wrong answer looks identical from the inside to a model generating a confident, fluent, correct answer. Every downstream system — agents, tools, citation engines, medical assistants — inherits this epistemic blindness.
The Signal
Transformer hidden states at late-middle layers encode a discriminant signal that separates what the model knows from training from what the model is assembling from context or pattern-completing from nothing.
Validated: Fisher geometry AUROC 0.9944 at layer 27 (Qwen2.5-7B, TriviaQA, n=100). The discriminant is latent in the residual stream. It's not in the logits. It's not in attention weights. It's in the hidden state geometry, and it fires before the token is committed.
The Architecture
Three layers of epistemic monitoring:
| Layer | Signal | Status | |---|---|---| | Geometric (ESM probe) | Fisher LDA on hidden states | AUROC 0.9944, validated | | Positional (K-norm) | Mean key-norm per context position | rho +0.794, validated | | Symbolic (Credence) | Claim-level constraint tracking | FCR study validated |
Installation
pip install -e .
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
from esm import EpistemicProbe
import torch
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)
ids = tokenizer("What is the melting point of osmium?", return_tensors="pt").input_ids
with torch.no_grad():
out = model(input_ids=ids, use_cache=True)
print(probe.last)
Status
- [x] Fisher geometry validated (AUROC 0.9944, Qwen2.5-7B)
- [x] K-norm signal validated (rho +0.794)
- [x] EQL head trained (AUROC 0.736, undertrained)
- [x] Credence symbolic layer working (FCR validated)
- [ ] Cross-model validation (Week 1 — in progress)
- [ ] Hallucination-labeled training data (Week 3)
- [ ] Production demo (Week 5)
Repository Structure
esm/ — Epistemic Self-Monitoring package (Layer 1 + 2)
credence/ — Symbolic constraint tracking (Layer 3)
evals/
cross_model/ — Cross-model Fisher probe validation (WEEK 1)
t4_results/ — Validated T4 experimental results
checkpoints/ — Trained model artifacts
archive/ — Prior experimental work (CAMS eviction v1, Kaggle P1-P9)
The Demo
Three questions. Live terminal. Any transformer.
- Known fact →
PARAMETRIC 0.97✓ - Document Q&A →
CONTEXT_DEPENDENT 0.89✓ - Plausible fabrication →
CONFABULATION_RISK 0.74flagged before output






