thoughtproof-mcp
  
MCP server for ThoughtProof: verify AI reasoning with adversarial multi-model consensus.
3–4 LLMs (Grok, Gemini, DeepSeek, Sonnet) independently evaluate every claim. A dedicated red-team model critiques their verdicts. A synthesizer (Sonnet) weighs everything and returns ALLOW, BLOCK, or UNCERTAIN with a confidence score and objections.
Quick Start
{
  "mcpServers": {
    "thoughtproof": {
      "command": "npx",
      "args": ["-y", "thoughtproof-mcp"],
      "env": {
        "THOUGHTPROOF_API_KEY": "tp_op_your_key_here"
      }
    }
  }
}
Works with Claude Desktop, Cursor, Windsurf, Cline, and any MCP-compatible client.
Tools
verify_claim
Verify any claim or AI-generated reasoning before acting on it.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| claim | string | (required) | The text to verify |
| stakeLevel | low / medium / high / critical | medium | Risk level; higher stakes trigger deeper verification |
| domain | financial / medical / legal / code / general | general | Domain context for specialized verification |
| speed | fast / standard / deep | standard | Verification depth |
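As a rough illustration of the parameter table, the sketch below builds a `verify_claim` argument object with the documented defaults applied. The allowed values and defaults come from the table; the helper names `pick` and `buildVerifyClaimArgs` are hypothetical and not part of this package.

```typescript
// Allowed values copied from the verify_claim parameter table.
const STAKE_LEVELS = ["low", "medium", "high", "critical"] as const;
const DOMAINS = ["financial", "medical", "legal", "code", "general"] as const;
const SPEEDS = ["fast", "standard", "deep"] as const;

type StakeLevel = (typeof STAKE_LEVELS)[number];
type Domain = (typeof DOMAINS)[number];
type Speed = (typeof SPEEDS)[number];

interface VerifyClaimArgs {
  claim: string;
  stakeLevel: StakeLevel;
  domain: Domain;
  speed: Speed;
}

// Hypothetical helper: returns the fallback when the value is omitted,
// and rejects anything outside the allowed list.
function pick<T extends string>(
  name: string,
  allowed: readonly T[],
  value: string | undefined,
  fallback: T
): T {
  if (value === undefined) return fallback;
  if ((allowed as readonly string[]).indexOf(value) === -1) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return value as T;
}

// Hypothetical helper: applies the documented defaults
// (stakeLevel=medium, domain=general, speed=standard).
function buildVerifyClaimArgs(input: {
  claim: string;
  stakeLevel?: string;
  domain?: string;
  speed?: string;
}): VerifyClaimArgs {
  if (input.claim.trim() === "") {
    throw new Error("claim is required");
  }
  return {
    claim: input.claim,
    stakeLevel: pick("stakeLevel", STAKE_LEVELS, input.stakeLevel, "medium"),
    domain: pick("domain", DOMAINS, input.domain, "general"),
    speed: pick("speed", SPEEDS, input.speed, "standard"),
  };
}
```

Calling `buildVerifyClaimArgs({ claim: "...", stakeLevel: "high" })` leaves `domain` and `speed` at their defaults; an unknown value such as `speed: "turbo"` throws.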
check_agent_score
Look up an agent's composite trust score on the ERC-8004 registry.
| Parameter | Type | Description |
|-----------|------|-------------|
| agentId | string | Agent ID to look up |
| domain | string | Optional domain filter |
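MCP clients invoke tools through a JSON-RPC `tools/call` request. The sketch below shows what such a request for `check_agent_score` might look like on the wire. The envelope follows the MCP specification; the helper name `buildCheckAgentScoreRequest` and the agent ID in the usage note are hypothetical.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, string>;
  };
}

// Hypothetical helper: builds the request for check_agent_score.
// The domain filter is only included when the caller supplies one.
function buildCheckAgentScoreRequest(
  id: number,
  agentId: string,
  domain?: string
): ToolCallRequest {
  const args: Record<string, string> = { agentId };
  if (domain !== undefined) {
    args["domain"] = domain;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "check_agent_score", arguments: args },
  };
}
```

For example, `JSON.stringify(buildCheckAgentScoreRequest(1, "agent-123", "financial"))` yields the payload a client would send over the server's stdio transport. In practice an MCP client library handles this step for you.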
Example
In Claude Desktop or Cursor, just ask:
"Verify the claim: GPT-5 achieves 95% accuracy on MMLU-Pro"
The tool returns:
⚠️ UNCERTAIN (42% confidence)
Claim: "GPT-5 achieves 95% accuracy on MMLU-Pro"
Objections:
- Insufficient public benchmark data to confirm
- Historical accuracy claims have been overstated
- MMLU-Pro methodology has known ceiling effects
⚡ 3.2s | Adversarial Multi-Model Consensus
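If an agent needs to branch on the result, the text can be parsed. The sketch below assumes the output always follows the layout of the sample above (a verdict with a percentage, then an `Objections:` list); that layout is inferred from this one example and is not a documented contract. `parseVerification` is a hypothetical helper.

```typescript
type Verdict = "ALLOW" | "BLOCK" | "UNCERTAIN";

interface ParsedResult {
  verdict: Verdict;
  confidence: number; // percentage, 0-100
  objections: string[];
}

// Hypothetical parser for the sample layout shown above.
// Returns null when no verdict line is found.
function parseVerification(text: string): ParsedResult | null {
  const header = /(ALLOW|BLOCK|UNCERTAIN)\s*\((\d+)% confidence\)/.exec(text);
  if (header === null) return null;

  const objections: string[] = [];
  let inObjections = false;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "Objections:") {
      inObjections = true;
      continue;
    }
    if (inObjections) {
      if (line.indexOf("- ") === 0) {
        objections.push(line.slice(2));
      } else {
        // The first non-bullet line ends the objections list.
        inObjections = false;
      }
    }
  }

  return {
    verdict: header[1] as Verdict,
    confidence: Number(header[2]),
    objections,
  };
}

// Sample text mirroring the example output (status icons omitted).
const sample = [
  "UNCERTAIN (42% confidence)",
  'Claim: "GPT-5 achieves 95% accuracy on MMLU-Pro"',
  "Objections:",
  "- Insufficient public benchmark data to confirm",
  "- Historical accuracy claims have been overstated",
  "- MMLU-Pro methodology has known ceiling effects",
  "3.2s | Adversarial Multi-Model Consensus",
].join("\n");

const parsed = parseVerification(sample);
```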
How It Works
Your AI Agent
       │
       ▼
┌──────────────────┐
│ thoughtproof-mcp │  ← MCP Server (this package)
└──────────────────┘
       │
       ▼
┌──────────────────┐
│ ThoughtProof API │  ← api.thoughtproof.ai (RV)
└──────────────────┘
       │
       ▼
┌───────────────────────────────────────────┐
│ Stage 1: Independent Evaluation           │
│   3–4 LLMs (Grok, Gemini, DeepSeek,       │
│   Sonnet) each examine the claim          │
│                                           │
│ Stage 2: Red-Team Critique                │
│   1 dedicated model challenges all        │
│   initial verdicts                        │
│                                           │
│ Stage 3: Synthesis                        │
│   Sonnet weighs verdicts + critique       │
│   → final decision                        │
└───────────────────────────────────────────┘
       │
       ▼
ALLOW / BLOCK / UNCERTAIN
+ confidence % + objections
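The three stages in the diagram can be pictured as a small pipeline. The sketch below is a toy model only: the evaluators and red-team step are stub functions standing in for LLM calls, and the majority-vote synthesis is a placeholder assumption, since the real weighting used by the ThoughtProof API is not described here. All names are hypothetical.

```typescript
type Verdict = "ALLOW" | "BLOCK" | "UNCERTAIN";

interface Evaluation {
  model: string;
  verdict: Verdict;
}

// Stand-ins for LLM calls; real evaluators would be asynchronous.
type Evaluator = (claim: string) => Evaluation;
type RedTeam = (claim: string, evaluations: Evaluation[]) => string[];

interface Decision {
  verdict: Verdict;
  confidence: number; // percentage, 0-100
  objections: string[];
}

function runPipeline(
  claim: string,
  evaluators: Evaluator[],
  redTeam: RedTeam
): Decision {
  // Stage 1: each model examines the claim independently.
  const evaluations = evaluators.map((evaluate) => evaluate(claim));

  // Stage 2: a dedicated model challenges the initial verdicts.
  const objections = redTeam(claim, evaluations);

  // Stage 3: placeholder synthesis (ASSUMPTION: strict majority vote).
  // The objections are passed through rather than weighed.
  const total = evaluations.length;
  const allow = evaluations.filter((e) => e.verdict === "ALLOW").length;
  const block = evaluations.filter((e) => e.verdict === "BLOCK").length;

  let verdict: Verdict = "UNCERTAIN";
  if (allow * 2 > total) {
    verdict = "ALLOW";
  } else if (block * 2 > total) {
    verdict = "BLOCK";
  }

  const confidence =
    total === 0 ? 0 : Math.round((Math.max(allow, block) / total) * 100);

  return { verdict, confidence, objections };
}
```

The point of the sketch is the data flow: verdicts are collected first, critiqued second, and only then combined into a single decision with its objections attached.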
Pricing
| Speed | Models | Cost per verification |
|-------|--------|-----------------------|
| fast | 2 | $0.008 |
| standard | 4 | $0.02 |
| deep | 5+ | $0.08 |
Payment: API key (operator account) or x402 micropayment (USDC on Base).
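To budget for a workload, the per-verification prices can be multiplied out. The sketch below uses the prices from the table and works in thousandths of a dollar to avoid floating-point drift; `estimateCostUsd` is a hypothetical helper, and any volume discounts or fees are outside what this README states.

```typescript
type Speed = "fast" | "standard" | "deep";

// Prices from the pricing table, in thousandths of a US dollar.
const PRICE_MILLI_USD: Record<Speed, number> = {
  fast: 8, // $0.008
  standard: 20, // $0.02
  deep: 80, // $0.08
};

// Hypothetical helper: total cost in USD for a mix of verifications.
function estimateCostUsd(counts: Partial<Record<Speed, number>>): number {
  let milliUsd = 0;
  for (const speed of Object.keys(PRICE_MILLI_USD) as Speed[]) {
    milliUsd += (counts[speed] ?? 0) * PRICE_MILLI_USD[speed];
  }
  return milliUsd / 1000;
}

// 100 standard verifications: 100 * $0.02
const monthly = estimateCostUsd({ standard: 100 });
```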
API Key
Get an operator API key at thoughtproof.ai. Without a key, verifications use x402 micropayments automatically.
Configuration
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| THOUGHTPROOF_API_KEY | (none) | Operator API key |
| THOUGHTPROOF_BASE_URL | https://api.thoughtproof.ai | API base URL |
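The sketch below shows one way these variables might be resolved at startup, including the fallback to x402 when no key is set. It is an illustration under stated assumptions, not the package's actual code: `resolveConfig` and the `payment` field are hypothetical names, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
interface ThoughtProofConfig {
  apiKey: string | undefined;
  baseUrl: string;
  // Hypothetical field: which payment path would apply.
  payment: "api-key" | "x402";
}

// Hypothetical helper: applies the documented default base URL and
// treats a missing or empty key as "no key".
function resolveConfig(
  env: Record<string, string | undefined>
): ThoughtProofConfig {
  const apiKey = env["THOUGHTPROOF_API_KEY"] || undefined;
  const baseUrl = env["THOUGHTPROOF_BASE_URL"] || "https://api.thoughtproof.ai";
  return {
    apiKey,
    baseUrl,
    payment: apiKey ? "api-key" : "x402",
  };
}
```

In a Node process this would typically be called as `resolveConfig(process.env)`.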
Development
git clone https://github.com/ThoughtProof/thoughtproof-mcp.git
cd thoughtproof-mcp
npm install
npm run build
npm test
npm run dev # Run with tsx (hot reload)
npm run inspect # Test with MCP Inspector
Related
- ThoughtProof: Decision verification for AI agents
- pot-cli: CLI for reasoning verification
- ERC-8004: Autonomous Agent Registry
License
MIT (ThoughtProof)






