openclaw-brain

xz0831/openclaw-brain
0 stars · MIT · Community

Summary

An MCP server that ingests semiconductor PDFs into a Neo4j knowledge graph, enabling AI agents to query domain knowledge, verify claims against source text, and record design reasoning.

README.md

openclaw-brain

An engineering knowledge-graph + memory system — the memory and guardrails for an AI circuit-design mentor.

openclaw-brain ingests semiconductor PDFs (textbooks, papers), extracts concepts / equations / typed relationships via LLMs, and stores them in a Neo4j knowledge graph. It is exposed as an MCP server so any MCP-compatible agent (OpenClaw, Claude Code, …) can query domain knowledge, verify it against the original source text, and write its own design reasoning back into the graph.

What it does

  • Ingest: PDF → typed knowledge graph through an 11-stage pipeline (parse → figures → chunk → extract → ground → match → reason → reconcile → commit → embed → summarize). Only two stages do the heavy "understanding" (an LLM); the rest are mechanical.

  • Serve: exposes ~24 MCP tools, including query_knowledge, get_evidence, answer_question, record_hypothesis / record_decision / record_bench_result, merge_concepts, retract_node, …

  • Ground: every node is named, typed, confidence-scored, and traceable to the exact source chunk; a grounding stage drops claims the chunk text doesn't support.
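To make the grounding idea concrete, here is a minimal sketch of a filter that keeps a claim only when its source chunk supports it. It is not the project's implementation: a deliberately naive token-overlap check stands in for whatever the real grounding stage does, and the names `Claim`, `is_supported`, `ground`, and the `threshold` value are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Claim:
    text: str      # the extracted statement (hypothetical field name)
    chunk_id: str  # id of the chunk the claim was extracted from


def _tokens(s: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def is_supported(claim: Claim, chunk_text: str, threshold: float = 0.6) -> bool:
    """Naive stand-in check: fraction of claim tokens that appear in the chunk."""
    claim_tokens = _tokens(claim.text)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & _tokens(chunk_text)) / len(claim_tokens)
    return overlap >= threshold


def ground(claims: list[Claim], chunks: dict[str, str], threshold: float = 0.6) -> list[Claim]:
    """Drop claims whose chunk is unknown or whose text the chunk does not support."""
    return [
        c for c in claims
        if c.chunk_id in chunks and is_supported(c, chunks[c.chunk_id], threshold)
    ]


chunks = {"c1": "Cascode devices raise the effective output resistance and improve supply rejection."}
claims = [
    Claim("cascode devices raise output resistance", "c1"),      # supported by c1
    Claim("cascode devices reduce thermal noise by half", "c1"),  # not in the chunk
    Claim("cascode devices raise output resistance", "c9"),      # unknown chunk
]
kept = ground(claims, chunks)
```

The point of the sketch is the shape of the stage (claim in, chunk lookup, keep or drop), not the scoring rule; a lexical check like this would miss paraphrases that a real grounding stage has to handle.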

A single unit of the graph looks like this — a real node and a real typed edge, exactly as they sit in the graph:

(Cascode Device) ──[ SOLVES_PROBLEM ]──> (Power Supply Rejection)
  confidence 0.70 · layer L2 (analog/EDA) · evidence: chunk_c635e958d19e
  rationale: "cascode devices raise effective output resistance, improving supply rejection (PSRR)…"
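As an illustration of what that unit carries, the edge above could be modeled as a small record. This is a sketch only: the class name `TypedEdge` and its field names are assumptions chosen to mirror the printed example, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEdge:
    """Hypothetical record mirroring the edge shown above."""
    source: str          # name of the source node
    relation: str        # edge type, e.g. SOLVES_PROBLEM
    target: str          # name of the target node
    confidence: float    # score in [0, 1]
    layer: str           # knowledge layer label, e.g. "L2"
    evidence_chunk: str  # id of the chunk the edge is traceable to
    rationale: str       # natural-language justification

    def __post_init__(self) -> None:
        # Reject scores outside the unit interval.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def render(self) -> str:
        """Format the edge the way it is printed in the example above."""
        return f"({self.source}) ──[ {self.relation} ]──> ({self.target})"


edge = TypedEdge(
    source="Cascode Device",
    relation="SOLVES_PROBLEM",
    target="Power Supply Rejection",
    confidence=0.70,
    layer="L2",
    evidence_chunk="chunk_c635e958d19e",
    rationale="cascode devices raise effective output resistance, improving supply rejection (PSRR)…",
)
```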

The honest bottom line

The project started with one bet — "make a cheap local model reason like an expensive one" — and measured it false. Because the failure was measured cleanly, two things that genuinely ship came out of it: (1) a grounding / fabrication-control mechanism that drops source-unsupported claims, with measured fabrication near-zero on the evaluation arms (the production-graph fabrication is not yet separately measured), and (2) a debugging discipline that catches when the measurement instrument itself is lying. The full development log — including the dead-ends and the numbers — is in docs/DEVLOG.md.

Quickstart

Requires Python 3.11+ and Neo4j 5.

python -m venv .venv && .venv/bin/pip install -e .
docker compose up -d                        # Neo4j on :7687
.venv/bin/openclaw-brain apply-schema       # constraints + vector indexes

.venv/bin/openclaw-brain serve              # MCP server (stdio — used by the agent)
.venv/bin/openclaw-brain status             # Neo4j health + node counts
.venv/bin/openclaw-brain export-obsidian    # graph → browsable Obsidian vault (~/Semiconductor)

Ingesting a PDF and asking questions both happen through the agent calling MCP tools (ingest_pdf(file_path=…), query_knowledge(query=…)); the full tool list is in src/openclaw_brain/server/mcp_server.py.
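For orientation, an MCP tool call is a JSON-RPC 2.0 request with method `tools/call`. The sketch below only builds the wire messages for the two tools named above; the helper `tool_call`, the request ids, the file path, and the query text are illustrative placeholders. In practice the agent's MCP client produces these messages and performs the initialization handshake first.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Placeholder path and question; only the tool and argument names come from this README.
ingest = tool_call(1, "ingest_pdf", {"file_path": "/path/to/textbook.pdf"})
query = tool_call(2, "query_knowledge", {"query": "How does a cascode improve PSRR?"})

print(ingest)
print(query)
```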

Architecture

src/openclaw_brain/
├── agent.py             # BrainAgent — the single public API (all MCP tools delegate here)
├── knowledge/           # pipeline · extraction · reasoning · graph store (Neo4j)
├── memory/              # episodic / semantic / procedural memory + promotion
├── llm/                 # provider (model catalog) + resilience (retry / fallback)
└── server/mcp_server.py # FastMCP server exposing BrainAgent as MCP tools

Routing is local-first: shallow stages run on local/cheap models, the depth-bearing extract and reason stages run on a cheap hosted model (deepseek-v4-flash), and frontier models (Opus / Codex) are used only as the teacher/ceiling. The authoritative stage→model config lives in config/default.toml. See CLAUDE.md for the full module map and docs/DECISIONS.md for the architecture decision records.
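The routing rule can be pictured as a lookup from stage to model tier. The sketch below is an assumption-laden illustration, not the contents of config/default.toml: `LOCAL_MODEL` is a hypothetical placeholder name, and `model_for` is an invented helper. Only the stage names and deepseek-v4-flash come from this README.

```python
# The 11 pipeline stages, in order, as listed in "What it does".
STAGES = [
    "parse", "figures", "chunk", "extract", "ground", "match",
    "reason", "reconcile", "commit", "embed", "summarize",
]

DEPTH_STAGES = {"extract", "reason"}  # the depth-bearing stages
HOSTED_MODEL = "deepseek-v4-flash"    # cheap hosted model named above
LOCAL_MODEL = "local-model"           # hypothetical placeholder, not a real model id


def model_for(stage: str) -> str:
    """Local-first routing: only depth-bearing stages go to the hosted model.

    Stages that need no model at all would simply ignore the returned value.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return HOSTED_MODEL if stage in DEPTH_STAGES else LOCAL_MODEL


routing = {stage: model_for(stage) for stage in STAGES}
```

Keeping the mapping in one table is what makes a stage-level model swap a config change rather than a code change; the authoritative table remains the one in config/default.toml.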

Status

Production graph rebuilt clean on deepseek-v4-flash: 5 sources (Razavi textbook + 4 CIS papers) → 4,336 concepts, 2,268 circuit topologies, 581 equations. Knowledge is stored as natural language (concept descriptions + ~19k typed-edge rationales + a verbatim EvidenceVault); embeddings are a rebuildable index, not the asset of record.

.venv/bin/python3 -m pytest tests/ -q         # Neo4j-backed tests auto-skip without a DB

License

MIT © 2026 Rick (github.com/xz0831).
