mcp-vl-msa-rs
    
A searchable long-term memory for AI agents, exposed as an MCP stdio server. Index documents, notes and past conversations into collections; retrieve the top-k relevant chunks for a query and inject the original text back to the model; add or drop agent memories with msa_remember / msa_forget. Pure Rust, BM25 over tantivy, zero ML deps in the default build; optional in-process dense rerank.
Any MCP client (Claude Code, Codex, or anything speaking MCP stdio) gets the same memory: a queryable corpus that survives across sessions and model swaps, with no cloud account and no embedding service required. Use it to give an agent durable recall over a knowledge base, a docs tree, or its own chat history — retrieval that returns the original text, not just embeddings.
It is one half of a two-part memory: this server is the library (corpus recall), its companion mcp-memory-rs is the notebook (curated state). An agent that swaps models loses neither.
flowchart LR
A["AI agent<br/>(any MCP client)"]
A -->|"curated state<br/>read / write / sync"| M["mcp-memory-rs<br/><i>the notebook</i>"]
A -->|"corpus recall<br/>index / search / fetch"| V["mcp-vl-msa-rs<br/><i>the library</i>"]
M --- D1[("JSON categories<br/>SQLite FTS5")]
V --- D2[("tantivy BM25<br/>collections")]
The name: msa is the retrieval pattern it borrows from the Memory Sparse Attention paper (arXiv:2603.23516) — an extrinsic approximation, not the neural model; distinct from MiniMax's MSA-architecture LLMs, which are intrinsic (in-model) generators. vl is for Vivling (codex-vl), its first adopter — but the server is fully AI-agnostic and depends on nothing from it.
Status: v0.4 — hybrid sparse+dense optional.
Why
The original Memory Sparse Attention paper (EverMind-AI) describes an end-to-end trainable sparse attention layer over chunk-pooled KV caches. That is a neural artifact and is not portable to a pure-Rust MCP server. What is portable, and what this repo aims to deliver, is the MSA macro pattern:
- Chunked storage of long-form text with a small fixed pool size (P=64 words by default, mirroring the paper).
- Top-k sparse routing over chunks (BM25 surrogate; learned routing is out of scope).
- Original text injection (paper §4.3, ablation -37.1% without): msa_search returns chunks, msa_fetch_doc returns the full document.
- Memory Interleave as a protocol (shipped in v0.3 as msa_search_iterative): the AI client orchestrates multi-hop retrieval through repeated tool calls with a server-side cursor.
Design and rationale are documented in the project notes (negative results, gate methodology); see docs/NEGATIVE_RESULTS.md.
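The chunked-storage step above can be illustrated with a minimal sketch. It assumes chunks are counted in whitespace-separated words and that `overlap` means words shared between consecutive chunks; `chunk_words` is a hypothetical helper written for this example, not the server's actual chunker.

```rust
/// Split `text` into chunks of at most `chunk_size` whitespace-separated words,
/// with `overlap` words shared between consecutive chunks.
/// Hypothetical helper for illustration; the real chunker may tokenize differently.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // last chunk reached; do not emit a redundant tail
        }
        start += step;
    }
    chunks
}
```

With the defaults named above (64 words, no overlap), a 150-word document yields three chunks of 64, 64 and 22 words. Rejoining words with single spaces loses the original whitespace, which is one reason a server following this pattern would keep the full document available separately for original-text injection.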
Benchmarks
Retrieval changes are decided on pre-registered, paired deltas with bootstrap confidence intervals — not on absolute scores. Workloads: HotpotQA (extractive QA), MLDR-it (long-doc retrieval, Italian), LongMemEval-S (500 conversational-memory questions). Full methodology, acceptance gates and refuted hypotheses live in docs/NEGATIVE_RESULTS.md.
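As a rough illustration of what a paired delta with a bootstrap confidence interval means, here is a generic percentile-bootstrap sketch. It is not the project's benchmark code: the function name, the tiny built-in PRNG, and the choice of the percentile method are all assumptions made for this example, and the authoritative gate definition is the one in docs/NEGATIVE_RESULTS.md.

```rust
/// Small deterministic PRNG (xorshift64*), so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Index in [0, n); the modulo bias is negligible for benchmark-sized n.
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Percentile bootstrap over paired per-query deltas
/// (candidate score minus baseline score on the same query).
/// Returns (mean delta, lower bound, upper bound).
fn paired_bootstrap_ci(
    deltas: &[f64],
    resamples: usize,
    confidence: f64,
    seed: u64,
) -> (f64, f64, f64) {
    assert!(!deltas.is_empty() && resamples > 0);
    assert!(confidence > 0.0 && confidence < 1.0);
    let n = deltas.len();
    let mean = deltas.iter().sum::<f64>() / n as f64;

    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut means: Vec<f64> = (0..resamples)
        .map(|_| {
            // Resample n deltas with replacement and record their mean.
            let sum: f64 = (0..n).map(|_| deltas[rng.next_index(n)]).sum();
            sum / n as f64
        })
        .collect();
    means.sort_by(|a, b| a.partial_cmp(b).expect("deltas must not be NaN"));

    let tail = (1.0 - confidence) / 2.0;
    let lo_idx = (resamples as f64 * tail).floor() as usize;
    let hi_idx = ((resamples as f64 * (1.0 - tail)).ceil() as usize).min(resamples) - 1;
    (mean, means[lo_idx], means[hi_idx])
}
```

Under this reading, a change would pass a gate only if the lower bound of the interval clears the pre-registered threshold, which is stricter than comparing two absolute scores.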
Headline measurements:
- BM25 is the engine, not a placeholder. Three pre-registered attempts; no hybrid (BM25 + dense rerank) configuration beat the gate on these workloads. Dense rerank stays available (dense_alpha, off by default) for re-testing as encoders improve.
- Rich capsules at ingestion (deterministic enrich, no LLM): **+7 to +20 recall@5** across every category.
- Original-text injection (msa_fetch_doc after msa_search): +14.6 F1 exactly on the stratum where snippets miss the content.
- Recency priors lose: handle time at serving, not in the retrieval score.
Reproduce:
crates/msa-bench/scripts/download-bench-datasets.sh # fetch datasets
scripts/run-baseline-bench.sh # BM25 vs BM25+dense sweep
# results land under crates/msa-bench/results/ as JSON
Tool surface
| Tool | Since | Description |
|---|---|---|
| msa_index | v0.1 | Index a document; existing chunks for doc_id are replaced. |
| msa_search | v0.1 | Top-k chunks, score normalized 0.0–1.0. |
| msa_fetch_doc | v0.1 | Full original text of a document. |
| msa_delete | v0.1 | Remove a document and all its chunks. |
| msa_list_collections | v0.1 | Collections open in the registry. |
| msa_stats | v0.1 | Per-collection statistics (exact num_documents / total_tokens). |
| SearchFilter | v0.2 | Metadata filter (where_eq / where_in / created_*), post-retrieval. |
| msa_search_iterative | v0.3 | Memory Interleave with server-side cursor; dedups across rounds. |
| msa_drop_session | v0.3 | Force-evict a Memory Interleave session before TTL. |
| dense_alpha on msa_search | v0.4 | Hybrid BM25 + cosine rerank. Requires --features embeddings + [embeddings] config. |
| msa_remember / msa_forget | v0.4 | Agent-memory surface: enrich + low-signal gate + content-hash dedup; standard metadata (kind / source_id / created_at). |
| msa_sync_path | v0.4 | Mirror a directory into a collection (filesystem source; blake3 delta sync). |
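To make the msa_search_iterative and msa_drop_session rows more concrete, the sketch below shows one way a server-side cursor could deduplicate chunks across rounds and expire idle sessions. `Session`, `SessionRegistry`, and their methods are hypothetical names for this example; the real MsaSession registry may be organized quite differently.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical per-session cursor state: chunk ids already returned to the client.
struct Session {
    seen: HashSet<String>,
    last_used: Instant,
}

/// Hypothetical registry with a time-to-live for idle sessions.
struct SessionRegistry {
    ttl: Duration,
    sessions: HashMap<String, Session>,
}

impl SessionRegistry {
    fn new(ttl: Duration) -> Self {
        Self { ttl, sessions: HashMap::new() }
    }

    /// One retrieval round: keep only chunk ids not yet served in this session,
    /// preserving ranked order, and remember them for later rounds.
    fn round(&mut self, session_id: &str, ranked_ids: &[&str], now: Instant) -> Vec<String> {
        self.evict_expired(now);
        let session = self
            .sessions
            .entry(session_id.to_string())
            .or_insert_with(|| Session { seen: HashSet::new(), last_used: now });
        session.last_used = now;
        ranked_ids
            .iter()
            .copied()
            .filter(|id| session.seen.insert((*id).to_string())) // true only for new ids
            .map(str::to_string)
            .collect()
    }

    /// Explicit eviction, analogous to dropping a session before its TTL.
    fn drop_session(&mut self, session_id: &str) -> bool {
        self.sessions.remove(session_id).is_some()
    }

    /// Remove sessions idle for at least `ttl`.
    fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.sessions
            .retain(|_, s| now.saturating_duration_since(s.last_used) < ttl);
    }
}
```

Passing `now` in explicitly keeps the eviction logic testable without sleeping. In this sketch a second round that ranks an already-served chunk first simply skips it, so each hop of a multi-hop retrieval surfaces new text.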
Install
Prebuilt binary (recommended) — download the archive for your platform from the latest release, extract, and point your MCP client at the binary:
tar xzf mcp-vl-msa-rs-x86_64-unknown-linux-gnu.tar.gz
install -m755 mcp-vl-msa-rs-*/mcp-vl-msa-rs ~/.local/bin/
Prebuilt targets (Linux + Android): x86_64-unknown-linux-gnu, x86_64-unknown-linux-musl, aarch64-unknown-linux-gnu, aarch64-unknown-linux-musl (edge / ARM / Termux), aarch64-linux-android.
macOS: no prebuilt binary is shipped (it would need Apple code-signing). Install from source instead — cargo install below compiles it on your Mac in one command, no signing needed.
From source (Rust toolchain): the --locked flag is required, because the workspace Cargo.lock pins a working time / tantivy-common resolution and a fresh resolve breaks the build. Note that mcp-msa-server is the package name; the binary it installs is mcp-vl-msa-rs:
cargo install --git https://github.com/DioNanos/mcp-vl-msa-rs \
--locked --features source-fs mcp-msa-server
Build & test
cd mcp-vl-msa-rs
# Default: pure BM25, zero network deps
cargo build --release
cargo test
# Hybrid sparse + dense (in-process Candle rerank, no external service)
cargo build --release --features embeddings
cargo test --features embeddings
Hybrid mode config
Add [embeddings] to MCP_MSA_CONFIG to activate dense rerank. Without this section the server stays in BM25-only mode even when the binary was built with --features embeddings.
The production backend is candle-modernbert: the encoder runs in-process (Candle), offline-deterministic, from a local model bundle — no daemon, no network at runtime, no automatic downloads. Prepare the bundle once with scripts/prepare-granite-r2-97m.sh.
[storage]
storage_dir = "~/.local/state/mcp-vl-msa-rs"
[chunking]
chunk_size = 64
overlap = 0
[embeddings]
backend = "candle-modernbert"
model_dir = "~/.local/share/mcp-vl-msa-rs/models/granite-r2-97m"
dim = 768
model_id = "granite-r2-97m"
A transitional backend = "ollama" (HTTP to an Ollama-compatible service) still exists but is deprecated and scheduled for removal in v0.6 — do not build new setups on it.
The AI client opts into hybrid scoring per-call by passing dense_alpha to msa_search (or any future tool that supports it). dense_alpha = 1.0 (default) is BM25-only; 0.0 is dense-only; intermediate values are a linear blend α·bm25 + (1-α)·((cos+1)/2). Cosine is shifted to [0,1] so it composes linearly with the already max-normalized BM25 score.
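The blend described above can be written out as a short sketch. The formula is taken from the text; the function names, the clamping of out-of-range inputs, and the candidate tuple layout are assumptions made for illustration and may not match how the server handles them.

```rust
/// Max-normalize raw BM25 scores into [0, 1]; an all-zero list stays zero.
fn max_normalize(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(0.0_f32, f32::max);
    if max <= 0.0 {
        return vec![0.0; scores.len()];
    }
    scores.iter().map(|s| s / max).collect()
}

/// Blend from the text: alpha * bm25 + (1 - alpha) * ((cos + 1) / 2).
/// Clamping is an assumption of this sketch; a server might reject bad input instead.
fn hybrid_score(dense_alpha: f32, bm25_norm: f32, cosine: f32) -> f32 {
    let alpha = dense_alpha.clamp(0.0, 1.0);
    let cos01 = (cosine.clamp(-1.0, 1.0) + 1.0) / 2.0; // shift cosine from [-1, 1] to [0, 1]
    alpha * bm25_norm + (1.0 - alpha) * cos01
}

/// Rerank candidates given as (id, raw BM25 score, cosine similarity), best first.
fn rerank(candidates: &[(&str, f32, f32)], dense_alpha: f32) -> Vec<(String, f32)> {
    let raw: Vec<f32> = candidates.iter().map(|c| c.1).collect();
    let norm = max_normalize(&raw);
    let mut scored: Vec<(String, f32)> = candidates
        .iter()
        .zip(norm)
        .map(|(c, bm25)| (c.0.to_string(), hybrid_score(dense_alpha, bm25, c.2)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by blended score
    scored
}
```

With dense_alpha = 1.0 the cosine term has zero weight and the ordering is pure BM25; with 0.0 only the shifted cosine counts. For example, a candidate with normalized BM25 of 1.0 and cosine 0.0 scores 0.75 at dense_alpha = 0.5.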
Run as MCP stdio
# Default storage: ~/.local/state/mcp-vl-msa-rs/
./target/release/mcp-vl-msa-rs
# With explicit config
MCP_VL_MSA_CONFIG=~/.config/mcp-vl-msa-rs/config.toml \
MCP_DEVICE=my-node \
./target/release/mcp-vl-msa-rs
Example ~/.codex/config.toml entry:
[mcp_servers.vl_msa]
command = "/path/to/mcp-vl-msa-rs/target/release/mcp-vl-msa-rs"
env = { MCP_DEVICE = "my-node" }
# let the model call tools without a per-call approval prompt
default_tools_approval_mode = "approve"
Equivalent ~/.claude.json entry for Claude Code:
{
"mcpServers": {
"vl_msa": {
"command": "/path/to/mcp-vl-msa-rs/target/release/mcp-vl-msa-rs",
"env": { "MCP_DEVICE": "my-node" }
}
}
}
AI client compatibility
- Clients with partial MCP support may not surface the server's instructions text. The tool descriptions and request-field descriptions are self-contained, so a model can work from those alone.
- Read-only tools (msa_search, msa_fetch_doc, msa_stats, msa_list_collections, msa_manifest, msa_search_iterative, msa_interleave_round) carry the readOnlyHint annotation, which lets a gating client auto-approve them.
- If a model reports an "unsupported call" or "user cancelled" on Codex, that is the approval gate, not a server fault. Set default_tools_approval_mode (above) so tool calls are not blocked on a prompt.
Storage layout
~/.local/state/mcp-vl-msa-rs/
├── <collection_a>/ ← tantivy index directory
├── <collection_b>/
└── ...
Each collection is an independent tantivy index. Collection names are validated (rejected if they contain path separators, .., etc.) so a collection cannot escape the root.
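A minimal sketch of this kind of validation follows. It uses a character allowlist, which is stricter than merely rejecting separators and `..`; the allowlist, the 128-byte limit, and both function names are assumptions for illustration, not the server's actual rules.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical validator: accept only names that form a single, safe path component.
fn is_valid_collection_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 128 // arbitrary limit chosen for this sketch
        && name != "."
        && !name.contains("..")
        // The allowlist excludes '/', '\\', NUL and every other separator-like character.
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Resolve a collection directory under `root`, or None if the name is rejected.
fn collection_dir(root: &Path, name: &str) -> Option<PathBuf> {
    is_valid_collection_name(name).then(|| root.join(name))
}
```

Validating the name before joining matters because `Path::join` replaces the base entirely when given an absolute path, so an unchecked name such as `/etc` would resolve outside the storage root.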
Roadmap
Shipped:
- v0.2: SearchFilter (where_eq / where_in / created range), post-retrieval.
- v0.3: msa_search_iterative Memory Interleave with server-side cursor + TTL'd MsaSession registry.
- v0.4: hybrid BM25 + dense rerank behind feature flag embeddings, Ollama backend, per-call dense_alpha; agent-memory surface (msa_remember / msa_forget); filesystem source metadata (created_at / source / ext / dir) at index time; exact num_documents / total_tokens in msa_stats; msa-bench reproducible benchmark crate; prebuilt-binary packaging.
Next (not yet built):
- Query-time tantivy filter (today SearchFilter runs post-retrieval; fine for normal corpora, but a pre-filter would help when selectivity is high on a very large index).
- ACL for multi-tenant collections.
- Tool-description tuning.
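The difference between a post-retrieval filter and a query-time pre-filter can be shown with a small sketch. It assumes the post-retrieval filter is applied to an already-truncated top-k list (the server may over-fetch to compensate), and `Hit`, `matches_eq`, `post_filter`, and `pre_filter` are hypothetical names that model only a where_eq-style equality filter.

```rust
use std::collections::HashMap;

/// Hypothetical search hit; `ranked` slices below are assumed sorted best-first.
#[derive(Clone, Debug, PartialEq)]
struct Hit {
    id: String,
    meta: HashMap<String, String>,
}

/// Equality filter: every (key, value) pair must match the hit's metadata.
fn matches_eq(hit: &Hit, where_eq: &HashMap<String, String>) -> bool {
    where_eq.iter().all(|(k, v)| hit.meta.get(k) == Some(v))
}

/// Post-retrieval: truncate to top-k first, then filter. May return fewer than k.
fn post_filter(ranked: &[Hit], k: usize, where_eq: &HashMap<String, String>) -> Vec<Hit> {
    ranked
        .iter()
        .take(k)
        .filter(|h| matches_eq(h, where_eq))
        .cloned()
        .collect()
}

/// Pre-filter: filter first, then truncate. Returns k hits whenever k matches exist.
fn pre_filter(ranked: &[Hit], k: usize, where_eq: &HashMap<String, String>) -> Vec<Hit> {
    ranked
        .iter()
        .filter(|h| matches_eq(h, where_eq))
        .take(k)
        .cloned()
        .collect()
}
```

If the two best-ranked hits are notes and the filter asks for chats, `post_filter` with k = 2 returns nothing while `pre_filter` still returns the two best chats. That gap grows as the filter gets more selective, which is the case the roadmap item targets.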
Related work
- MSA paper (arXiv:2603.23516): the architectural inspiration (neural, intrinsic); this repo is an extrinsic, pure-Rust approximation of the macro pattern.
- Vivling (in codex-vl): the first downstream consumer; this server is its long-term memory.
- mcp-memory-rs: the companion server for curated agent state (named JSON categories, per-device ACL, fleet sync). This server does corpus recall; together they cover both halves of agent memory: the curated notebook and the queryable library.
License
Apache-2.0. See LICENSE.






