<div align="center">
🧠 LLM Memory MCP Server
Your AI assistants finally have a shared brain.
One memory. Every platform. Zero context lost.
Save a fact in Cursor → recall it in Claude → search it in VS Code → update it in Gemini. It's everywhere.
  
<br>
Python 3.12 · PostgreSQL 16 · Docker · MCP · Tools · Prompts · License
</div>
---
<div align="center">
🔥 Why 2,000+ developers are switching to shared AI memory
</div>
| Without LLM Memory | With LLM Memory |
|:---:|:---:|
| 😤 "I already told Claude my tech stack..." | 🧠 Every AI knows your stack on first message |
| 😤 "Cursor doesn't know what I did in Copilot..." | 🧠 Full cross-platform context, always |
| 😤 "I keep repeating my preferences..." | 🧠 Preferences auto-detected and saved silently |
| 😤 "My AI forgot our entire debugging session..." | 🧠 Conversations preserved with searchable history |
| 😤 "I lost that useful code snippet..." | 🧠 Procedural memory stores every pattern |
---
⚡ What Makes This Different
<table> <tr> <td width="50%">
4-Tier Memory Architecture
Not just a key-value store. A cognitive memory system inspired by human memory:
- Short-term: Working context (auto-expires)
- Semantic: Facts, preferences, decisions (permanent)
- Episodic: Conversation history (searchable)
- Procedural: Code patterns & how-tos
</td> <td width="50%">
Hybrid AI Search
Every recall query searches all 4 tiers at once, ranked by:
Score = semantic_similarity × 0.30
      + text_relevance × 0.20
      + recency × 0.25
      + importance × 0.25
Powered by pgvector HNSW + GIN full-text indexes.
</td> </tr> <tr> <td width="50%">
🤖 Auto-Injected Intelligence
When any AI connects, it automatically:
- Loads your working context on start
- Recalls relevant memories for your topic
- Silently detects & saves preferences
- Saves the conversation on end
- Extracts knowledge & consolidates memory
Zero manual prompting required.
</td> <td width="50%">
Cross-Platform Conflict Resolution
When Cursor says "user prefers tabs" and Claude says "user prefers spaces":
- Auto-detection via vector similarity
- Conflict queue with side-by-side comparison
- 4 resolution strategies: keep existing, use new, merge, keep both
- Version history for every knowledge change
</td> </tr> </table>
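As a rough illustration of the Hybrid AI Search formula above, the following Python sketch computes the weighted sum and ranks results by it. It assumes each signal is already normalized to the range 0 to 1; the `Signals` class and the function names are hypothetical and are not taken from the server's code.

```python
from dataclasses import dataclass

# Weights copied from the scoring formula in the Hybrid AI Search panel.
WEIGHTS = {
    "semantic_similarity": 0.30,
    "text_relevance": 0.20,
    "recency": 0.25,
    "importance": 0.25,
}


@dataclass
class Signals:
    """Per-result ranking signals (assumption: each normalized to [0, 1])."""
    semantic_similarity: float
    text_relevance: float
    recency: float
    importance: float


def composite_score(s: Signals) -> float:
    """Weighted sum of the four ranking signals."""
    return sum(weight * getattr(s, name) for name, weight in WEIGHTS.items())


def rank(results: dict[str, Signals]) -> list[str]:
    """Return result ids ordered from highest to lowest composite score."""
    return sorted(results, key=lambda rid: composite_score(results[rid]), reverse=True)
```

Because recency and importance together carry half of the weight, a recent and important entry can outrank one that is a closer semantic match but stale.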
---
Quick Start
60 seconds from zero to shared AI memory.
Prerequisites
- Docker & Docker Compose
- Any MCP-compatible AI platform
Option A: One-Command Setup (Recommended)
git clone https://github.com/ranjanjyoti152/LLM-MCP.git
cd LLM-MCP
./setup.sh
The setup script auto-detects Cursor, VS Code, Gemini CLI, Claude Desktop, and Windsurf, then generates their config files.
Option B: Manual
git clone https://github.com/ranjanjyoti152/LLM-MCP.git
cd LLM-MCP
docker compose up -d --build
Verify
docker compose ps
# llm-mcp-postgres Up (healthy) 0.0.0.0:4569->5432
# llm-mcp-ollama Up (healthy) 0.0.0.0:9050->9050
# llm-mcp-server Up 0.0.0.0:4040->4040
# llm-mcp-dashboard Up 0.0.0.0:4041->4041
First boot takes a couple of minutes. Ollama pulls the `nomic-embed-text` embedding model (~274MB) before it reports healthy, and the server + dashboard wait on that healthcheck. Watch it with `docker compose logs -f ollama`. (If Ollama is ever unreachable at request time, the server falls back to a local hash embedder so writes still succeed.)
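The server's actual fallback embedder is not shown in this README. As a minimal sketch of what a local hash embedder could look like, the code below assumes a bag-of-words feature-hashing scheme and the 768-dimension default; `hash_embed` is a hypothetical name.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 768) -> list[float]:
    """Deterministic feature-hashing embedding, L2-normalized.

    Each whitespace-separated token is hashed to one vector slot and a sign.
    This captures token overlap only, not semantic meaning.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty input (or tokens that cancel out) yields the zero vector.
    return [v / norm for v in vec] if norm else vec
```

A scheme like this needs no model download or network call, which is why it suits a fallback path, but recall quality would be lower than with a learned embedding model.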
Try It!
Ask your AI:
"Save a knowledge entry: I prefer Python for backend and TypeScript for frontend."
Switch to any other AI and ask:
"What are my programming language preferences?"
✨ It remembers. Across every platform. Forever.
---
Web Dashboard
Live at http://localhost:4041, a full-featured memory management UI.
<table>
<tr>
<td align="center"><b>Overview</b><br><sub>Bento grid metrics, health stats, platform charts</sub></td>
<td align="center"><b>Knowledge</b><br><sub>Search, filter, version history per entry</sub></td>
</tr>
<tr>
<td align="center"><b>Conversations</b><br><sub>Full episodic memory with message threads</sub></td>
<td align="center"><b>Conflicts</b><br><sub>Side-by-side comparison, 1-click resolve</sub></td>
</tr>
<tr>
<td align="center"><b>Timeline</b><br><sub>Unified activity feed across all memory types</sub></td>
<td align="center"><b>Maintenance</b><br><sub>Cleanup, consolidate, decay, compress</sub></td>
</tr>
</table>
8 tabs · Dark theme · Auto-refresh · Chart.js visualizations · Conflict resolution UI · Version history modals
---
Architecture
┌───────────────────────────────────────────────────────────────────────┐
│                             AI PLATFORMS                              │
│                                                                       │
│  ┌──────────┐ ┌────────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌───────┐  │
│  │ Windsurf │ │ Cursor │ │ VS Code │ │ Claude │ │ Gemini │ │ Codex │  │
│  └─────┬────┘ └───┬────┘ └────┬────┘ └───┬────┘ └───┬────┘ └───┬───┘  │
│        └──────────┴───────────┴───┬──────┴──────────┴──────────┘      │
│                                   │                                   │
└───────────────────────────────────┼───────────────────────────────────┘
                                    │ MCP (Streamable HTTP)
                                    ▼
            ┌────────────────────────────────────────────────┐
            │          LLM Memory MCP Server :4040           │
            │                                                │
            │       39 Tools · 9 Prompts · 3 Resources       │
            │    Auto-injected instructions for every LLM    │
            │ Background scheduler (cleanup/decay/compress)  │
            │     Version tracking · Conflict resolution     │
            │                                                │
            │               Dashboard UI :4041               │
            │      19 REST endpoints · 8-tab interface       │
            └───────────────────────┬────────────────────────┘
                                    │
                                    ▼
            ┌────────────────────────────────────────────────┐
            │         PostgreSQL 16 + pgvector :4569         │
            │                                                │
            │ ┌────────────┐  ┌────────────┐  ┌────────────┐ │
            │ │ Episodic   │  │ Semantic   │  │ Short-term │ │
            │ │ convos +   │  │ knowledge  │  │ TTL-expire │ │
            │ │ messages   │  │ + vectors  │  │ + consolid │ │
            │ └────────────┘  └────────────┘  └────────────┘ │
            │ ┌────────────┐  ┌────────────┐  ┌────────────┐ │
            │ │ Procedural │  │ Versions   │  │ Conflicts  │ │
            │ │ code snips │  │ changelog  │  │ cross-plat │ │
            │ └────────────┘  └────────────┘  └────────────┘ │
            │                                                │
            │    HNSW vector index + GIN full-text index     │
            │   Hybrid search: semantic + keyword ranking    │
            └────────────────────────────────────────────────┘
---
🎯 Supported Platforms
| Platform | Transport | Status |
|:---------|:----------|:------:|
| Windsurf | Streamable HTTP | ✅ Ready |
| Cursor | Streamable HTTP | ✅ Ready |
| VS Code + GitHub Copilot | Streamable HTTP | ✅ Ready |
| Claude Desktop | Streamable HTTP / stdio | ✅ Ready |
| Gemini CLI | Streamable HTTP | ✅ Ready |
| Antigravity (Google) | Streamable HTTP | ✅ Ready |
| ChatGPT (MCP-compatible) | Streamable HTTP | ✅ Ready |
| Codex (OpenAI) | Streamable HTTP | ✅ Ready |
| Any MCP-compatible client | Streamable HTTP | ✅ Ready |
---
Platform Configuration
<img src="https://img.shields.io/badge/-Windsurf-7c5cfc?style=flat-square" alt="Windsurf"> Windsurf
Option A (via UI): Settings → MCP → Add Server → paste the URL.
Option B (config file, .windsurf/mcp_config.json):
{
"mcpServers": {
"llm-memory": {
"serverUrl": "http://localhost:4040/mcp"
}
}
}
---
<img src="https://img.shields.io/badge/-Antigravity-4285F4?style=flat-square&logo=google&logoColor=white" alt="Antigravity"> Antigravity (Google)
Option A (via UI): Go to Settings → MCP Servers → Add and paste the URL.
Option B (config file, mcp_config.json):
{
"mcpServers": {
"llm-memory": {
"serverUrl": "http://localhost:4040/mcp"
}
}
}
---
<img src="https://img.shields.io/badge/-Cursor-000000?style=flat-square&logo=cursor&logoColor=white" alt="Cursor"> Cursor
Option A (via UI): Settings → MCP Servers → Add New MCP Server
Option B (project-level config, .cursor/mcp.json):
{
"mcpServers": {
"llm-memory": {
"url": "http://localhost:4040/mcp"
}
}
}
Option C (global config, ~/.cursor/mcp.json): applies to all projects.
---
<img src="https://img.shields.io/badge/-VS_Code-007ACC?style=flat-square&logo=visualstudiocode&logoColor=white" alt="VS Code"> VS Code + GitHub Copilot
Option A (via Command Palette): Ctrl+Shift+P → MCP: Add Server → HTTP → enter http://localhost:4040/mcp
Option B (workspace config, .vscode/mcp.json):
{
"servers": {
"llm-memory": {
"type": "http",
"url": "http://localhost:4040/mcp"
}
}
}
Option C (user settings, global): Add the same config to your VS Code user settings.json under "mcp".
---
<img src="https://img.shields.io/badge/-Gemini_CLI-8E75B2?style=flat-square&logo=googlegemini&logoColor=white" alt="Gemini"> Gemini CLI
Edit ~/.gemini/settings.json:
{
"mcpServers": {
"llm-memory": {
"httpUrl": "http://localhost:4040/mcp"
}
}
}
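The HTTP configs above differ only in the root key and in the name of the URL field. The Python sketch below generates each shape from one table, which may help when scripting setup for several clients. The key names are the ones shown in the config snippets above; the `client_config` and `render` helpers are hypothetical and are not part of setup.sh.

```python
import json

MCP_URL = "http://localhost:4040/mcp"
SERVER_NAME = "llm-memory"

# client -> (root key, function building the server entry), per the snippets above.
_SHAPES = {
    "windsurf": ("mcpServers", lambda url: {"serverUrl": url}),
    "antigravity": ("mcpServers", lambda url: {"serverUrl": url}),
    "cursor": ("mcpServers", lambda url: {"url": url}),
    "vscode": ("servers", lambda url: {"type": "http", "url": url}),
    "gemini": ("mcpServers", lambda url: {"httpUrl": url}),
}


def client_config(client: str, url: str = MCP_URL) -> dict:
    """Return the config dictionary for one client."""
    if client not in _SHAPES:
        raise ValueError(f"unknown client: {client}")
    root_key, build_entry = _SHAPES[client]
    return {root_key: {SERVER_NAME: build_entry(url)}}


def render(client: str, url: str = MCP_URL) -> str:
    """Return the config as indented JSON, ready to write to the client's file."""
    return json.dumps(client_config(client, url), indent=2)
```

For example, `print(render("gemini"))` produces the JSON to place in ~/.gemini/settings.json.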
---
<img src="https://img.shields.io/badge/-Claude_Code-D4A574?style=flat-square" alt="Claude Code"> Claude Code (CLI)
Option A (one command, HTTP):
claude mcp add --transport http llm-memory http://localhost:4040/mcp
Option B (local via Docker, stdio):
claude mcp add llm-memory -- docker exec -i llm-mcp-server python server.py stdio
Add --scope user to either command to make the server available across all your projects (default scope is the current project). Verify with claude mcp list.
A project-memory skill also ships in .claude/skills/. With the server connected, the recall/save/compact behavior triggers automatically, like installing a skill.
---
<img src="https://img.shields.io/badge/-Claude-D4A574?style=flat-square" alt="Claude"> Claude Desktop
Option A (local, best performance): Connect directly via Docker, no extra tools needed.
Go to Settings → Developer → Edit Config (claude_desktop_config.json):
{
"mcpServers": {
"llm-memory": {
"command": "docker",
"args": [
"exec",
"-i",
"llm-mcp-server",
"python",
"server.py",
"stdio"
]
}
}
}
---
<img src="https://img.shields.io/badge/-ChatGPT-74AA9C?style=flat-square&logo=openai&logoColor=white" alt="ChatGPT"> ChatGPT / Codex / Other MCP Clients
For any platform that supports MCP via HTTP, use:
Endpoint: http://localhost:4040/mcp
Transport: Streamable HTTP (JSON-RPC over POST with optional SSE streaming)
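For clients without built-in MCP support, the request body for a tool call is plain JSON-RPC 2.0. The sketch below only builds that body and sends nothing. `tools/call` is the standard MCP method for invoking a tool; the `query` argument name for recall is an assumption, so check the tool's published input schema first. A real session also needs an `initialize` handshake before any tool call, and an MCP client SDK normally handles that for you.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC only requires that they be unique per session.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 body for an MCP `tools/call` request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical usage: the argument name "query" is assumed, not confirmed.
body = tool_call("recall", {"query": "database decisions"})
```

The resulting string would be sent as the POST body to http://localhost:4040/mcp.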
---
🛠️ 39 MCP Tools
<details open> <summary><b>Conversations (Episodic Memory)</b></summary>

| Tool | What it does |
|:-----|:------------|
| save_conversation | Save full conversation with messages, metadata, importance, outcome |
| search_memory | Full-text + semantic search across all conversations |
| get_recent_conversations | Latest conversations by platform |
| get_conversation_by_id | Retrieve specific conversation with all messages |
| add_message_to_conversation | Append messages to existing conversation |
| tag_conversation | Add/remove tags |
| delete_memory | Delete conversation or knowledge by ID |

</details>
<details open> <summary><b>Knowledge (Semantic Memory)</b></summary>

| Tool | What it does |
|:-----|:------------|
| save_knowledge | Store fact/preference/instruction/decision |
| save_knowledge_smart | Conflict-aware save: detects duplicates & cross-platform conflicts |
| search_knowledge | Search by query, category, tags |
| list_all_knowledge | Paginated listing with category filter |
| get_knowledge_by_category | All entries in a category |
| get_related_knowledge | Similar entries by vector proximity |
| update_knowledge | Update with automatic version snapshot |
| auto_extract_preferences | Batch-extract preferences from conversation text |
| get_context_summary | Combined knowledge + conversation context |

</details>
<details> <summary><b>Working Memory (Short-term)</b></summary>

| Tool | What it does |
|:-----|:------------|
| save_short_term_memory | Save transient context with TTL auto-expiry |
| get_working_context | Load all active session context |
| consolidate_memories | Promote important STM → long-term knowledge |

</details>
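To make the TTL and consolidation behavior of the working-memory tools concrete, here is a small in-memory sketch in Python. It is not the server's implementation: the `ShortTermItem` class, the 0.7 promotion threshold, and the function names are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ShortTermItem:
    content: str
    importance: float  # assumed range: 0.0 to 1.0
    expires_at: datetime


def make_item(content: str, importance: float, ttl_minutes: int,
              now: Optional[datetime] = None) -> ShortTermItem:
    """Create an item that expires ttl_minutes after `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return ShortTermItem(content, importance, now + timedelta(minutes=ttl_minutes))


def active(items: list[ShortTermItem], now: datetime) -> list[ShortTermItem]:
    """Items whose TTL has not yet elapsed."""
    return [item for item in items if item.expires_at > now]


def to_promote(items: list[ShortTermItem], now: datetime,
               threshold: float = 0.7) -> list[ShortTermItem]:
    """Active items important enough to promote to long-term knowledge."""
    return [item for item in active(items, now) if item.importance >= threshold]
```

In this model, expired items simply drop out of the working context, and only unexpired items above the threshold are candidates for promotion.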
<details> <summary><b>Code & Projects (Procedural Memory)</b></summary>

| Tool | What it does |
|:-----|:------------|
| save_code_snippet | Save reusable code with language, tags, description |
| search_code_snippets | Search by keyword, language, tags |
| save_project_context | Save project-level tech stack & architecture |
| get_project_context | Retrieve project context by name |

</details>
<details> <summary><b>Search & Retrieval</b></summary>

| Tool | What it does |
|:-----|:------------|
| recall | PRIMARY: searches all 4 memory tiers at once, ranked by composite score; pass project to boost the active repo |
| search_by_tags | Cross-type tag search |
| compact_context | Token saver: offloads a bulky context block into memory, returns a dense summary + recall handle |

</details>
<details> <summary><b>Versioning & Conflicts</b></summary>

| Tool | What it does |
|:-----|:------------|
| knowledge_history | Full version timeline for any knowledge entry |
| rollback_knowledge | Restore to any previous version |
| list_conflicts | View pending/resolved cross-platform conflicts |
| resolve_conflict | Resolve with strategy: keep_existing, use_new, merge, keep_both |

</details>
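The four resolution strategies, and the similarity check that flags a conflict in the first place, can be sketched in a few lines of Python. This is a simplified model and not the server's code: the 0.85 similarity threshold, the function names, and the naive string concatenation used for merge are all assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_possible_conflict(existing_vec: list[float], new_vec: list[float],
                         existing_text: str, new_text: str,
                         threshold: float = 0.85) -> bool:
    """Flag entries that are about the same topic but say different things."""
    return (existing_text != new_text
            and cosine_similarity(existing_vec, new_vec) >= threshold)


def resolve(existing: str, new: str, strategy: str) -> list[str]:
    """Return the knowledge entries that remain after applying a strategy."""
    if strategy == "keep_existing":
        return [existing]
    if strategy == "use_new":
        return [new]
    if strategy == "merge":
        return [f"{existing}; {new}"]  # naive merge, for illustration only
    if strategy == "keep_both":
        return [existing, new]
    raise ValueError(f"unknown strategy: {strategy}")
```

For instance, `resolve("user prefers tabs", "user prefers spaces", "use_new")` leaves only the newer statement.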
<details> <summary><b>Maintenance & Utility</b></summary>

| Tool | What it does |
|:-----|:------------|
| count_memories | Count all memory types |
| summarize_platform_activity | Per-platform stats |
| cleanup_expired_memories | Remove expired STM & knowledge |
| decay_memories | Reduce importance of old unaccessed memories |
| export_memories | Full JSON backup |
| import_memories | Restore from backup (with dedup) |
| clear_platform_data | Delete all data for a platform ⚠️ |

</details>
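The README does not specify the decay curve that decay_memories applies. One common choice is exponential decay with a half-life, sketched below in Python as an assumption; the 30-day half-life, the 0.05 floor, and the function name are hypothetical.

```python
def decayed_importance(importance: float, days_since_access: float,
                       half_life_days: float = 30.0, floor: float = 0.05) -> float:
    """Halve importance every half_life_days without access, down to a floor.

    The result never exceeds the original importance, so entries already
    below the floor are left unchanged.
    """
    if days_since_access <= 0:
        return importance
    factor = 0.5 ** (days_since_access / half_life_days)
    return min(importance, max(floor, importance * factor))
```

Under these assumptions, an entry with importance 0.8 that has not been accessed for 30 days drops to 0.4, and long-untouched entries settle at the floor instead of reaching zero.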
📡 3 MCP Resources
| URI | Description |
|:----|:-----------|
| memory://stats | Database statistics & counts |
| memory://platforms | All platforms with stored data |
| memory://health | System health across all memory tiers |

🎯 9 Smart Prompts
Auto-discoverable prompt templates for key workflows:
| Prompt | What it does |
|:-------|:-----------|
| start_conversation | Initialize with full memory context |
| end_conversation | Save everything + extract knowledge |
| compact_now | Offload long context into memory to cut token usage |
| save_user_preference | Structured preference storage |
| recall_everything | Deep search across all memory |
| resolve_all_conflicts | Guided conflict resolution |
| memory_maintenance | Run all maintenance tasks |
| onboard_new_user | First-time setup & preference capture |
| debug_session | Context-aware debugging workflow |

💬 Invoking prompts as commands
MCP prompts are exposed as slash commands, but the exact syntax depends on the platform. The server is registered as llm-memory in all the configs above. Prompt arguments are passed space-separated after the command.
<details open> <summary><b><img src="https://img.shields.io/badge/-Claude_Code-D4A574?style=flat-square" alt="Claude Code"> Claude Code (CLI)</b></summary>
Prompts appear as /mcp__<server>__<prompt>:
/mcp__llm-memory__start_conversation claude-code "auth refactor"
/mcp__llm-memory__recall_everything "database decisions"
/mcp__llm-memory__compact_now my-repo claude-code
/mcp__llm-memory__end_conversation claude-code "Auth refactor" success
Run /mcp to list connected servers and browse their prompts. You usually don't need these commands, since recall/save/compact happen automatically once the server is connected, but they are there for explicit control.
</details>
<details> <summary><b><img src="https://img.shields.io/badge/-VS_Code-007ACC?style=flat-square&logo=visualstudiocode&logoColor=white" alt="VS Code"> VS Code + GitHub Copilot</b></summary>
Prompts appear in Copilot Chat as /mcp.<server>.<prompt>:
/mcp.llm-memory.start_conversation
/mcp.llm-memory.recall_everything
Type / in the chat box to see the list; the chat will prompt you for each argument.
</details>
<details> <summary><b><img src="https://img.shields.io/badge/-Claude-D4A574?style=flat-square" alt="Claude"> Claude Desktop</b></summary>
Click the + (attachments) button in the message box, choose llm-memory, then pick a prompt from the list. Fill in the arguments when prompted. Prompts surface as reusable templates rather than typed slash commands.
</details>
<details> <summary><b><img src="https://img.shields.io/badge/-Gemini_CLI-8E75B2?style=flat-square&logo=googlegemini&logoColor=white" alt="Gemini"> Gemini CLI</b></summary>
MCP prompts register as slash commands directly:
/start_conversation
/recall_everything
Run /mcp to view connected servers and their available prompts.
</details>
<details> <summary><b><img src="https://img.shields.io/badge/-Cursor-000000?style=flat-square&logo=cursor&logoColor=white" alt="Cursor"> Cursor / <img src="https://img.shields.io/badge/-Windsurf-7c5cfc?style=flat-square" alt="Windsurf"> Windsurf / ChatGPT</b></summary>
These clients focus on auto-invoked tools rather than slash-command prompts. Just describe what you want in natural language and the model calls the underlying tools:
"Recall everything you know about this project's database decisions."
"Save this preference: I always use async/await."
"Compact this conversation into memory to save tokens."
The same recall / save_knowledge_smart / compact_context tools run underneath.
</details>
---
🧬 Auto-Injected Behaviors
When any AI connects to this MCP server, it automatically receives behavioral instructions, with no user action needed:
┌─────────────────────────────────────────────────────────────┐
│  CONVERSATION START (automatic)                             │
│  1. get_working_context() → load session context            │
│  2. recall("<topic>") → search all memory for relevance     │
│  3. Personalize response using recalled memories            │
│  4. save_short_term_memory() → track current task           │
├─────────────────────────────────────────────────────────────┤
│  DURING CONVERSATION (automatic, silent)                    │
│  • Detect preferences → save_knowledge_smart()              │
│  • Detect facts → save_knowledge_smart()                    │
│  • Detect decisions → save_knowledge_smart()                │
│  • Detect code patterns → save_code_snippet()               │
│  • All saves are conflict-aware (dedup + cross-platform)    │
├─────────────────────────────────────────────────────────────┤
│  CONVERSATION END (automatic)                               │
│  1. save_conversation() → with importance + outcome         │
│  2. auto_extract_preferences() → batch knowledge extraction │
│  3. consolidate_memories() → promote STM → long-term        │
└─────────────────────────────────────────────────────────────┘
Result: Every AI assistant becomes memory-aware from the moment it connects. No setup. No prompting. It just works.
---
Project Structure
LLM-MCP/
├── server.py              # MCP server: 39 tools, 9 prompts, 3 resources
├── db.py                  # Async DB layer (asyncpg + pgvector + FTS)
├── embeddings.py          # Embedding engine (local/ollama/openai)
├── dashboard.py           # REST API for web dashboard (Starlette)
├── static/
│   └── index.html         # Dashboard UI (Tailwind + Chart.js)
├── prompts/
│   ├── system_prompt.md   # Standalone system prompt for any LLM
│   └── quick_prompts.md   # 12 copy-paste prompt templates
├── docker-compose.yml     # PostgreSQL + MCP Server + Dashboard
├── Dockerfile             # Python 3.12 slim container
├── setup.sh               # One-command auto-setup script
├── .env                   # Environment configuration
├── requirements.txt       # Python dependencies
├── test_client.py         # End-to-end test suite
├── test_versioning.py     # Versioning & conflict resolution tests
└── test_prompts.py        # MCP prompt discovery tests
---
Configuration
All settings are read from .env:
| Variable | Default | Description |
|:---------|:--------|:------------|
| POSTGRES_PORT | 4569 | PostgreSQL host port |
| MCP_PORT | 4040 | MCP server port |
| DASHBOARD_PORT | 4041 | Dashboard UI port |
| POSTGRES_USER | mcp_user | Database user |
| POSTGRES_PASSWORD | mcp_secure_pass_2026 | Database password |
| POSTGRES_DB | mcp_memory | Database name |
| EMBEDDING_PROVIDER | ollama | local / ollama / openai |
| OLLAMA_PORT | 9050 | Host port for the bundled Ollama API |
| OLLAMA_MODEL | nomic-embed-text | Embedding model Ollama pulls on first boot (~274MB) |
| OLLAMA_DIM | 768 | Vector dimension; change only if you swap to a non-768-dim model |
| MAINTENANCE_INTERVAL_MINUTES | 30 | Background scheduler interval |

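If you script against the same settings, a typed loader helps catch typos early. The Python sketch below reads a subset of the variables from the table with the same defaults. The `Settings` class and `load_settings` function are hypothetical and are not part of the project's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"local", "ollama", "openai"}


@dataclass(frozen=True)
class Settings:
    postgres_port: int
    mcp_port: int
    dashboard_port: int
    embedding_provider: str
    ollama_model: str
    ollama_dim: int
    maintenance_interval_minutes: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (default: os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    provider = env.get("EMBEDDING_PROVIDER", "ollama")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"EMBEDDING_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
    return Settings(
        postgres_port=int(env.get("POSTGRES_PORT", "4569")),
        mcp_port=int(env.get("MCP_PORT", "4040")),
        dashboard_port=int(env.get("DASHBOARD_PORT", "4041")),
        embedding_provider=provider,
        ollama_model=env.get("OLLAMA_MODEL", "nomic-embed-text"),
        ollama_dim=int(env.get("OLLAMA_DIM", "768")),
        maintenance_interval_minutes=int(env.get("MAINTENANCE_INTERVAL_MINUTES", "30")),
    )
```

Passing an explicit mapping, for example `load_settings({"MCP_PORT": "5050"})`, makes the loader easy to test without touching the real environment.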
LAN Access
Replace localhost with your machine's IP for remote AI platforms:
http://192.168.x.x:4040/mcp # MCP Server
http://192.168.x.x:4041 # Dashboard
---
Database Schema
8 tables with hybrid search indexes:
┌───────────────────┐      ┌────────────────────┐
│ conversations     │─────▶│ messages           │  Episodic memory
│ (importance,      │      │ (role, content,    │
│  outcome,         │      │  embedding)        │
│  embedding)       │      └────────────────────┘
└───────────────────┘
┌───────────────────┐      ┌────────────────────┐
│ knowledge         │─────▶│ knowledge_versions │  Semantic memory
│ (category,        │      │ (version, diff,    │  + version history
│  version,         │      │  changed_by)       │
│  embedding)       │      └────────────────────┘
└───────────────────┘
┌───────────────────┐      ┌────────────────────┐
│ short_term_memory │      │ memory_conflicts   │  Working memory
│ (TTL, context,    │      │ (existing vs       │  + conflict tracking
│  consolidated)    │      │  conflicting)      │
└───────────────────┘      └────────────────────┘
┌───────────────────┐      ┌────────────────────┐
│ code_snippets     │      │ projects           │  Procedural memory
│ (language,        │      │ (tech_stack,       │  + project context
│  embedding)       │      │  architecture)     │
└───────────────────┘      └────────────────────┘
Indexes: HNSW (vector similarity) + GIN (full-text search) + B-tree (importance, expiry) for sub-millisecond hybrid queries.
---
🧪 Testing
# Full test suite
python test_client.py
# Versioning & conflict resolution
python test_versioning.py
# MCP prompt discovery
python test_prompts.py
<details> <summary>Manual verification commands</summary>
# Check services
docker compose ps
# PostgreSQL direct query
docker exec llm-mcp-postgres psql -U mcp_user -d mcp_memory \
-c "SELECT COUNT(*) as knowledge FROM knowledge;"
# MCP server logs
docker logs -f llm-mcp-server
# Dashboard logs
docker logs -f llm-mcp-dashboard
# Restart everything
docker compose restart
</details>
---
Docker Commands
| Command | Description |
|:--------|:------------|
| docker compose up -d --build | Start all services |
| docker compose down | Stop all services |
| docker compose logs -f mcp-server | Stream server logs |
| docker compose logs -f dashboard | Stream dashboard logs |
| docker compose down -v | Stop & delete all data ⚠️ |
---
Security
- Bind to `127.0.0.1` for local-only access: `MCP_HOST=127.0.0.1`
- Change `POSTGRES_PASSWORD` in production
- Add a reverse proxy (nginx/Caddy) with TLS for remote access
- No auth by default; designed for local/trusted network use
---
🗺️ Roadmap
- [x] Semantic search with pgvector embeddings
- [x] Automatic conversation summarization (compression)
- [x] Memory expiration & archival policies
- [x] Background maintenance scheduler
- [x] Multi-tier memory (short-term, semantic, episodic, procedural)
- [x] Importance scoring & time-based decay
- [x] One-command auto-setup script
- [x] Memory versioning & change tracking
- [x] Cross-platform conflict resolution
- [x] Web dashboard with real-time visualization
- [x] Auto-injected behavioral instructions
- [x] MCP prompt workflows
- [ ] Authentication / API keys for multi-user
- [ ] Webhook notifications on new memories
- [ ] Memory sharing between users
- [ ] Cloud-hosted option (no Docker needed)
- [ ] Mobile companion app
---
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

All contributions welcome: features, bug fixes, docs, translations.
---
License
MIT License. See LICENSE for details.
---
<div align="center">
⭐ If this project saves you from repeating yourself to your AIs, give it a star!
Star this repo · Report Bug · Request Feature
<br>
Built with ❤️ by ranjanjyoti152
Stop repeating yourself. Let your AIs share a brain.
<br>
<sub>If you found this useful, consider sharing it with other developers who use multiple AI tools.</sub>
</div>






