DocPulse
DocPulse is a local-first MCP (Model Context Protocol) Server that transforms massive documentation into high-density, LLM-ready "Implementation Manifestos."
Built for Apple Silicon, it leverages native MLX for local inference, combined with intelligent web crawling and community context search (Reddit/StackOverflow) to provide a 360-degree view of any library, standard, or regulation.
 
---
Why DocPulse?
- Token Efficiency: Scraping and distilling docs locally saves a large number of tokens. Instead of sending thousands of lines of raw HTML to a cloud LLM, you send only a dense, 2-3 page distilled manifesto.
- Internal Network Support: Designed to work seamlessly within organizational networks. If your documentation is hosted on an internal wiki or API that doesn't require MFA/Auth, DocPulse can ingest it and provide context without exposing sensitive raw data to external scraping services.
Key Features
- Multi-Source Ingestion:
  - Web: Intelligent crawling that bypasses JS-heavy UI noise.
  - PDF: Deep parsing of regulatory or technical PDF documents.
  - Local Files: Ingest single `.md`, `.txt`, `.py`, etc.
  - Directories: Recursive scanning of entire folders (local or mounted remote drives like OneDrive).
- Local LLM Distillation: Uses `mlx-lm` with DeepSeek models to extract API signatures, version constraints, and logical edge cases.
- Community Augmentation: Automatically fetches recent community discussions to identify undocumented bugs or workarounds.
- Fixed-Resource Optimal Sizing: By default, strictly utilizes a highly optimized `7B` distillation model to maximize extraction speed and save RAM for other coding agents without losing extraction accuracy.
- Human-in-the-Loop Feedback: Save human corrections that are injected into future distillation runs for the same subject.
- File-System Caching: Fast retrieval of previously synthesized context.
---
Intelligent Defaults
DocPulse is designed for a seamless, zero-config startup experience.
- Dynamic Model Selection: On launch, DocPulse detects your system's total RAM and automatically selects the most capable model from our curated DeepSeek-R1 Distill suite:
  - < 24GB RAM: `7B` (High-speed, minimal overhead).
  - 24GB - 64GB RAM: `14B` (Deep extraction & reasoning).
  - > 64GB RAM: `32B` (Maximum fidelity for complex standards).
- Auto-Bootstrapping: The system automatically initializes your local config at `~/.config/docpulse/`, creates the required cache directories, and downloads MLX model weights on demand.
- Environment Configuration: We provide a comprehensive `.env.example` template. Simply `cp .env.example .env` to manage optional search API keys (Brave/Google) or force a specific model size using the `DOCPULSE_MODEL` override.
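The RAM tiers above can be sketched as a small lookup. This is an illustrative sketch, not DocPulse's actual code: the function names `select_model_size` and `resolve_model` are hypothetical, and treating exactly 64GB as part of the 14B tier is an assumption drawn from the "24GB - 64GB" wording.

```python
from typing import Optional


def select_model_size(total_ram_gb: float) -> str:
    """Map total system RAM (in GB) to a model size label using the documented tiers."""
    if total_ram_gb < 24:
        return "7B"
    if total_ram_gb <= 64:  # assumption: the upper bound of the middle tier is inclusive
        return "14B"
    return "32B"


def resolve_model(total_ram_gb: float, override: Optional[str] = None) -> str:
    """Prefer an explicit DOCPULSE_MODEL-style override over the RAM-based choice."""
    return override if override else select_model_size(total_ram_gb)


# Walk through a few representative machines.
for ram in (16, 24, 64, 96):
    print(f"{ram} GB -> {select_model_size(ram)}")
```

Keeping the override separate from the tier lookup means the RAM thresholds stay testable without touching the environment.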
---
Requirements
- Hardware: Apple Silicon (M1, M2, M3, M4).
- Software: Python 3.10+, uv recommended.
- Environment: macOS (optimized for unified memory).
---
Installation
- Clone the repository:
git clone https://github.com/your-username/docpulse.git
cd docpulse
- Set up with `uv`:
uv sync
- Install `crawl4ai` dependencies:
uv run crawl4ai-setup
- Configure environment:
cp .env.example .env
# Edit .env with your preference/keys
- Run the CLI:
uv run docpulse get fastapi --source="https://fastapi.tiangolo.com/tutorial/"
---
Quick Start (Claude Desktop)
- Install uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Clone & Sync: `git clone https://github.com/your-username/docpulse.git && cd docpulse && uv sync`
- Configure: `cp .env.example .env` (Add search keys if desired)
- Add to Claude: Add the config below to your `claude_desktop_config.json`.
- Start Coding: Ask Claude: "Analyze the FastAPI documentation for memory management patterns."
---
Configuration
DocPulse features a robust, tiered configuration system.
1. Configuration Tiers
Settings are loaded in the following priority:
- User Config: `~/.config/docpulse/config.toml`
- Default Config: `config.default.toml` (bundled with the repo)
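One way to picture the two tiers is a recursive merge in which user values win key by key. The sketch below rests on assumptions: the source does not say whether overrides apply per key or per file, `merge_config` is a hypothetical helper, and the values shown are placeholders rather than DocPulse's real defaults.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(default: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where user values override defaults, table by table."""
    merged = deepcopy(default)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge their keys instead of replacing the table.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder values standing in for config.default.toml and the user's config.toml.
default_cfg = {
    "app": {"name": "docpulse", "log_level": "INFO"},
    "distiller": {"max_tokens": 2048, "temperature": 0.2},
}
user_cfg = {"app": {"log_level": "DEBUG"}}

effective = merge_config(default_cfg, user_cfg)
print(effective["app"])
```

With this approach a user file only needs the keys it changes; everything else falls through to the bundled defaults.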
2. Configurable Settings
You can override any of these keys in your config.toml:
[app]
- `name`: The name of the MCP server.
- `log_level`: Set to `DEBUG`, `INFO`, `WARNING`, or `ERROR`.
[harvester]
- `text_extensions`: List of file extensions to include in recursive scans.
- `encoding`: Default encoding for local files (default: `utf-8`).
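To illustrate how the two harvester settings might interact, here is a minimal sketch of a recursive scan filtered by extension. The helpers `scan_text_files` and `read_all` are hypothetical names, and replacing undecodable bytes is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def scan_text_files(root: Path, text_extensions: Iterable[str]) -> List[Path]:
    """Recursively collect files under root whose suffix is in text_extensions."""
    allowed = {ext.lower() for ext in text_extensions}
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in allowed
    )


def read_all(paths: Iterable[Path], encoding: str = "utf-8") -> str:
    """Concatenate file contents, replacing bytes that fail to decode."""
    return "\n\n".join(p.read_text(encoding=encoding, errors="replace") for p in paths)


# Build a throwaway tree with two text files and one binary file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide.md").write_text("# Guide", encoding="utf-8")
    (root / "pkg").mkdir()
    (root / "pkg" / "api.py").write_text("def f(): ...", encoding="utf-8")
    (root / "logo.png").write_bytes(b"\x89PNG")

    matches = scan_text_files(root, [".md", ".txt", ".py"])
    found = [p.relative_to(root).as_posix() for p in matches]
    combined = read_all(matches)

print(found)
```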
[distiller]
- `max_tokens`: Maximum output length for the manifesto.
- `temperature`: LLM sampling temperature.
[prompts]
Prompts are no longer stored in the TOML configuration. Instead, they live in the `prompts/` directory.
- Priority: `~/.config/docpulse/prompts/{name}.txt` > `prompts/{name}.txt`.
- Placeholder tokens: `{raw_text}`, `{community_context}`, `{human_feedback}`.
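The lookup order and placeholder substitution above could be sketched as follows. Everything here is illustrative: `resolve_prompt` and `render_prompt` are hypothetical helpers, the prompt name `distill` is made up, and the single-pass regex substitution is one possible approach, not necessarily what DocPulse does.

```python
import re
import tempfile
from pathlib import Path

# Matches only the three documented placeholder tokens.
_TOKEN = re.compile(r"\{(raw_text|community_context|human_feedback)\}")


def resolve_prompt(name: str, user_dir: Path, bundled_dir: Path) -> Path:
    """Return the user-level prompt file if present, otherwise the bundled one."""
    for base in (user_dir, bundled_dir):
        candidate = base / f"{name}.txt"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no prompt named {name!r}")


def render_prompt(template: str, raw_text: str,
                  community_context: str = "", human_feedback: str = "") -> str:
    """Fill the placeholders in one pass so inserted text is never re-scanned."""
    values = {
        "raw_text": raw_text,
        "community_context": community_context,
        "human_feedback": human_feedback,
    }
    return _TOKEN.sub(lambda m: values[m.group(1)], template)


with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp, "user")
    bundled_dir = Path(tmp, "bundled")
    user_dir.mkdir()
    bundled_dir.mkdir()

    # Only the bundled prompt exists at first.
    (bundled_dir / "distill.txt").write_text("Docs:\n{raw_text}", encoding="utf-8")
    chosen_before = resolve_prompt("distill", user_dir, bundled_dir).parent.name

    # A user-level file with the same name takes priority.
    (user_dir / "distill.txt").write_text(
        "Docs:\n{raw_text}\nNotes:\n{human_feedback}", encoding="utf-8"
    )
    chosen = resolve_prompt("distill", user_dir, bundled_dir)
    chosen_after = chosen.parent.name
    rendered = render_prompt(
        chosen.read_text(encoding="utf-8"),
        raw_text="GET /items",
        human_feedback="Use the v2 client",
    )

print(chosen_before, chosen_after)
```

Substituting in a single pass avoids a subtle problem with chained `str.replace` calls: scraped documentation that happens to contain the literal text `{human_feedback}` would otherwise be rewritten too.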
[models.entries]
Maps model repository strings to minimum RAM requirements (GB).
3. Environment Variables (.env)
Used for sensitive keys and quick overrides:
- `DOCPULSE_MODEL`: The pipeline strictly defaults to a 7B model because data extraction relies heavily on deletion/formatting rather than novel synthesis. Overtaxing VRAM with a 32B model causes severe bottlenecks. If you _must_ override this, set this variable to `14B`, `32B`, or a full Hugging Face repo link.
- `DOCPULSE_CACHE_DIR`: Set the directory where distilled documentation is saved (defaults to `.docpulse_cache` in the current working directory).
- `BRAVE_API_KEY`: For Brave Search augmentation.
- `GOOGLE_API_KEY` & `GOOGLE_CSE_ID`: For Google Search augmentation.
- _Note: DuckDuckGo is the default and requires no key._
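A rough sketch of how these variables might be resolved is shown below. The helper names are hypothetical, and preferring Brave over Google when both sets of keys are present is an assumption; the source only states that DuckDuckGo is the keyless default.

```python
from pathlib import Path
from typing import Mapping


def cache_dir_from_env(env: Mapping[str, str], cwd: Path) -> Path:
    """Use DOCPULSE_CACHE_DIR when set, otherwise .docpulse_cache under cwd."""
    value = env.get("DOCPULSE_CACHE_DIR")
    return Path(value).expanduser() if value else cwd / ".docpulse_cache"


def search_backend_from_env(env: Mapping[str, str]) -> str:
    """Pick a search backend from the available keys, falling back to DuckDuckGo."""
    if env.get("BRAVE_API_KEY"):
        return "brave"  # assumption: Brave wins if both Brave and Google are configured
    if env.get("GOOGLE_API_KEY") and env.get("GOOGLE_CSE_ID"):
        return "google"  # Google needs both the API key and the search engine ID
    return "duckduckgo"


# Pass os.environ in real use; plain dicts keep the example deterministic.
print(cache_dir_from_env({}, Path("/work")))
print(search_backend_from_env({"GOOGLE_API_KEY": "k", "GOOGLE_CSE_ID": "id"}))
```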
---
MCP Integration
Add to Claude Desktop
Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "docpulse": {
      "command": "uv",
      "args": ["--directory", "/path/to/docpulse", "run", "python", "server.py"]
    }
  }
}
---
Tools Exposed
get_universal_context
Primary tool for creating or retrieving documentation context.
- Arguments:
  - `subject`: Name (e.g., `fastapi`).
  - `version`: Version string (e.g., `v0.115`).
  - `source`: URL, absolute file path, or absolute directory path.
  - `topic_keywords`: (Optional) Keywords for community search.
report_context_failure
Allows developers to correct the server's output.
- Arguments:
  - `subject`: The subject being corrected.
  - `feedback`: Detailed workaround or bug fix.
- _DocPulse will inject this feedback into the prompt the next time you request the same subject._
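For orientation, MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request, which Claude Desktop sends on your behalf. The sketch below builds such payloads for the two tools; `build_tool_call` is a hypothetical helper, and `topic_keywords` is left out because the source does not say whether it takes a string or a list.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Request context for a subject, using the argument names documented above.
get_request = build_tool_call(1, "get_universal_context", {
    "subject": "fastapi",
    "version": "v0.115",
    "source": "https://fastapi.tiangolo.com/tutorial/",
})

# Report a correction for the same subject.
report_request = build_tool_call(2, "report_context_failure", {
    "subject": "fastapi",
    "feedback": "The async client has changed in version 0.115",
})

print(json.dumps(get_request, indent=2))
```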
---
Testing
DocPulse provides several ways to test the server without requiring an LLM:
1. Dedicated CLI (On-Demand)
Run DocPulse directly from your terminal for one-off distillations:
# Get context for a subject
uv run docpulse get fastapi --source "https://fastapi.tiangolo.com/tutorial/"
# Report a failure or add feedback
uv run docpulse report fastapi "The async client has changed in version 0.115"
2. Automated E2E Script
Run the provided E2E test suite which verifies CLI execution and cache persistence:
./scripts/test_e2e.sh
3. Visual Debugging (MCP Inspector)
Open the interactive MCP Inspector to test tools via a web UI:
uv run fastmcp dev server.py
4. Unit & Multi-Layer Tests
Run the standard pytest suite:
uv run pytest tests/ -v
---
Persistence & Survival
DocPulse is designed to survive reboots and server restarts without losing data.
- File-System Cache: All distilled context is saved to a local cache directory. By default, this is `.docpulse_cache/` in the current working directory. You can override it with the `DOCPULSE_CACHE_DIR` environment variable.
- Automatic Directory Management: The application ensures the cache directory exists before saving files to it, keeping things zero-configuration.
- Cache-First Logic: Before performing a new harvest or distillation, the server checks the cache directory. If a match is found, it returns the stored result instantly.
- Feedback Loop: Human feedback and failure reports are persisted in the `feedback/` subdirectory of the cache and are automatically injected into future distillation prompts for that subject.
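The cache-first flow and feedback loop described above can be sketched in a few functions. This is a simplified illustration under stated assumptions: the file naming scheme (`{subject}-{version}.md`, `feedback/{subject}.txt`) and all function names are hypothetical, and `fake_distill` stands in for the real harvest-and-distill step.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def save_feedback(cache_dir: Path, subject: str, feedback: str) -> None:
    """Append a human correction to the subject's feedback file."""
    fb_dir = cache_dir / "feedback"
    fb_dir.mkdir(parents=True, exist_ok=True)
    with (fb_dir / f"{subject}.txt").open("a", encoding="utf-8") as fh:
        fh.write(feedback.strip() + "\n")


def load_feedback(cache_dir: Path, subject: str) -> str:
    """Return all stored feedback for a subject, or an empty string."""
    path = cache_dir / "feedback" / f"{subject}.txt"
    return path.read_text(encoding="utf-8") if path.is_file() else ""


def get_context(cache_dir: Path, subject: str, version: str,
                distill: Callable[[str], str]) -> str:
    """Return the cached manifesto if present; otherwise distill and store it."""
    path = cache_dir / f"{subject}-{version}.md"  # hypothetical naming scheme
    if path.is_file():
        return path.read_text(encoding="utf-8")
    cache_dir.mkdir(parents=True, exist_ok=True)  # create the directory on demand
    result = distill(load_feedback(cache_dir, subject))
    path.write_text(result, encoding="utf-8")
    return result


calls: List[str] = []


def fake_distill(feedback: str) -> str:
    """Stand-in for the real distillation step; records each invocation."""
    calls.append(feedback)
    return f"manifesto (feedback: {feedback.strip() or 'none'})"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".docpulse_cache"
    save_feedback(cache, "fastapi", "The async client changed in 0.115")
    first = get_context(cache, "fastapi", "v0.115", fake_distill)
    second = get_context(cache, "fastapi", "v0.115", fake_distill)

print(first)
print("distill calls:", len(calls))
```

The second call is served from disk, so the stand-in distiller runs only once. A real implementation would also need to sanitize `subject` and `version` before using them in file names.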
Community & Governance
- Contributing: We welcome contributions! See our guide.
- Code of Conduct: Our standards for a welcoming community.
- Security Policy: How to report vulnerabilities.
Pull Requests We'd Love to See
- Platform Agnosticism: Currently, DocPulse is optimized for Apple Silicon via MLX. We invite PRs to support other backends (llama.cpp, ONNX, etc.) to make the system truly universal.
- Integration Plugins: Right now, DocPulse works best with direct API/Web access. We welcome PRs for plugins that integrate with specific documentation platforms (Confluence, Notion, SharePoint, etc.) where documentation often lives.
---
License
MIT






