SEC-MCP

MCP server for analyzing SEC filings (10-K, 10-Q, 8-K) with industry-aware financial extraction and transformer-based NLP (FinBERT, BART, BERT NER).

Features

Company Search — Look up companies by ticker or name via SEC EDGAR
Standardized Financials — Industry-aware XBRL extraction with ~250 concept mappings across 5 industry classes (standard, bank, insurance, REIT, utility)
Validation — Automatic sanity checks (revenue ≥ net income, accounting equation, segment vs total detection)
Filing Access — Fetch filing text and specific sections (Risk Factors, MD&A, etc.)
Sentiment Analysis — FinBERT financial sentiment (positive/negative/neutral)
Summarization — BART-based hierarchical summarization for long filing sections
Entity Extraction — NER for companies, people, locations + regex for monetary values, dates, percentages

Setup

# Clone
git clone https://github.com/YOUR_USERNAME/SEC-MCP.git
cd SEC-MCP

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e ".[dev]"

# Configure EDGAR identity (required by SEC)
cp .env.example .env
# Edit .env and set EDGAR_IDENTITY="Your Name your@email.com"

Available Tools

Base / Discovery

| Tool | Description |
|------|-------------|
| search_company | Search by ticker/name → CIK, ticker, SIC code, industry |
| get_filing_list | List filings, filter by form type (10-K, 10-Q, 8-K) |

Financials (standardized, industry-aware, validated)

| Tool | Description |
|------|-------------|
| get_financials | Full standardized extraction: metrics, ratios, validation, opt. statements |
| get_financials_batch | Same as above for N tickers in parallel |
| get_income_statement | Just the income statement rows |
| get_balance_sheet | Just the balance sheet rows |
| get_cash_flow | Just the cash flow rows |
| get_financial_ratios | Just computed ratios (margins, ROA, ROE, leverage, etc.) |
| compare_companies | Side-by-side metrics + ratios for multiple tickers |

Filing Text

| Tool | Description |
|------|-------------|
| get_filing_text | Full filing or specific section text (supports aliases like 'risk factors') |

NLP Analysis

| Tool | Description |
|------|-------------|
| analyze_sentiment | FinBERT sentiment on text or filing section |
| summarize_filing | Hierarchical BART summarization |
| extract_entities | NER (ORG, PER, LOC, MONEY, DATE, PERCENT) |
| analyze_filing | Combined sentiment + summary + entities in one call |

How financials extraction works

Industry detection

The SIC code is used to classify a company into one of 5 industry classes:

| Class | SIC Range | Revenue Strategy |
|-------|-----------|------------------|
| standard | Everything else | First match: Revenues, RevenueFromContractWithCustomer, SalesRevenueNet, … |
| bank | 6020–6299 | Try total (Revenues, NetRevenues), then aggregate NII + non-interest + trading + fees |
| insurance | 6310–6411 | Try total, then aggregate premiums + investment income + fees |
| reit | 6500–6553 | Lease revenue + other income |
| utility | 4900–4991 | Electric + gas utility revenue |
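As a rough illustration of the lookup described in the table above, the sketch below maps a SIC code to an industry class. The function name `classify_industry` and the `INDUSTRY_SIC_RANGES` constant are hypothetical, and the ranges are assumed to be inclusive; the project's actual implementation may differ.

```python
# Hypothetical helper for illustration; not the project's actual code.
# SIC ranges are copied from the table above and treated as inclusive.
INDUSTRY_SIC_RANGES = {
    "bank": (6020, 6299),
    "insurance": (6310, 6411),
    "reit": (6500, 6553),
    "utility": (4900, 4991),
}


def classify_industry(sic_code):
    """Return the industry class for a SIC code, defaulting to 'standard'."""
    if sic_code is None:
        return "standard"
    sic = int(sic_code)  # accept both 6021 and "6021"
    for industry, (low, high) in INDUSTRY_SIC_RANGES.items():
        if low <= sic <= high:
            return industry
    return "standard"
```

Under these assumptions, `classify_industry(6211)` falls in the bank range, while any code outside the four listed ranges resolves to `standard`.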

XBRL concept dictionary

xbrl_mappings.py maps ~250 XBRL concepts to 20+ standardized metrics. Each metric has an ordered list of concepts to try — earlier entries are preferred. Some entries are marked aggregate=True (sum all matching, used for multi-component revenue like banks).
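One way to picture the ordered lookup with aggregate entries is the sketch below. It assumes that non-aggregate concepts are tried first in list order and that aggregate entries are summed only as a fallback, which matches the bank strategy in the industry table but is not confirmed against `xbrl_mappings.py`. The `ConceptEntry` class, the `resolve_metric` function, and the example concept list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptEntry:
    """One candidate XBRL concept for a metric (hypothetical structure)."""
    concept: str
    aggregate: bool = False


# Illustrative list only; not the contents of xbrl_mappings.py.
BANK_REVENUE = [
    ConceptEntry("Revenues"),
    ConceptEntry("NetRevenues"),
    ConceptEntry("InterestIncomeExpenseNet", aggregate=True),
    ConceptEntry("NoninterestIncome", aggregate=True),
]


def resolve_metric(entries, facts):
    """Return (value, concepts_used) for one metric.

    `facts` maps concept name -> reported value. Non-aggregate entries
    are tried in order; if none match, all matching aggregate entries
    are summed.
    """
    for entry in entries:
        if not entry.aggregate and facts.get(entry.concept) is not None:
            return facts[entry.concept], [entry.concept]
    parts = [
        e.concept
        for e in entries
        if e.aggregate and facts.get(e.concept) is not None
    ]
    if parts:
        return sum(facts[c] for c in parts), parts
    return None, []
```

Returning the list of concepts used alongside the value makes it easier to see afterwards whether a figure came from a single total or from summed components.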

Validation rules

Every extraction runs these checks:

revenue ≥ net income (when both positive) — catches segment-only revenue
Assets = Liabilities + Equity (within 5%) — catches mismatched concepts
Revenue not null — warns if no concept matched
Bank segment check — flags if bank revenue < 80% of net income
Gross margin 0–100% — for standard companies

Warnings are returned in the validation array so the AI can explain or retry.
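The checks listed above could be sketched roughly as follows. The metric keys, warning strings, and function name are hypothetical. The sketch also assumes that the 5% tolerance is measured relative to total assets and that the bank check applies only when both figures are positive; neither detail is stated above.

```python
def validate_financials(metrics, industry="standard"):
    """Return a list of warning strings for one extraction (illustrative)."""
    warnings = []
    revenue = metrics.get("revenue")
    net_income = metrics.get("net_income")

    if revenue is None:
        # No revenue concept matched.
        warnings.append("revenue_missing")
    elif net_income is not None and revenue > 0 and net_income > 0:
        if revenue < net_income:
            # Likely segment-only revenue.
            warnings.append("revenue_below_net_income")
        if industry == "bank" and revenue < 0.8 * net_income:
            warnings.append("bank_revenue_below_80pct_of_net_income")

    assets = metrics.get("total_assets")
    liabilities = metrics.get("total_liabilities")
    equity = metrics.get("total_equity")
    if None not in (assets, liabilities, equity) and assets != 0:
        # Assumption: tolerance is 5% of total assets.
        if abs(assets - (liabilities + equity)) > 0.05 * abs(assets):
            warnings.append("accounting_equation_mismatch")

    gross_profit = metrics.get("gross_profit")
    if industry == "standard" and gross_profit is not None and revenue:
        margin = gross_profit / revenue
        if not 0.0 <= margin <= 1.0:
            warnings.append("gross_margin_out_of_range")

    return warnings
```

An empty list means every check passed; a caller can surface any returned strings next to the figures rather than discarding the extraction.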

Usage

Run as MCP server (STDIO)

python -m sec_mcp.server

Using with your app (Cursor, Claude Desktop, etc.)

Configure MCP so your app starts the SEC-MCP server (see below).
Set EDGAR_IDENTITY in .env or in the MCP server env.
The AI chooses the right tool per request:

"Apple's financials" → get_financials("AAPL")
"Compare AAPL vs MSFT vs GOOGL" → compare_companies(["AAPL","MSFT","GOOGL"])
"Morgan Stanley income statement" → get_income_statement("MS")
"What are Apple's risk factors?" → get_filing_text with section='risk factors'

Cursor / Claude Desktop configuration

{
  "mcpServers": {
    "sec-mcp": {
      "command": "python",
      "args": ["-m", "sec_mcp.server"],
      "cwd": "/path/to/SEC-MCP",
      "env": {
        "EDGAR_IDENTITY": "Your Name your@email.com"
      }
    }
  }
}

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| EDGAR_IDENTITY | SEC-MCP sec-mcp@example.com | Your identity for SEC EDGAR API |
| SENTIMENT_MODEL | ProsusAI/finbert | Sentiment analysis model |
| SUMMARIZATION_MODEL | facebook/bart-large-cnn | Summarization model |
| NER_MODEL | dslim/bert-base-NER | NER model |
| MAX_CHUNK_TOKENS | 512 | Max tokens per chunk |
| CHUNK_OVERLAP_TOKENS | 128 | Overlap between chunks |
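To show how the two chunking variables might interact, here is a minimal sketch of overlapping windows over a token list. It assumes a simple sliding window whose stride is `MAX_CHUNK_TOKENS - CHUNK_OVERLAP_TOKENS`; the helper names are hypothetical, and the project's real chunking logic may differ.

```python
import os


def load_chunk_settings():
    """Read chunk settings from the environment, using the documented defaults."""
    max_tokens = int(os.environ.get("MAX_CHUNK_TOKENS", "512"))
    overlap = int(os.environ.get("CHUNK_OVERLAP_TOKENS", "128"))
    return max_tokens, overlap


def chunk_tokens(tokens, max_tokens=512, overlap=128):
    """Split a token list into windows of at most max_tokens items.

    Consecutive windows share `overlap` tokens. Illustrative only.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

With the defaults, a 1,000-token section yields three windows starting at tokens 0, 384, and 768. A larger overlap preserves more context across window boundaries at the cost of more model calls.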

Architecture

src/sec_mcp/
├── server.py           # MCP tool definitions (14 tools)
├── edgar_client.py     # EDGAR API wrapper (company search, filings, text)
├── financials.py       # Standardized extraction engine + validation
├── xbrl_mappings.py    # XBRL concept → metric dictionary (5 industry classes)
├── models.py           # Pydantic models (StandardizedFinancials, ratios, etc.)
├── config.py           # Environment config
└── nlp/
    ├── sentiment.py    # FinBERT
    ├── summarizer.py   # BART
    └── ner.py          # NER

NLP Models

Models are lazy-loaded (downloaded on first use, ~2.5GB total):

ProsusAI/finbert: Financial sentiment, fine-tuned on the Financial PhraseBank dataset
facebook/bart-large-cnn: Abstractive summarization
dslim/bert-base-NER: Named entity recognition

Development

# Run tests
pytest

# Run tests (skip slow model tests)
pytest -m "not slow"

# Lint
ruff check src/ tests/

License

MIT
