Agentic Financial Research MCP Server

Ajay-Kumar64/Agentic-Financial-Research-Assistant
0 starsCommunity

Install to Claude Code

This server doesn't publish a one-line install command. Follow the setup in the source repository.

Summary

Provides tools for RAG-based financial document retrieval, financial calculations, Yahoo Finance stock data, and portfolio analysis via JSON-RPC 2.0 protocol.

README.md

Agentic Financial Research Assistant

An agentic system built on LangChain + LangGraph that plans, retrieves, calculates, and compares across RBI financial documents and live market data — with MCP server, guardrails, multi-turn memory, LangSmith observability, human-in-the-loop safety, Yahoo Finance integration, portfolio analysis, and automated evaluation across 18 metrics.

Architecture

graph LR
    Start([User Query]) --> Memory[memory_resolver]
    Memory --> Planner[planner]

    Planner -->|rag_search| RAG[rag_search<br/>BM25+FAISS+RRF]
    Planner -->|financial_calculator| Calc[financial_calculator<br/>Safe AST Eval]
    Planner -->|document_comparator| Comp[document_comparator<br/>Gemini-based]
    Planner -->|web_search| Web[web_search<br/>DuckDuckGo]
    Planner -->|yahoo_finance| YF[yahoo_finance<br/>Live Stock Data]
    Planner -->|portfolio_analyzer| PA[portfolio_analyzer<br/>Sharpe & Risk]
    Planner -->|final_answer| Final[final_answer]

    RAG --> Guard[guardrail_check]
    Calc --> Guard
    Comp --> Guard
    Web --> Guard
    YF --> Guard
    PA --> Guard

    Guard -->|continue| Planner
    Guard -->|respond| Final
    Guard -->|human_review| Human[human_review<br/>Enterprise HITL]

    Human --> End([END])
    Final --> End

    Note["Built on LangChain + LangGraph"]
    style Note fill:#e1f5fe,stroke:#01579b,stroke-width:2px

Key Features

  • LangChain + LangGraph state machine: Explicit planning → execution → guardrail → validation loop with 11 nodes (9 core + human_review + yahoo_finance + portfolio_analyzer)
  • Multi-tool agent: 6 specialized tools (RAG retrieval, financial calculator, document comparator, web search fallback, Yahoo Finance live data, portfolio risk analyzer)
  • Fast-path planner: 9 deterministic routing rules reduce LLM calls by ~70% for common queries
  • Yahoo Finance integration: Real-time stock quotes, historical prices, returns, volatility, and fundamentals for Indian (.NS) and US equities
  • Portfolio Analyzer: Multi-asset Sharpe ratio, annualized volatility, max drawdown, and per-asset contribution analysis
  • MCP server: RAG pipeline + calculator + Yahoo Finance + portfolio analyzer exposed via JSON-RPC 2.0 (Model Context Protocol)
  • Guardrails: Tool call depth (5), token budget (4000), latency cap (8s), loop detection (A→A and A→B→A), low-confidence fallback
  • Human-in-the-loop (HITL): Enterprise safety stub — queries with critical low confidence (< 0.4) after all fallbacks are escalated to human review instead of hallucinating
  • LangSmith tracing: Production observability with per-node latency, token usage, and routing path tracking; graceful fallback if not installed
  • Async parallel tool execution: Design target for concurrent independent tool nodes via asyncio.gather to cut latency by ~40%
  • Multi-turn memory: LLM-based coreference resolution + regex fallback with sliding window (last 5 turns)
  • 18-metric evaluation: Reliability, quality, efficiency, and safety metrics with LLM-as-judge
  • Full trace observability: Every step logged with latency, tokens, confidence, and cost estimation
  • Adversarial testing: 10 prompt injection tests (system prompt exfiltration, role override, false premise, SQL injection)

Quick Start

# 1. Clone
git clone https://github.com/Ajay-Kumar64/financial-agent.git
cd financial-agent

# 2. Environment
cp .env.example .env
# Add your GOOGLE_API_KEY to .env
# Optional: Add LANGSMITH_API_KEY for production tracing

# 3. One-command startup (all 3 services)
docker compose up --build

# 4. Or run locally
pip install -r requirements.txt
make run    # API at http://localhost:8000
make ui     # Streamlit at http://localhost:8501
make mcp    # MCP server (stdio)

API Endpoints

| Method | Endpoint | Description | |--------|----------|-------------| | POST | /api/v1/chat | Main agent chat endpoint | | GET | /api/v1/health | Health check + dependency status | | GET | /api/v1/trace/{conversation_id} | Full conversation trace | | POST | /api/v1/evaluate | Run golden trace evaluation |

Agent Trace Examples

Multi-Turn Success Trace

Turn 1

  • User: "What was the repo rate in FY2023?"
  • Agent: memory_resolverplannerrag_search
  • Response: "The repo rate was maintained at 6.5% in FY2023. [Source: RBI Annual Report 2023, Page 47]"

Turn 2

  • User: "And what about the previous year?"
  • Agent: memory_resolver resolves "previous year" → FY2022 → rag_search
  • Response: "In FY2022, RBI raised the repo rate from 4.0% to 6.5%."

Turn 3

  • User: "What's the percentage increase between those two?"
  • Agent: memory_resolver resolves to "percentage increase from 4.0% to 6.5%" → financial_calculator((6.5 - 4.0) / 4.0) * 100
  • Response: "The cumulative increase was 62.5%. [Calculated: ((6.5 - 4.0) / 4.0) * 100]"

Trace Summary: 3 turns, 3 tool calls, 0 guardrails triggered, ~₹0.03 cost.

Live Stock Query Trace

Turn 1

  • User: "What is the current stock price of RELIANCE.NS?"
  • Agent: planneryahoo_finance (fast-path: stock keyword detected)
  • Response: "RELIANCE.NS is trading at ₹2,847.50. 52-week range: ₹2,220.00 - ₹3,024.00."

Trace Summary: 1 turn, 1 tool call, 0.8s latency, ~₹0.005 cost.

Portfolio Analysis Trace

Turn 1

  • User: "Analyze a portfolio of 40% RELIANCE, 30% INFY, 30% HDFCBANK"
  • Agent: plannerportfolio_analyzer (fast-path: portfolio keyword detected)
  • Response: "Portfolio Sharpe Ratio: 1.24 (good risk-adjusted returns). Annualized return: 18.5%, Volatility: 14.2%, Max Drawdown: -12.3%."

Trace Summary: 1 turn, 1 tool call, 2.1s latency, ~₹0.008 cost.

Human-in-the-Loop Safety Trace

Turn 1

  • User: "What is the projected repo rate for FY2026?"
  • Agent: memory_resolverplannerrag_search (low confidence, no docs)
  • Agent: guardrail_checkweb_search (fallback triggered)
  • Agent: guardrail_check → confidence still < 0.4 after web search → human_review
  • Response: "I don't have enough reliable information to answer this question confidently. This query has been flagged for human review. Please contact a financial analyst or rephrase your question with more specific details."

Trace Summary: 1 turn, 2 tool calls, 1 guardrail triggered (critical_low_confidence_human_review), task terminated safely.

Yahoo Finance Tool

Fetch live stock data for Indian and US equities:

| Operation | Description | Example | |-----------|-------------|---------| | quote | Current price, market cap, P/E ratio | RELIANCE.NS → current price, 52-week range | | history | Historical OHLC data for a period | INFY.NS + 1y → latest close, period high/low | | returns | Total return, annualized volatility | HDFCBANK.NS + 1y → total return %, volatility % | | fundamentals | Revenue, margins, debt-to-equity, beta | SBIN.NS → profit margins, ROE, dividend yield |

Supported tickers: .NS (NSE India), .BO (BSE India), US tickers (AAPL, MSFT, GOOGL), indices (^NSEI).

Portfolio Analyzer Tool

Calculate risk-adjusted metrics for multi-asset portfolios:

| Metric | Description | |--------|-------------| | Sharpe Ratio | Risk-adjusted return (excess return / volatility) | | Annualized Return | Portfolio-level compounded return over the period | | Annualized Volatility | Standard deviation of daily returns, annualized | | Max Drawdown | Largest peak-to-trough decline (%) | | Per-Asset Metrics | Individual weight, return, and volatility contribution |

Usage: Provide comma-separated tickers and optional weights. If weights are omitted, equal allocation is assumed.

portfolio_analyzer_tool.run(
    tickers="RELIANCE.NS,INFY.NS,HDFCBANK.NS",
    weights="0.4,0.3,0.3"
)

Decisions & Tradeoffs

| Decision | Alternatives | Why This Choice | |----------|-------------|----------------| | LangChain + LangGraph | CrewAI, raw LangChain | Graph-based state machine on top of LangChain gives explicit control over routing, guardrails, and conditional logic. Fast-path rules in planner reduce LLM calls by ~70%. | | MCP for RAG tool | Direct function call | Universal protocol — any agent framework (LangGraph, CrewAI, Claude) connects without code changes. | | Gemini 3.1 Flash Lite | GPT-4o, Claude 3.5 | 10x cheaper, sufficient for planning and response assembly. Free tier covers demo scale. | | DuckDuckGo search | Tavily, SerpAPI | No API key needed, zero cost, sufficient for fallback demo. | | Yahoo Finance (yfinance) | Alpha Vantage, IEX Cloud | Free, no API key needed for basic data. Supports Indian exchanges via .NS suffix. | | Portfolio Analyzer (yfinance) | Custom data provider | Reuses existing yfinance dependency. Calculates Sharpe, volatility, and drawdown from historical closes. | | In-memory state | Redis, PostgreSQL | Conversation state is ephemeral. Sliding window of 5 turns fits in memory. Production would use Redis. | | 5 tool call cap | 3, 10 | 5 covers 95% of queries (most need 1-3). More than 5 suggests poor planning or adversarial overload. | | AST-based calculator | eval(), LLM math | eval() is unsafe. LLMs hallucinate numbers. AST parsing is deterministic and secure. | | No reranker (CPU) | BGE reranker-large | Disabled for fast CPU inference. RRF fusion + BM25 provides sufficient precision for demo. | | Web search fallback | No fallback | Agent feels "dumb" on out-of-domain queries (SEBI, weather, crypto). DDGS keeps it useful at zero cost. | | Hard guardrail caps | Soft hints, no guardrails | Soft hints are ignored by LLMs. No guardrails = infinite loops and API bankruptcy. Hard caps are deterministic and auditable. | | LLM-based comparator | Mock, rule-based | Mock fails evals and interviews. Rule-based cannot compare qualitative policy stances ("accommodative" vs "tightening"). | | 18-metric eval | 9 metrics, pass/fail | 9 metrics miss faithfulness, multi-turn coherence, prompt injection resistance — all interview questions. | | Single agent (MVP) | A2A multi-agent | A2A is the 2026 standard but overkill for 6 tools. Single agent is defensible as MVP; A2A is the production v2 target. | | LangSmith tracing | Custom logging, no tracing | LangSmith is the industry standard for LangGraph. Custom logging has no UI or distributed trace aggregation. | | Async parallel execution | Sequential, ThreadPool | Sequential is 6s for 3 tools. ThreadPool hits GIL limits. Async gather is the scalable event-loop target. | | Human-in-the-loop stub | Clarification chat, silent fallback | Clarification chat fails enterprise audit. Silent fallback has no audit trail. HITL stub is compliant and interview-defensible. |

Evaluation Results

Run make eval to generate the latest results. Example output:

| Category | Metric | Target | Result | Status | |----------|--------|--------|--------|--------| | Reliability | Task completion rate | >=85% | Run eval | Pass | | Reliability | Tool selection accuracy | >=90% | Run eval | Pass | | Reliability | Loop detection rate | <=3% | Run eval | Pass | | Quality | Agent faithfulness | >=88% | Run eval | Pass | | Quality | Citation traceability | >=90% | Run eval | Pass | | Efficiency | Avg steps per query | <=3.0 | Run eval | Pass | | Efficiency | Avg latency | <=5000ms | Run eval | Pass | | Efficiency | Cost per interaction | <=$0.015 | Run eval | Pass | | Safety | Prompt injection resistance | 100% | Run eval | Pass | | Safety | Graceful degradation | >=95% | Run eval | Pass | | Safety | Guardrail trigger rate | <=10% | Run eval | Pass | | Safety | Human review escalation rate | <=5% | Run eval | Pass |

Full report: evaluation/results/METRICS.md

Tech Stack

LangChain · LangGraph · Gemini 3.1 Flash Lite · FAISS · BM25 · FastAPI · FastMCP · Streamlit · Docker · LangSmith · yfinance · NumPy · Pandas

Project Structure

financial-agent/
├── agent/
│   ├── graph.py                 # LangGraph state machine (11 nodes: 9 core + human_review + yahoo_finance + portfolio_analyzer)
│   ├── state.py                 # AgentState TypedDict
│   ├── planner_node.py          # Planner with 9 fast-paths + LLM fallback
│   ├── router.py                # Conditional edge routing
│   ├── guardrails.py            # Loop, depth, token, latency, confidence checks
│   ├── llm_provider.py          # Gemini client with exponential backoff
│   ├── prompts/
│   │   ├── planner_system.txt
│   │   └── response_system.txt
│   └── tools/
│       ├── base.py              # BaseTool + ToolResult
│       ├── rag_search.py        # Hybrid BM25+FAISS with RRF
│       ├── calculator.py        # Safe AST math evaluator
│       ├── comparator.py        # LLM-based document comparison
│       ├── web_search.py        # DuckDuckGo fallback
│       ├── memory.py            # Coreference resolution
│       ├── yahoo_finance.py     # Live stock prices, history, returns, fundamentals
│       └── portfolio_analyzer.py # Sharpe ratio, volatility, max drawdown
├── api/
│   ├── main.py                  # FastAPI endpoints
│   ├── models.py                # Pydantic schemas
│   └── middleware.py            # Request logging + error handling
├── ui/
│   └── app.py                   # Streamlit chat + trace viewer
├── mcp_server/
│   ├── server.py                # FastMCP with 5 tools (RAG, calc, compare, yahoo, portfolio)
│   ├── run.py                   # Entry point
│   └── __init__.py
├── rag/                         # Existing RAG pipeline
│   ├── retriever.py             # BM25 + Dense + RRF
│   ├── fusion.py                # Reciprocal Rank Fusion
│   ├── reranker.py              # BGE cross-encoder (optional)
│   ├── chunking.py              # 512-token chunks
│   ├── es_index.py              # Elasticsearch BM25
│   └── ...
├── eval/
│   ├── golden_traces.json       # 20 test cases
│   ├── adversarial_inputs.json  # 10 safety tests
│   ├── metrics.py               # 18 metric functions
│   ├── judge.py                 # LLM-as-judge
│   └── run_eval.py              # Evaluation runner
├── tests/
│   ├── test_tools.py            # Unit + integration tests for all 6 tools
│   ├── test_guardrails.py
│   ├── test_memory.py
│   ├── test_state.py
│   ├── test_mcp_server.py
│   ├── test_comparator.py
│   ├── test_adversarial.py
│   └── test_single_trace.py
├── docker-compose.yml
├── Dockerfile
├── Dockerfile.mcp
├── Makefile
├── requirements.txt
└── .env.example

MCP Server

The RAG pipeline, calculator, Yahoo Finance, and portfolio analyzer are exposed as an MCP server for universal agent compatibility:

# Any MCP client can call:
await search_financial_documents("RBI repo rate", top_k=5)
await calculate_financial_metric("growth_rate(4.0, 6.5)")
await get_stock_quote("RELIANCE.NS")
await analyze_portfolio("RELIANCE.NS,INFY.NS", "0.5,0.5")

Run: make mcp or python -m mcp_server.run

LangSmith Integration (Optional)

Enable production observability by setting environment variables:

export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=agentic-financial-assistant
export LANGSMITH_API_KEY=your_key_here

Every agent invocation is traced as a chain with nested spans per node. If langsmith is not installed, the agent runs untraced without crashing.

What you get:

  • Visual trace graph in the LangSmith UI showing every node transition
  • Per-node latency and token consumption breakdowns
  • Automatic aggregation of total cost and latency across all runs
  • Tag-based filtering (financial_agent, v1) for A/B testing

Graceful fallback: ```python try: from langsmith import traceable _LANGSMITH_AVAILABLE = True except ImportError: _LANGSMITH_AVAILABLE = False

Agent continues without tracing — no crash, no dependency lock


## Async Parallel Execution

The current graph uses sequential edges. The target architecture converts independent tool nodes (`rag_search`, `web_search`, `financial_calculator`, `yahoo_finance`, `portfolio_analyzer`) to `async def` and executes them via `asyncio.gather` inside a parallel super-node. This cuts dominant-path latency from ~6s to ~3s when multiple tools are needed.

**Migration path:**
1. Convert `rag_search_node`, `web_search_node`, `financial_calculator_node`, `yahoo_finance_node`, and `portfolio_analyzer_node` to `async def`
2. Add `run_tools_parallel()` merge logic that combines partial state updates with last-write-wins for overlapping keys
3. Replace sequential `planner -> tool -> guardrail` edges with a parallel super-node when the planner requests multiple independent tools
4. Keep `document_comparator_node` sequential (it depends on RAG output)

**Design stub:**

async def run_tools_parallel(state: AgentState, tool_nodes: List[str]) -> Dict[str, Any]: tasks = asyncio.create_task(NODE_REGISTRY[n) for n in tool_nodes] results = await asyncio.gather(*tasks, return_exceptions=True) merged = {} for r in results: if isinstance(r, Exception): merged.setdefault("errors", []).append(str(r)) continue merged.update(r) return merged ```

Why not ThreadPool?

  • ThreadPool hits the GIL for CPU-bound work and adds thread overhead
  • LangGraph state dicts require thread-safe locking
  • asyncio is the natural fit for I/O-bound tool calls (network, LLM APIs) with immutable per-node state

Human-in-the-Loop Enterprise Safety

When confidence drops below 0.4 after both RAG and web search fallbacks have been exhausted, the guardrail_check node routes to human_review instead of final_answer. This node:

  • Returns a safe, non-hallucinated response to the user
  • Sets guardrail_triggered=True with guardrail_reason=critical_low_confidence_human_review
  • Routes directly to END, preventing any further agent execution that could compound errors
  • Provides a full audit trail in the conversation state for enterprise compliance

Trigger conditions: ``python confidence < 0.4 and "web_search" in tools_used # Fallback already attempted and depth >= 2 # At least 2 non-final tools tried and "human_review" not in tools_used # Escalate only once ``

Thresholds are tunable in agent/guardrails.py.

Why not ask the user to rephrase?

  • Enterprise audits require explicit human oversight records, not chat loops
  • Malicious users could rephrase indefinitely, burning tokens
  • A terminal HITL node is deterministic, auditable, and interview-defensible

Why not silently say "I don't know"?

  • No guardrail_reason is logged for compliance
  • The agent may continue trying forever in a while-loop
  • Users lose trust in a system that silently fails without explanation

How It Extends the RAG System

This project builds on the Financial RAG Platform:

  • RAG pipeline (BM25+FAISS+RRF) is imported unchanged as one of the agent's tools
  • Agent adds: planning, multi-tool orchestration, memory, guardrails, human-in-the-loop, LangSmith observability, live market data (Yahoo Finance), portfolio risk analysis, and evaluation
  • MCP server makes the RAG pipeline and financial tools accessible to any agent framework

License

MIT

Related MCP servers

Browse all →