DataDog MCP Server

A FastMCP server providing DataDog log search and monitoring capabilities through the Model Context Protocol (MCP).

Features

Service-Aware Log Search: Search DataDog logs with service-specific filtering for enhanced targeting
Tasmania Space Filtering: Filter tasmania service logs by space ID, user, or tenant
Meeting-Specific Debugging: Find all logs related to specific meetings (7-day default)
User Activity Tracking: Search user-related logs and activities across services
Webhook Event Analysis: Find integration webhook events for Zoom and Whereby
Error Detection: Search for recent errors across services with intelligent filtering
APM Trace Correlation: Find all logs for specific traces
STDIO Transport: Self-contained server for local MCP usage
Environment-Based Configuration: Secure API key management

Response Size Safeguards

The MCP server includes comprehensive safeguards to prevent overwhelming LLM contexts with massive log responses:

Automatic Protections

Response Size Limit: Responses are capped at 500KB total
Per-Log Limits: Individual log entries are limited to 10KB
Payload Filtering: Large request/response data is automatically truncated with summaries
Query Validation: Warns about overly broad searches before execution

Size Management Features

Smart Truncation: Large fields show truncated content with original size information
Early Termination: Processing stops when size limits are reached
Clear Messaging: Responses indicate when truncation occurs and why
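
As a rough illustration of how these safeguards could fit together, the sketch below applies the per-log and total limits stated above and stops early once the budget is used up. The function names, the 200-character field cut-off, and the byte-counting approach are assumptions made for this example, not the server's actual implementation.

```python
import json

MAX_RESPONSE_BYTES = 500 * 1024  # 500KB total cap, as stated above
MAX_LOG_BYTES = 10 * 1024        # 10KB per-log cap, as stated above
FIELD_PREVIEW_CHARS = 200        # hypothetical cut-off for long string fields


def truncate_log(log: dict, max_bytes: int = MAX_LOG_BYTES) -> dict:
    """Shorten long string fields of an oversized log, noting the original size."""
    if len(json.dumps(log).encode("utf-8")) <= max_bytes:
        return log
    trimmed = {}
    for key, value in log.items():
        if isinstance(value, str) and len(value) > FIELD_PREVIEW_CHARS:
            trimmed[key] = (
                f"{value[:FIELD_PREVIEW_CHARS]}... "
                f"[truncated, original {len(value)} chars]"
            )
        else:
            trimmed[key] = value
    return trimmed


def build_response(logs: list, max_bytes: int = MAX_RESPONSE_BYTES) -> dict:
    """Collect logs until the size budget is exhausted, then stop early."""
    kept, total = [], 0
    for index, log in enumerate(logs):
        entry = truncate_log(log)
        size = len(json.dumps(entry).encode("utf-8"))
        if total + size > max_bytes:
            # Early termination: report how many logs were left out and why.
            return {
                "logs": kept,
                "response_size_bytes": total,
                "truncated": True,
                "truncation_reason": "Response size limit exceeded",
                "skipped_logs": len(logs) - index,
            }
        kept.append(entry)
        total += size
    return {"logs": kept, "response_size_bytes": total, "truncated": False}
```

Counting bytes on the serialized JSON keeps the budget close to what the client actually receives, at the cost of serializing each entry once more than strictly necessary.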

Response Indicators

{
  "logs": [...],
  "response_size_bytes": 45120,
  "truncated": true,
  "truncation_reason": "Response size limit exceeded",
  "skipped_logs": 15,
  "recommendation": "Use more specific filters or pagination to reduce response size",
  "query_warnings": {
    "risk_level": "high",
    "warnings": ["Literal string search without specific field filters"],
    "recommendations": ["Add specific field filters like service:meeting"]
  }
}

Preview-Then-Execute Workflow (RECOMMENDED)

For large or uncertain queries, use the two-step preview workflow to avoid overwhelming responses:

Step 1: Preview the Query

# Preview using relative time (hours)
preview = preview_search(service="meeting", user_id=214413, hours=24, limit=100)

# Or preview using specific time range
preview = preview_search(
    service="meeting",
    user_id=214413,
    time_from="2025-01-15T10:00:00Z",
    time_to="2025-01-15T18:00:00Z",
    limit=100
)

Preview Response:

{
  "estimated_count": 150,
  "estimated_size_mb": 0.3,
  "sample_logs": [...],  // 3 sample entries for structure
  "cache_id": "abc123-def456-...",
  "expires_in_seconds": 30,
  "execution_recommendation": "OK",
  "query_warnings": {
    "risk_level": "low",
    "warnings": [],
    "recommendations": ["Query looks well-targeted with service and user filters"]
  }
}

Step 2: Execute or Refine

# If preview looks reasonable, execute with cache_id
if preview["execution_recommendation"] == "OK":
    results = search_logs(cache_id=preview["cache_id"])
else:
    # Refine with more specific structured filters
    refined_results = search_logs(
        service="meeting",
        meeting_id=136666,
        status="error",
        hours=6
    )
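
The `cache_id` and `expires_in_seconds` fields suggest a short-lived cache of previewed query parameters. The sketch below shows one way such a cache might behave, assuming a 30-second lifetime. The `PreviewCache` class and its methods are hypothetical names for this example and are not taken from the server's code.

```python
import time
import uuid
from typing import Callable, Optional


class PreviewCache:
    """Hypothetical in-memory store for previewed query parameters."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # cache_id -> (expires_at, params)

    def put(self, params: dict) -> str:
        """Store a copy of the parameters and return a fresh cache_id."""
        cache_id = str(uuid.uuid4())
        self._entries[cache_id] = (self._clock() + self._ttl, dict(params))
        return cache_id

    def get(self, cache_id: str) -> Optional[dict]:
        """Return the stored parameters, or None if unknown or expired."""
        entry = self._entries.get(cache_id)
        if entry is None:
            return None
        expires_at, params = entry
        if self._clock() >= expires_at:
            del self._entries[cache_id]  # drop stale entries lazily
            return None
        return params
```

Under this model, a client that waits longer than `expires_in_seconds` before calling `search_logs(cache_id=...)` would need to run `preview_search` again.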

Best Practices

✅ Recommended Query Patterns

# BEST: Use structured filters with specific service (defaults to prod)
search_logs(service="meeting", meeting_id=136666, status="error", hours=2)

# Query staging environment
search_logs(service="meeting", env="staging", meeting_id=136666, status="error", hours=2)

# User-specific filtering across services
search_logs(service="tasmania", user_id=214413, hours=6)

# Space-specific filtering (Tasmania coaching spaces) in dev environment
search_logs(service="tasmania", env="dev", space_id=168565, hours=12)

# Actor email filtering (finds user activity across log formats)
search_logs(service="meeting", actor_email="user@example.com", hours=24)

# PREVIEW FIRST: For potentially large result sets
preview = preview_search(service="meeting", user_id=214413, hours=24)
if preview["execution_recommendation"] == "OK":
    results = search_logs(cache_id=preview["cache_id"])

# Preview staging environment
preview = preview_search(service="meeting", env="staging", user_id=214413, hours=24)

❌ Avoid These Patterns (Will Trigger Warnings)

# WRONG: Raw DataDog query strings (defeats smart filtering)
search_logs(query='env:prod "meeting_id:136666"', hours=24)

# TOO BROAD: No service or specific filters
search_logs(query="env:prod", hours=12)

# HIGH VOLUME: Long time range without targeted filters
search_logs(hours=48, limit=200)
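
To make the warnings more concrete, here is a minimal sketch of how a query could be scored before execution. The first warning and recommendation strings come from the response example earlier in this README. The other messages, the 24-hour threshold, the scoring rule, and the `assess_query` name are assumptions for illustration only.

```python
def assess_query(query=None, service=None, hours=1, **filters) -> dict:
    """Return a query_warnings-style dict for the given search parameters."""
    active = {name: value for name, value in filters.items() if value is not None}
    warnings, recommendations = [], []

    if query and not active:
        # Raw query strings bypass the structured, service-aware filters.
        warnings.append("Literal string search without specific field filters")
        recommendations.append("Add specific field filters like service:meeting")
    if service is None:
        warnings.append("No service specified")  # hypothetical message
        recommendations.append("Pass service= to narrow the search")
    if hours > 24 and not active:
        warnings.append("Long time range without targeted filters")  # hypothetical
        recommendations.append("Shorten the time range or add an ID filter")

    # Hypothetical scoring: no warnings is low, one is medium, more is high.
    if not warnings:
        risk_level = "low"
    elif len(warnings) == 1:
        risk_level = "medium"
    else:
        risk_level = "high"
    return {
        "risk_level": risk_level,
        "warnings": warnings,
        "recommendations": recommendations,
    }
```

With these rules, `assess_query(service="meeting", meeting_id=136666, hours=2)` is rated low risk, while the three patterns above each collect at least one warning.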

Environment Filtering

All tools support environment filtering via the env parameter:

Default: prod (production environment)
Common values: staging, dev, test
Custom: Any environment name matching your DataDog setup

# Search production (default)
search_logs(service="meeting", meeting_id=136666)

# Search staging explicitly
search_logs(service="meeting", env="staging", meeting_id=136666)

# Search dev environment
search_logs(service="meeting", env="dev", meeting_id=136666)

# Test connection to specific environment
test_connection(env="staging")

Service-Specific Filtering

The DataDog MCP server automatically adapts filtering based on the service being queried. Each service has its own set of available filters:

Supported Services

| Service | Available Filters | Description |
|---------|------------------|-------------|
| tasmania | user_id, tenant_id, space_id | Coaching platform logs with space-based filtering |
| meeting | user_id, tenant_id, meeting_id, path_id | Meeting service logs |
| assessment | user_id, tenant_id, assessment_id | Assessment service logs |
| integration | user_id, tenant_id, meeting_id, provider | Integration service logs |

Filter Examples

# Tasmania space-specific filtering (maps to path filtering)
search_logs(service="tasmania", space_id=168565, user_id=214413)

# Meeting service error tracking
search_logs(service="meeting", meeting_id=136666, status="error")

# Actor email filtering (finds user activity across different log formats)
search_logs(service="meeting", actor_email="user@example.com", hours=24)

# Integration provider filtering
search_logs(service="integration", provider="zoom", meeting_id=136666)

Installation

Requirements

Python 3.11+
DataDog API key and Application key
uv (recommended) or pip

Setup

Option 1: Install as Global CLI Tool (Recommended)

Install as a global CLI tool using uv:

   uv tool install git+https://github.com/everwise/torch-datadog-mcp.git

Or using pipx:

   pipx install git+https://github.com/everwise/torch-datadog-mcp.git

Or add to project dependencies:

   uv add git+https://github.com/everwise/torch-datadog-mcp.git

Set environment variables:

   export DD_API_KEY=your_api_key_here
   export DD_APP_KEY=your_app_key_here
   export DD_SITE=datadoghq.com  # Optional, default shown

Option 2: Local Development Setup

Clone the repository:

   git clone https://github.com/everwise/torch-datadog-mcp.git
   cd torch-datadog-mcp

Install dependencies:

   # Using uv (recommended)
   uv sync

   # Or using pip
   pip install -e .

Configure environment variables:

   # Create .env file with your DataDog credentials
   touch .env

Set your DataDog API keys in .env:

   DD_API_KEY=your_api_key_here
   DD_APP_KEY=your_app_key_here
   DD_SITE=datadoghq.com  # Optional, default shown
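
For readers who want to check their configuration outside the server, the following sketch reads the three variables above and applies the documented default for `DD_SITE`. The `DatadogSettings` and `load_settings` names are hypothetical, and the sketch assumes the variables are already present in the process environment; it does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatadogSettings:
    api_key: str
    app_key: str
    site: str = "datadoghq.com"  # default shown in the setup steps above


def load_settings(environ=os.environ) -> DatadogSettings:
    """Build settings from environment variables, failing fast on missing keys."""
    required = ("DD_API_KEY", "DD_APP_KEY")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return DatadogSettings(
        api_key=environ["DD_API_KEY"],
        app_key=environ["DD_APP_KEY"],
        site=environ.get("DD_SITE") or "datadoghq.com",
    )
```

Failing fast with the names of the missing variables tends to be easier to debug than an authentication error returned later by the API.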

Getting DataDog API Keys

Go to DataDog Organization Settings > API Keys
Create or copy your API Key
Go to Application Keys
Create or copy your Application Key

Usage

Running the Server

# If installed from GitHub
datadog-mcp

# For local development with uv
uv run --project /path/to/torch-datadog-mcp datadog-mcp

# Alternative local development commands
uv run python -m datadog_mcp.server
python src/datadog_mcp/server.py

The server runs in STDIO mode by default, making it suitable for MCP clients.

Available Tools

The server provides 8 focused tools for DataDog log analysis. All tools support environment filtering via the env parameter (defaults to prod):

Core Search Tools

preview_search: Preview query size and count before execution (30s cache)
Supports: env, service, hours/time_from/time_to, filters
search_logs: Enhanced main search with service-aware filtering and size safeguards
Supports: env, service, hours/time_from/time_to, filters, pagination
get_trace_logs: Get all logs for a specific APM trace ID
Supports: trace_id, env, hours/time_from/time_to, pagination

Business Analysis Tools

search_business_events: Find business events across services
Supports: event_type, env, service, hours/time_from/time_to
trace_request_flow: Track requests across multiple services using correlation IDs
Supports: request_id, env, hours/time_from/time_to

Utility Tools

test_connection: Test DataDog API connectivity
Supports: env (tests connection for specific environment)
get_server_info: Get server configuration information
debug_configuration: Get detailed debugging information

Service-Specific Structured Filters

Use these structured filters with the search_logs tool. Each one is mapped automatically to the correct DataDog fields:

Tasmania Service:

user_id → Maps to current user context fields
tenant_id → Maps to current tenant context
space_id → Maps to space path filtering (/api/v1/spaces/{id}*)
actor_email → Maps to statement actor email fields

Meeting Service:

meeting_id → Maps to meeting ID field
user_id → Maps to user ID field
tenant_id → Maps to tenant ID field
path_id → Maps to notifiable ID (learning paths)
actor_email → Maps to events actor email

Assessment Service:

assessment_id → Maps to assessment ID field
user_id, tenant_id → Standard user/tenant filtering

Integration Service:

provider → Filter by integration provider (zoom, whereby)
meeting_id → Maps to meeting ID field
user_id, tenant_id → Standard user/tenant filtering

Claude Desktop Integration

Add this configuration to your Claude Desktop config file:

macOS

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

For Global Tool Installation (Recommended)

{
  "mcpServers": {
    "datadog": {
      "command": "datadog-mcp",
      "env": {
        "DD_API_KEY": "your_api_key_here",
        "DD_APP_KEY": "your_app_key_here",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}

For Local Development Setup

{
  "mcpServers": {
    "datadog": {
      "command": "uv",
      "args": ["run", "--project", "/path/to/torch-datadog-mcp", "datadog-mcp"],
      "env": {
        "DD_API_KEY": "your_api_key_here",
        "DD_APP_KEY": "your_app_key_here",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}

Windows

Edit %APPDATA%\Claude\claude_desktop_config.json with similar configuration.

Alternative: Using Environment Variables

If you set environment variables globally, you can omit the env section:

{
  "mcpServers": {
    "datadog": {
      "command": "datadog-mcp"
    }
  }
}

Example Usage Patterns

Debug Meeting Issues

# Find all logs for a specific meeting (prod by default)
search_logs(service="meeting", meeting_id=136666, hours=24)

# Focus on errors for that meeting
search_logs(service="meeting", meeting_id=136666, status="error", hours=168)

# Debug staging environment issues
search_logs(service="meeting", env="staging", meeting_id=136666, status="error", hours=24)

Find Integration Events

# Zoom integration events for a meeting
search_logs(service="integration", provider="zoom", meeting_id=136667)

# All integration activity for a user in dev environment
search_logs(service="integration", env="dev", user_id=214413, hours=48)

Monitor Service Health

# Recent errors in meeting service (prod)
search_logs(service="meeting", status="error", hours=2)

# Tasmania space-specific errors in staging
search_logs(service="tasmania", env="staging", space_id=168565, status="error", hours=6)

User Activity Tracking

# All activity for a user in Tasmania (prod)
search_logs(service="tasmania", actor_email="user@example.com", hours=24)

# User's meeting activity in specific environment
search_logs(service="meeting", env="staging", user_id=214413, hours=12)

APM Trace Investigation

# Find all logs for a trace using relative time (prod by default)
get_trace_logs(trace_id="1234567890abcdef", hours=1)

# Find all logs for a trace in staging environment
get_trace_logs(
    trace_id="1234567890abcdef",
    env="staging",
    time_from="2025-01-15T14:00:00Z",
    time_to="2025-01-15T15:00:00Z"
)

# Track a request across services with relative time
trace_request_flow(request_id="req_abc123", hours=2)

# Track a request in dev environment with specific time range
trace_request_flow(
    request_id="req_abc123",
    env="dev",
    time_from="now-30m",
    time_to="now"
)

Query Architecture

The server uses structured filters that automatically map to the correct DataDog fields:

Structured Filter Benefits

Smart Field Mapping: user_id=214413 maps to the right field(s) per service
OR Conditions: Automatically searches multiple possible field locations
Health Check Exclusion: Removes noise from health check endpoints
Service-Aware: Each service has optimized field mappings
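
The benefits above can be sketched as a small query builder. The field names in `FIELD_MAP`, the health check exclusion clause, and the `build_query` function are all hypothetical and chosen only for illustration; the real mappings live in `filter_config.py` and may differ. The output uses DataDog's log search conventions of `@attribute:value` terms, parenthesized `OR` groups, and a leading `-` for exclusion.

```python
# Hypothetical field mappings, for illustration only.
FIELD_MAP = {
    "meeting": {
        "meeting_id": ["@meeting_id", "@meeting.id"],
        "user_id": ["@user_id"],
    },
    "tasmania": {
        "user_id": ["@current_user.id", "@user_id"],
    },
}

# Hypothetical exclusion clause for health check noise.
HEALTH_CHECK_EXCLUSION = '-@http.url_details.path:"/health"'


def build_query(service: str, env: str = "prod", status=None, **filters) -> str:
    """Translate structured filters into a single DataDog query string."""
    parts = [f"service:{service}", f"env:{env}"]
    if status:
        parts.append(f"status:{status}")
    mapping = FIELD_MAP.get(service, {})
    for name, value in filters.items():
        fields = mapping.get(name)
        if fields is None:
            raise ValueError(
                f"Filter {name!r} is not supported for service {service!r}"
            )
        clauses = [f"{field}:{value}" for field in fields]
        # Several candidate fields become one OR group.
        if len(clauses) == 1:
            parts.append(clauses[0])
        else:
            parts.append("(" + " OR ".join(clauses) + ")")
    parts.append(HEALTH_CHECK_EXCLUSION)
    return " ".join(parts)
```

Rejecting filters that a service does not define keeps a typo such as `meeting_id` on the tasmania service from silently matching nothing.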

Common Structured Patterns

# Service + specific entity
search_logs(service="meeting", meeting_id=136666)

# User activity within a service
search_logs(service="tasmania", user_id=214413)

# Status filtering with context
search_logs(service="meeting", meeting_id=136666, status="error")

# Actor-based filtering (email)
search_logs(service="meeting", actor_email="user@example.com")

Time Range Specification

All time-based tools support two mutually exclusive approaches for specifying time ranges:

Option 1: Relative Time (Using `hours`)

# Search last 24 hours
search_logs(service="meeting", meeting_id=136666, hours=24)

# Search last 2 hours
search_logs(service="meeting", status="error", hours=2)

Option 2: Specific Time Range (Using `time_from` and `time_to`)

# Specific ISO timestamp range
search_logs(
    service="meeting",
    meeting_id=136666,
    time_from="2025-01-15T10:00:00Z",
    time_to="2025-01-15T12:00:00Z"
)

# Using DataDog's relative syntax
search_logs(
    service="meeting",
    time_from="now-6h",
    time_to="now-2h"
)

# Mix of formats
search_logs(
    service="tasmania",
    time_from="2025-01-15T09:00:00Z",
    time_to="now"
)

Default Time Ranges (when neither is specified)

General log search: 1 hour
Trace logs: 1 hour
Business events: 1 hour
Request flow tracing: 1 hour

Note: The hours and time_from/time_to parameters are mutually exclusive. Using both will result in an error.
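
The mutual-exclusion rule can be expressed as a small validation step. The sketch below is an assumption about how such a check might look, not the server's code: `resolve_time_range` is a hypothetical helper, and treating a missing `time_to` as `"now"` while requiring `time_from` is a choice made for this example.

```python
def resolve_time_range(hours=None, time_from=None, time_to=None, default_hours=1):
    """Return a (time_from, time_to) pair, rejecting conflicting arguments."""
    explicit = time_from is not None or time_to is not None
    if hours is not None and explicit:
        raise ValueError("hours and time_from/time_to are mutually exclusive")
    if explicit:
        if time_from is None:
            # Assumption for this sketch: an end without a start is ambiguous.
            raise ValueError("time_from is required when time_to is given")
        return time_from, time_to if time_to is not None else "now"
    # Relative mode: fall back to the 1-hour default listed above.
    window = hours if hours is not None else default_hours
    return f"now-{window}h", "now"


# Relative, explicit, and default usage:
print(resolve_time_range(hours=24))
print(resolve_time_range(time_from="now-6h", time_to="now-2h"))
print(resolve_time_range())
```

Normalizing both styles into one pair early means the rest of the request path only has to handle a single representation.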

Development

Running Tests

uv run pytest

Code Formatting

uv run ruff format
uv run ruff check

Project Structure

torch-datadog-mcp/
├── src/
│   └── datadog_mcp/
│       ├── __init__.py
│       ├── server.py        # FastMCP server with tools
│       ├── client.py        # DataDog API client
│       └── filter_config.py # Service-specific filter configuration
├── pyproject.toml           # Project configuration
├── .env                     # Environment variables (create this)
├── README.md                # This file
└── datadog-mcp.fastmcp.json # FastMCP configuration

Troubleshooting

Authentication Issues

Verify DD_API_KEY and DD_APP_KEY are set correctly
Check your DataDog site setting (DD_SITE)
Ensure keys have appropriate permissions for log search

Connection Problems

Use the test_connection tool to verify API connectivity
Check network connectivity to DataDog endpoints
Verify your DataDog organization has log access enabled

No Results Found

Check time ranges (use longer periods for older data)
Verify query syntax matches DataDog's log search format
Ensure the environment/service filters match your data

License

MIT License - see LICENSE file for details.
