LLM Sentry GEO Marketing Skill

Skill by ara.so — Marketing Skills collection.

What This Project Does

LLM Sentry is an automated monitoring and analysis platform for GEO (Generative Engine Optimization). It tracks how AI search engines (DeepSeek, Doubao, Bocha) mention brands and analyze citation sources in their responses. The platform uses browser automation (Playwright) to simulate real user queries, extract AI responses, parse reference links, and calculate Share of Voice (SoV) metrics.

Core capabilities:

Real-time brand exposure monitoring in AI search results
Citation source extraction and domain analysis
Multi-round query execution for stability testing
Desktop client (Wails + Go + React) with local SQLite storage
RESTful API for task automation
PostgreSQL-based data persistence

Project Structure

This is a monorepo with three main components:

GEO/
├── geo_db/                  # PostgreSQL database service
├── llm_sentry_monitor/      # Python monitoring service (Playwright-based)
└── geo_client2/             # Desktop client (Wails + Go + React)

Installation

Prerequisites

Python 3.11+ with uv package manager
Docker and Docker Compose
PostgreSQL 15+ (via Docker)
For desktop client: Go 1.21+, Node.js 18+, Wails v2

Quick Setup with Makefile

# Clone the repository
git clone https://github.com/daijinma/geo_marketing.git
cd geo_marketing

# Initialize environment and install dependencies
make setup

# Start PostgreSQL database
make db

# Run monitoring service
make run

Manual Setup

# Create virtual environment with uv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Python dependencies
cd llm_sentry_monitor
uv pip install -e ".[dev]"

# Install Playwright browsers
playwright install chromium

# Start database
cd ../geo_db
docker-compose up -d

Desktop Client Setup

cd geo_client2

# Install frontend dependencies
cd frontend
pnpm install

# Install Go dependencies
cd ..
go mod tidy

# Run in development mode
wails dev

# Build for production
wails build

Database Schema

The system uses two main tables:

Records table (monitoring results):

CREATE TABLE records (
    id SERIAL PRIMARY KEY,
    query VARCHAR(500) NOT NULL,
    provider VARCHAR(50) NOT NULL,
    response_text TEXT,
    raw_citations JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

Citations table (extracted references):

CREATE TABLE citations (
    id SERIAL PRIMARY KEY,
    record_id INTEGER REFERENCES records(id),
    url TEXT NOT NULL,
    domain VARCHAR(255),
    title VARCHAR(500),
    sequence INTEGER,
    created_at TIMESTAMP DEFAULT NOW()
);

API Usage

Start API Server

# llm_sentry_monitor/main.py
from fastapi import FastAPI
from api.routes import router

app = FastAPI(title="LLM Sentry API")
app.include_router(router)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Start server:

cd llm_sentry_monitor
python main.py

Create Monitoring Task

POST /mock

curl -X POST http://localhost:8000/mock \
  -H "Content-Type: application/json" \
  -d '{
    "keywords": ["AI search engines", "generative search optimization"],
    "platforms": ["deepseek", "doubao"],
    "query_count": 3,
    "settings": {
      "headless": false,
      "timeout": 60000,
      "delay_between_tasks": 5
    }
  }'

Python example:

import requests

payload = {
    "keywords": ["AI搜索引擎对比", "DeepSeek vs ChatGPT"],
    "platforms": ["deepseek", "doubao"],
    "query_count": 2,
    "settings": {
        "headless": True,
        "timeout": 60000,
        "delay_between_tasks": 3
    }
}

response = requests.post("http://localhost:8000/mock", json=payload)
results = response.json()

for result in results:
    print(f"Query: {result['query']}")
    print(f"Platform: {result['provider']}")
    print(f"Citations: {len(result['citations'])}")

Provider Implementation Pattern

Each AI platform has a dedicated provider class:

# llm_sentry_monitor/providers/base.py
from abc import ABC, abstractmethod
from playwright.async_api import Page

class BaseProvider(ABC):
    def __init__(self, page: Page):
        self.page = page
    
    @abstractmethod
    async def search(self, query: str) -> dict:
        """Execute search and return response + citations"""
        pass
    
    @abstractmethod
    async def login(self) -> bool:
        """Handle authentication if needed"""
        pass

Example DeepSeek provider:

# llm_sentry_monitor/providers/deepseek.py
from .base import BaseProvider
import asyncio

class DeepSeekProvider(BaseProvider):
    BASE_URL = "https://chat.deepseek.com"
    
    async def search(self, query: str) -> dict:
        await self.page.goto(self.BASE_URL)
        
        # Enable search mode
        search_toggle = await self.page.wait_for_selector('[data-testid="search-toggle"]')
        await search_toggle.click()
        
        # Enter query
        input_box = await self.page.wait_for_selector('textarea[placeholder*="Ask"]')
        await input_box.fill(query)
        await input_box.press("Enter")
        
        # Wait for response
        await asyncio.sleep(5)
        response_text = await self.page.text_content('.response-content')
        
        # Extract citations
        citations = await self.page.locator('.citation-link').all()
        citation_data = []
        for idx, citation in enumerate(citations, 1):
            url = await citation.get_attribute('href')
            title = await citation.text_content()
            citation_data.append({
                "sequence": idx,
                "url": url,
                "title": title
            })
        
        return {
            "query": query,
            "provider": "deepseek",
            "response_text": response_text,
            "citations": citation_data
        }

Core Domain Extraction Logic

# llm_sentry_monitor/core/parser.py
import tldextract
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    """Extract clean domain from URL"""
    parsed = tldextract.extract(url)
    domain = f"{parsed.domain}.{parsed.suffix}"
    return domain

def calculate_sov(citations: list[dict]) -> dict:
    """Calculate Share of Voice by domain"""
    domain_counts = {}
    total = len(citations)
    
    for citation in citations:
        domain = extract_domain(citation['url'])
        domain_counts[domain] = domain_counts.get(domain, 0) + 1
    
    sov = {
        domain: {
            "count": count,
            "percentage": round(count / total * 100, 2)
        }
        for domain, count in domain_counts.items()
    }
    
    return sov

Database Operations

# llm_sentry_monitor/core/database.py
import psycopg2
import os
import json

class Database:
    def __init__(self):
        self.conn = psycopg2.connect(
            host=os.getenv("DB_HOST", "localhost"),
            port=os.getenv("DB_PORT", "5432"),
            user=os.getenv("DB_USER", "geo_user"),
            password=os.getenv("DB_PASSWORD"),
            database=os.getenv("DB_NAME", "geo_db")
        )
    
    def save_record(self, query: str, provider: str, response_text: str, citations: list) -> int:
        """Save monitoring record and return record_id"""
        with self.conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO records (query, provider, response_text, raw_citations)
                VALUES (%s, %s, %s, %s)
                RETURNING id
                """,
                (query, provider, response_text, json.dumps(citations))
            )
            record_id = cur.fetchone()[0]
            self.conn.commit()
            return record_id
    
    def save_citations(self, record_id: int, citations: list):
        """Save individual citations linked to record"""
        with self.conn.cursor() as cur:
            for citation in citations:
                domain = extract_domain(citation['url'])
                cur.execute(
                    """
                    INSERT INTO citations (record_id, url, domain, title, sequence)
                    VALUES (%s, %s, %s, %s, %s)
                    """,
                    (record_id, citation['url'], domain, citation.get('title'), citation['sequence'])
                )
            self.conn.commit()

Running Monitoring Tasks

Single Query Example

from playwright.async_api import async_playwright
from providers.deepseek import DeepSeekProvider
from core.database import Database

async def run_monitoring():
    db = Database()
    
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        
        provider = DeepSeekProvider(page)
        
        result = await provider.search("最好的AI搜索引擎推荐")
        
        # Save to database
        record_id = db.save_record(
            query=result['query'],
            provider=result['provider'],
            response_text=result['response_text'],
            citations=result['citations']
        )
        db.save_citations(record_id, result['citations'])
        
        await browser.close()
        print(f"Saved record #{record_id} with {len(result['citations'])} citations")

# Run
import asyncio
asyncio.run(run_monitoring())

Multi-Round Query Example

async def run_multi_round_monitoring(keywords: list, platforms: list, rounds: int):
    db = Database()
    
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        
        for keyword in keywords:
            for platform in platforms:
                page = await browser.new_page()
                
                # Get provider class
                if platform == "deepseek":
                    provider = DeepSeekProvider(page)
                elif platform == "doubao":
                    provider = DoubaoProvider(page)
                
                for round_num in range(1, rounds + 1):
                    print(f"Round {round_num}/{rounds}: {keyword} on {platform}")
                    
                    result = await provider.search(keyword)
                    record_id = db.save_record(
                        query=result['query'],
                        provider=result['provider'],
                        response_text=result['response_text'],
                        citations=result['citations']
                    )
                    db.save_citations(record_id, result['citations'])
                    
                    # Delay between rounds
                    await asyncio.sleep(5)
                
                await page.close()
        
        await browser.close()

Desktop Client (Wails) Backend

// geo_client2/backend/app.go
package backend

import (
    "context"
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
)

type App struct {
    ctx context.Context
    db  *sql.DB
}

func NewApp() *App {
    return &App{}
}

func (a *App) startup(ctx context.Context) {
    a.ctx = ctx
    
    // Initialize SQLite
    db, err := sql.Open("sqlite3", "./geo_sentry.db")
    if err != nil {
        panic(err)
    }
    a.db = db
    
    // Create tables
    a.initDatabase()
}

func (a *App) CreateTask(keywords []string, platforms []string, queryCount int) (string, error) {
    // Insert task into database
    tx, _ := a.db.Begin()
    result, err := tx.Exec(`
        INSERT INTO tasks (keywords, platforms, query_count, status)
        VALUES (?, ?, ?, 'pending')
    `, strings.Join(keywords, ","), strings.Join(platforms, ","), queryCount)
    
    if err != nil {
        tx.Rollback()
        return "", err
    }
    
    tx.Commit()
    taskID, _ := result.LastInsertId()
    
    // Start monitoring in background
    go a.executeTask(taskID)
    
    return fmt.Sprintf("Task %d created", taskID), nil
}

Configuration

Environment Variables

Create .env file in llm_sentry_monitor/:

# Database
DB_HOST=localhost
DB_PORT=5432
DB_USER=geo_user
DB_PASSWORD=your_secure_password
DB_NAME=geo_db

# Playwright
HEADLESS_MODE=false
BROWSER_TIMEOUT=60000

# API
API_PORT=8000
DELAY_BETWEEN_TASKS=5

Settings File

# llm_sentry_monitor/config/settings.py
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    db_host: str = "localhost"
    db_port: int = 5432
    db_user: str = "geo_user"
    db_password: str
    db_name: str = "geo_db"
    
    headless_mode: bool = False
    browser_timeout: int = 60000
    delay_between_tasks: int = 5
    
    class Config:
        env_file = ".env"

settings = Settings()

Common Patterns

Pattern 1: Batch Keyword Monitoring

async def monitor_keyword_matrix(brands: list, question_templates: list):
    """Monitor multiple brands across question patterns"""
    keywords = []
    
    for brand in brands:
        for template in question_templates:
            keywords.append(template.format(brand=brand))
    
    await run_multi_round_monitoring(
        keywords=keywords,
        platforms=["deepseek", "doubao"],
        rounds=3
    )

Pattern 2: Citation Analysis Pipeline

def analyze_citations_by_brand(brand: str, days: int = 7):
    """Analyze citation sources for a brand over time"""
    db = Database()
    
    with db.conn.cursor() as cur:
        cur.execute("""
            SELECT c.domain, COUNT(*) as mentions
            FROM citations c
            JOIN records r ON c.record_id = r.id
            WHERE r.query LIKE %s
            AND r.created_at >= NOW() - INTERVAL '%s days'
            GROUP BY c.domain
            ORDER BY mentions DESC
        """, (f"%{brand}%", days))
        
        results = cur.fetchall()
        
    return [{"domain": row[0], "mentions": row[1]} for row in results]

Pattern 3: Share of Voice Calculation

def calculate_brand_sov(brands: list[str], query: str):
    """Calculate Share of Voice for multiple brands"""
    db = Database()
    
    with db.conn.cursor() as cur:
        cur.execute("""
            SELECT r.response_text
            FROM records r
            WHERE r.query = %s
            ORDER BY r.created_at DESC
            LIMIT 1
        """, (query,))
        
        response = cur.fetchone()[0]
    
    sov = {}
    for brand in brands:
        mentions = response.lower().count(brand.lower())
        sov[brand] = mentions
    
    total = sum(sov.values())
    return {
        brand: {
            "mentions": count,
            "percentage": round(count / total * 100, 2) if total > 0 else 0
        }
        for brand, count in sov.items()
    }

Troubleshooting

Issue: Playwright Browser Not Found

# Install browsers explicitly
playwright install chromium

# Or install all browsers
playwright install

Issue: Database Connection Failed

# Check database is running
docker ps | grep postgres

# Restart database
cd geo_db
docker-compose restart

# Check connection
psql -h localhost -U geo_user -d geo_db

Issue: Login Required During Headless Mode

First run in non-headless mode to complete login:

# In settings or API call
"settings": {
    "headless": false
}

After successful login, browser will save cookies. Then enable headless mode.

Issue: Citations Not Extracted

Check selector patterns in provider code:

# Debug citation extraction
citations = await page.locator('.citation-link').all()
print(f"Found {len(citations)} citations")

# Try alternative selectors
citations = await page.locator('a[data-citation]').all()

Issue: Rate Limiting or Blocking

Add delays between requests:

# In settings
"settings": {
    "delay_between_tasks": 10  # Increase to 10 seconds
}

# Or add random delays
import random
await asyncio.sleep(random.uniform(5, 15))

Makefile Commands

make setup    # Initialize environment with uv
make db       # Start PostgreSQL database
make run      # Run monitoring service
make test     # Run tests
make clean    # Stop database and clean cache
make help     # Show all available commands

Key Metrics to Track

Brand Mention Rate: Percentage of queries where brand appears
Citation Count: Number of reference links per query
Domain Distribution: Top domains cited by AI
Position Analysis: Average position of brand mentions
Share of Voice: Brand percentage vs. competitors
Stability Score: Consistency across multi-round queries

llm-sentry-geo-marketing