Ecommerce Catalog Agent

A conversational AI agent that answers product questions over an online store's catalog. It understands natural-language queries, finds matching products via hybrid search, and always reports live price and availability validated against the database.

Built as a tool-calling (ReAct) agent with a strict trust boundary: the model decides what to say and which products to show, but code owns the customer-facing numbers — so a hallucinated or injected price can never reach the user.

Features

Hybrid retrieval — BM25 (keyword) + vector embeddings (semantic) + Reciprocal

Rank Fusion + cross-encoder reranking. Catches word forms and synonyms that exact matching misses (e.g. "взуття для бігу" → running shoes).

Filter-first — hard constraints (price, stock) are applied in SQL to build the

candidate set before semantic ranking, avoiding the classic "top-k then filter → zero results" trap.

Two sources of truth — PostgreSQL is authoritative for volatile fields

(price/stock); the vector index is a search cache only. Price and stock are re-fetched live before every answer.

Structured-output contract — the agent finishes by calling present_results

with product SKUs + prose; price/stock are filled by code from live SQL. The model has no field to write a number into → containment against hallucination and prompt injection ("attacker needs capability, not just instruction").

Bounded agent loop — independent stoppers (max iterations, token budget,

latency) plus deterministic, score-based escalation to a human operator (on the reranker confidence, never the model's self-report).

Conversation memory — per-session history for multi-turn context.
Custom MCP server — the catalog tools are exposed over the Model Context

Protocol, so one contract serves the agent, an internal copilot, and Claude Desktop.

Multi-channel — a FastAPI /chat service, a Telegram bot via n8n (webhook),

and a standalone aiogram bot (long-polling).

Eval harness — a golden set scored with Recall@K / MRR to catch retrieval

regressions with numbers, not vibes.

Architecture

  Customer channels  (Telegram / web / Claude Desktop)
            │
        [n8n]  webhook intake + routing ── low confidence ──► human operator
            │
        [FastAPI /chat]  models warmed at startup
            │
        [ReAct agent loop]  bounded: max_iter / budget / latency
            │   parse → retrieve → validate → respond
            ▼
        [catalog tools]  (also exposed as a custom MCP server)
          search_products  → hybrid BM25 + vector + rerank, filter-first
          get_live_price / check_stock  → live SQL
            │
   PostgreSQL (price/stock = truth)   +   Chroma (search cache)
            ▲
   n8n schedule: XML feed → parse → upsert → re-embed

Tech stack

Python · FastAPI · OpenAI (LiteLLM-swappable) · PostgreSQL · Chroma · BM25 · sentence-transformers · cross-encoder reranker · custom MCP server · n8n · aiogram

Quick start

pip install -r requirements.txt
cp .env.example .env            # fill OPENAI_API_KEY + PG_*

# create the schema, load the sample feed (builds the hybrid index)
psql -d catalog -f schema.sql
python ingest.py

# ask from the CLI
python agent.py "червоні кросівки до 2000 в наявності"

# or run the HTTP service
uvicorn api:app --port 8000     # → http://localhost:8000/docs

# or the Telegram bot (set TELEGRAM_BOT_TOKEN in .env)
python bot.py

Project layout

| File | What | |---|---| | agent.py | ReAct agent loop, bounded stoppers, structured-output contract | | retrieval.py | hybrid retrieval (BM25 + vector + RRF + cross-encoder) + confidence scores | | catalog_tools.py | read-only catalog tools + structured get_facts for live validation | | server.py | the catalog tools exposed as a custom MCP server | | api.py | FastAPI /chat service (per-session memory, warmup at startup) | | bot.py | standalone Telegram bot (aiogram, long-polling) | | ingest.py | XML feed → PostgreSQL + rebuild the hybrid index | | eval.py | retrieval eval on a golden set (Recall@K / MRR) | | n8n/workflow.json | Telegram → /chat → reply + escalation routing |

Design notes

Why hybrid, not pure vector — vector search alone can't honor exact filters

(price/stock) or exact tokens (SKUs, model codes); BM25 + structured SQL cover what embeddings miss.

Why the vector index is never the source of price/stock — it's rebuilt on a

schedule, so its copy of volatile fields is stale by design; the answer always re-validates against SQL.

Why MCP — the catalog tools are reused across consumers (the agent, an internal

copilot, Claude Desktop): one contract, many clients.

---

The sample catalog and prompts are in Ukrainian; the agent replies in the customer's language.

ecommerce-catalog-agent