Ecommerce Catalog Agent
A conversational AI agent that answers product questions over an online store's catalog. It understands natural-language queries, finds matching products via hybrid search, and always reports live price and availability validated against the database.
Built as a tool-calling (ReAct) agent with a strict trust boundary: the model decides what to say and which products to show, but code owns the customer-facing numbers — so a hallucinated or injected price can never reach the user.
Features
- Hybrid retrieval — BM25 (keyword) + vector embeddings (semantic) + Reciprocal
Rank Fusion + cross-encoder reranking. Catches word forms and synonyms that exact matching misses (e.g. "взуття для бігу", Ukrainian for "footwear for running", → running shoes).
- Filter-first — hard constraints (price, stock) are applied in SQL to build the
candidate set before semantic ranking, avoiding the classic "top-k then filter → zero results" trap.
- Two sources of truth — PostgreSQL is authoritative for volatile fields
(price/stock); the vector index is a search cache only. Price and stock are re-fetched live before every answer.
- Structured-output contract — the agent finishes by calling
present_results with product SKUs + prose; price/stock are filled by code from live SQL. The model has no field to write a number into, so a hallucinated or injected price has nowhere to land ("attacker needs capability, not just instruction").
- Bounded agent loop — independent stoppers (max iterations, token budget,
latency) plus deterministic, score-based escalation to a human operator (triggered by the reranker's confidence score, never by the model's self-report).
- Conversation memory — per-session history for multi-turn context.
- Custom MCP server — the catalog tools are exposed over the Model Context
Protocol, so one contract serves the agent, an internal copilot, and Claude Desktop.
- Multi-channel — a FastAPI
/chat service, a Telegram bot via n8n (webhook), and a standalone aiogram bot (long-polling).
- Eval harness — a golden set scored with Recall@K / MRR to catch retrieval
regressions with numbers, not vibes.
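
As a rough illustration of the metrics named in the eval harness bullet, Recall@K and MRR over a golden set can be computed as below. This is a minimal sketch, not the contents of eval.py: the function names and the shape of the golden set (query mapped to a set of relevant SKUs) are assumptions.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant SKUs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant SKU, or 0.0 if none was retrieved."""
    for position, sku in enumerate(ranked, start=1):
        if sku in relevant:
            return 1.0 / position
    return 0.0


def evaluate(
    runs: Dict[str, List[str]],    # query -> ranked SKUs returned by retrieval
    golden: Dict[str, Set[str]],   # query -> SKUs a human marked as relevant
    k: int = 5,
) -> Dict[str, float]:
    """Average Recall@K and MRR over every query in the golden set."""
    if not golden:
        return {"recall@k": 0.0, "mrr": 0.0}
    recalls = [recall_at_k(runs.get(q, []), rel, k) for q, rel in golden.items()]
    rrs = [reciprocal_rank(runs.get(q, []), rel) for q, rel in golden.items()]
    return {"recall@k": sum(recalls) / len(golden), "mrr": sum(rrs) / len(golden)}
```

Comparing these two averages before and after a retrieval change is what turns "it feels worse" into a measurable regression.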
Architecture
Customer channels (Telegram / web / Claude Desktop)
│
[n8n] webhook intake + routing ── low confidence ──► human operator
│
[FastAPI /chat] models warmed at startup
│
[ReAct agent loop] bounded: max_iter / budget / latency
│ parse → retrieve → validate → respond
▼
[catalog tools] (also exposed as a custom MCP server)
search_products → hybrid BM25 + vector + rerank, filter-first
get_live_price / check_stock → live SQL
│
PostgreSQL (price/stock = truth) + Chroma (search cache)
▲
n8n schedule: XML feed → parse → upsert → re-embed
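
The "bounded" agent loop in the diagram can be pictured with the sketch below, where each stopper is checked independently of the others. It is illustrative only: `Limits`, `LoopResult`, `run_bounded`, the default limit values, and the shape of the `step` callable are assumptions, not the code in agent.py.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Limits:
    max_iterations: int = 6      # assumed default
    token_budget: int = 8000     # assumed default
    max_seconds: float = 20.0    # assumed default


@dataclass
class LoopResult:
    answer: Optional[str]
    stopped_by: str   # "finished", "max_iterations", "token_budget" or "latency"
    iterations: int   # number of steps actually executed


# Hypothetical step: takes the iteration index and returns
# (final answer or None, tokens spent in this step).
Step = Callable[[int], Tuple[Optional[str], int]]


def run_bounded(
    step: Step,
    limits: Limits,
    clock: Callable[[], float] = time.monotonic,
) -> LoopResult:
    """Run steps until one yields an answer or any single stopper fires."""
    started = clock()
    tokens_spent = 0
    for i in range(limits.max_iterations):
        # Each stopper is evaluated on its own before the next step runs.
        if clock() - started > limits.max_seconds:
            return LoopResult(None, "latency", i)
        if tokens_spent >= limits.token_budget:
            return LoopResult(None, "token_budget", i)
        answer, used = step(i)
        tokens_spent += used
        if answer is not None:
            return LoopResult(answer, "finished", i + 1)
    return LoopResult(None, "max_iterations", limits.max_iterations)
```

A caller would treat any `stopped_by` value other than "finished" as a reason to fall back, for example by handing the conversation to the human operator.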
Tech stack
Python · FastAPI · OpenAI (LiteLLM-swappable) · PostgreSQL · Chroma · BM25 · sentence-transformers · cross-encoder reranker · custom MCP server · n8n · aiogram
Quick start
pip install -r requirements.txt
cp .env.example .env # fill OPENAI_API_KEY + PG_*
# create the schema, load the sample feed (builds the hybrid index)
psql -d catalog -f schema.sql
python ingest.py
# ask from the CLI (the sample query means "red sneakers up to 2000, in stock")
python agent.py "червоні кросівки до 2000 в наявності"
# or run the HTTP service
uvicorn api:app --port 8000 # → http://localhost:8000/docs
# or the Telegram bot (set TELEGRAM_BOT_TOKEN in .env)
python bot.py
Project layout
| File | What |
|---|---|
| agent.py | ReAct agent loop, bounded stoppers, structured-output contract |
| retrieval.py | hybrid retrieval (BM25 + vector + RRF + cross-encoder) + confidence scores |
| catalog_tools.py | read-only catalog tools + structured get_facts for live validation |
| server.py | the catalog tools exposed as a custom MCP server |
| api.py | FastAPI /chat service (per-session memory, warmup at startup) |
| bot.py | standalone Telegram bot (aiogram, long-polling) |
| ingest.py | XML feed → PostgreSQL + rebuild the hybrid index |
| eval.py | retrieval eval on a golden set (Recall@K / MRR) |
| n8n/workflow.json | Telegram → /chat → reply + escalation routing |
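
The structured-output contract listed for agent.py, together with the live lookup in catalog_tools.py, could look roughly like the sketch below. It is not the repository's code: the payload keys `skus` and `prose`, the `LiveFacts` and `ProductCard` types, and the `get_facts` signature are assumptions made for illustration. The sketch covers only the structured fields.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class LiveFacts:
    price: Decimal
    in_stock: bool


@dataclass(frozen=True)
class ProductCard:
    sku: str
    price: Decimal
    in_stock: bool


# Hypothetical lookup: live facts for a SKU, or None if the database has no such SKU.
FactsLookup = Callable[[str], Optional[LiveFacts]]

# The model may only send these keys; there is no price or stock key to fill.
ALLOWED_KEYS = {"skus", "prose"}


def present_results(model_args: Dict[str, object], get_facts: FactsLookup) -> Dict[str, object]:
    """Build the customer-facing payload; numbers come only from get_facts."""
    extra = set(model_args) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected fields from model: {sorted(extra)}")
    skus = model_args.get("skus", [])
    prose = model_args.get("prose", "")
    if not isinstance(skus, list) or not all(isinstance(s, str) for s in skus):
        raise ValueError("skus must be a list of strings")
    if not isinstance(prose, str):
        raise ValueError("prose must be a string")

    cards: List[ProductCard] = []
    for sku in skus:
        facts = get_facts(sku)
        if facts is None:
            continue  # drop SKUs the database does not know
        cards.append(ProductCard(sku, facts.price, facts.in_stock))
    return {"prose": prose, "products": cards}
```

Rejecting unknown keys outright, rather than ignoring them, makes an attempted price injection show up as an error instead of passing silently.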
Design notes
- Why hybrid, not pure vector — vector search alone can't honor exact filters
(price/stock) or exact tokens (SKUs, model codes); BM25 + structured SQL cover what embeddings miss.
- Why the vector index is never the source of price/stock — it's rebuilt on a
schedule, so its copy of volatile fields is stale by design; the answer always re-validates against SQL.
- Why MCP — the catalog tools are reused across consumers (the agent, an internal
copilot, Claude Desktop): one contract, many clients.
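
To make the hybrid design note concrete, here is a minimal sketch of Reciprocal Rank Fusion, the step that merges the BM25 and vector rankings before reranking. The function name and the tie-breaking rule are assumptions; `k = 60` is the constant commonly used for RRF, not necessarily the value in retrieval.py.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked SKU lists: score(sku) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, sku in enumerate(ranking, start=1):
            scores[sku] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by SKU so the output is deterministic.
    return sorted(scores, key=lambda sku: (-scores[sku], sku))


bm25_hits = ["A", "B", "C"]      # keyword ranking (illustrative data)
vector_hits = ["B", "D", "A"]    # semantic ranking (illustrative data)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and embedding similarities, which live on different scales.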
---
The sample catalog and prompts are in Ukrainian; the agent replies in the customer's language.






