Every layer of the platform — from the moment a WhatsApp message arrives to when an answer is delivered — explained for engineers and operators. Updated to reflect the current production state including the optimization layer, self-healing RAG, semantic result cache, and Vercel-style portal.
Version v3.2Last updated May 2026Host srv985450.hstgr.cloudStack FastAPI · Claude · Qdrant · MS SQL Server
Model selection: regex/keyword classifier on the user message. Cheap (<1ms). Tiered routing keeps costs down — most queries use Haiku and never escalate.
Each query embedded once. Top-1 match in Query Cache tells Claude the business domain upfront.
Table-name guard
RAG-returned table names are validated against the live DB schema. Closest-match resolution corrects typos before they hit Claude.
Module knowledge
10 module docs (production, dispatch, OA, RM, job-work, etc.). Injected only when their tables match — keeps the system prompt slim.
6
Reasoning
Tool loop · how Claude answers
execute_sql_query
Run SELECT on SQL Server (compcode='DAS'). Returns rows + auto-attached sanity, totals, doctor_hints, and (when row_count>30) summarised aggregates.
get_table_doc
Fetch full doc for a table (summary + columns + sample queries). Cached by RAG context — rarely called when context is high-confidence.
get_table_sample
Peek at 5 rows of any table. Almost obsolete — sample rows for top tables are pre-injected into RAG context.
list_tables
Fallback when RAG returned nothing useful. Returns top 60 tables by row count.
get_kpi_definition
Look up a metric from the KPI registry.
Loop budget: max 10 rounds total. Haiku tier escalates to Sonnet at round 6 if still incomplete (without resetting context — Sonnet inherits everything Haiku has learned). Stuck-detector aborts after 3 consecutive SQL errors.
7
Speed & cost
Optimization layer · 5 architectural improvements
Avg rounds
2.2 ↓ 34%
Avg latency
11s ↓ 48%
Avg input tokens
5.1K ↓ 49%
Cache-warm latency
3.5s ↓ 83%
Auto-sanity merge step 1
Sanity check runs server-side inside execute_sql_query. Eliminates the wasted round Claude used to spend calling a separate sanity tool.
Saves~1 round per query
Result summarisation step 2
SQL results > 30 rows are auto-compacted to 5 sample rows + per-column aggregates (sum/avg/min/max/distinct). Claude gets the whole picture without the token bloat.
Token cutup to 90% on data-heavy queries
Pre-computed anchors step 3
8 anchor values (latest invoice/dispatch/production/OA/RM dates, today, active customer count) refreshed hourly and injected into every RAG context. Eliminates "find latest date" exploration round.
Sample rows in RAG step 4
For the top-2 high-confidence RAG tables, one real sample row is pre-injected into the context. Claude knows the exact column format without calling get_table_sample.
Semantic result cache step 5
Successful query→response upserted into Qdrant result_cache_v1. New queries with cosine ≥ 0.92 hit the cache → 0 rounds, ~3s end-to-end. TTL 30 min.
Hit rate~30% on dashboard-style traffic
Query doctor always-on
9 pattern-matching diagnostics. When SQL fails or returns 0 rows from a known anti-pattern, an actionable hint is appended to the tool result.
Patternsdespadvicehdr.custcode NULL · partno wrong table · invtype location · L/Cr units · 6 more
8
Memory & learning
Self-healing RAG · facts and metadata patches
Investigation agent
Background daemon. Triggered when a query fails or hits the round cap. Re-explores the DB with Haiku, finds the root cause.
save_learned_fact
Persists a discovery to learned_facts.json. 27 facts learned to date.
patch_table_metadata
Updates schema_kb.json AND patches Qdrant vectors. The next query gets correct context from round 1.
Conditional injection
Facts only injected when the matching tables appear in RAG. Keeps system prompt slim while honouring all learnings.
The loop: fail → investigate → save fact + patch metadata → next similar query gets the fix automatically. No human in the loop needed for known data quirks.
9
Frontend
Portal & dashboards · Vercel-style design system
/portal
Hub chooser · User vs Admin entry
/portal/user
Business user landing · KPI Dashboard, Schedule Adherence, Anomaly Hub, What-If Studio, Digital Twin (external), WhatsApp
/portal/admin
Engineering landing · Monitoring (5), Configuration (3), Reference (3) — with search and live activity feed
/kpi-dashboard
Live business KPIs · period filters · 4 charts · top customers/products tables · AI briefing
Live status with auto-pulse · content-negotiates JSON for API consumers
/dashboard
Live recent-queries feed · expandable trace viewer · 8s auto-refresh
/architecture
User-facing 10-step flow explanation
Design system: Inter + JetBrains Mono fonts · single blue accent (#0070f3) · Lucide-style stroke SVG icons (no emojis) · 1px-gap grid pattern for hairline dividers · Vercel-quality polish. Full spec at intelify-design-system.md.
10
Visibility
Observability · logs, traces, and metrics
Per-query log
Every query writes to SQLite + Google Sheets: tier, rounds, tokens, latency, cost, RAG tables, escalation, full response.
Round-by-round trace
Every Claude round + tool call + result captured to query_traces. Replayable from any log row in the admin UI.
Anthropic prompt cache
System prompt + tools share one cache key. ~85% read hit rate. Cache write/read tokens logged per query.
Benchmark suite
15-query baseline lives at bench_run.py. Used to validate every architectural change kept accuracy at 100%.
11
Code map
Key files on the server
api.py
Single-file FastAPI app. ~10K lines. Source of truth, lives at /opt/ppc-bot/api.py on the server.