Intelify · system architecture

Every layer of the platform — from the moment a WhatsApp message arrives to when an answer is delivered — explained for engineers and operators. Updated to reflect the current production state including the optimization layer, self-healing RAG, semantic result cache, and Vercel-style portal.

Version v3.2 Last updated May 2026 Host srv985450.hstgr.cloud Stack FastAPI · Claude · Qdrant · MS SQL Server
1
Top of the stack
Request flow · WhatsApp → Answer
WhatsApp Meta Webhook Traefik :443 FastAPI :8000 Tier Router RAG / Cache Claude Loop SQL Server Response
end-to-end p50 ≈ 11s · cache hit ≈ 3s
2
Foundation
Infrastructure · servers, services, networking

Hostinger VPS live

Single-node KVM virtual machine. Ubuntu 24.04. All services run here.

IP72.60.98.153
Domainsrv985450.hstgr.cloud
Disk48 GB

Traefik (Docker)

Reverse proxy. Terminates HTTPS via Let's Encrypt, routes by hostname to backends.

Containerroot-traefik-1
Ports80, 443
Routesbot.* → :8000 · n8n.* → :5678

FastAPI (uvicorn)

The Intelify app. Single Python process, async, manages all bot logic and serves all dashboards.

Serviceppc-bot.service (systemd)
Code/opt/ppc-bot/api.py
Workers1 · port 8000

n8n workflows docker

Workflow automation. Holds the WhatsApp webhook receiver and forwards messages to FastAPI.

Containerroot-n8n-1
URLn8n.srv985450.hstgr.cloud
3
Data layer
Sources of truth · databases & vector stores

MS SQL Server prod

The PPC ERP database. Read-only access from Intelify via ODBC + OpenSSL legacy compat.

Host103.113.96.114:1433
DBppc · compcode='DAS'
Tables682 (incl. 23 trusted sales tables)

Qdrant Cloud

Vector database. Holds 6 collections used for RAG, intent classification, and result caching.

Regionus-east4 (GCP)
EmbeddingsGemini · 3072-dim

Local SQLite

Per-query log + traces. Drives all dashboards, monitoring, and the live activity feed.

Path/opt/ppc-bot/query_log.db
Tablesquery_log, query_traces, kpi_snapshots
sales_table (L1)
23 high-trust sales/dispatch/OA tables. Always searched first.
pdass_3072 (L2)
258 broader tables (production, RM, job-work). Searched only if L1 confidence is low.
sales_query (L1)
46 stored procedures / canonical queries.
sample_SQL (L2)
253 broader procedures.
KPIquery
KPI definitions for the registry.
Query Cache
Intent classification + SQL template cache.
result_cache_v1 (new)
Semantic result cache. Successful queries upserted with TTL=30min. Threshold cosine ≥ 0.92.
4
AI layer
Models · 4 tiers, picked per query
SIMPLE
Haiku 4.5
Greetings, thanks, "hello". No tools. Direct response.
~1s
HAIKU
Haiku 4.5
Default tier. Has full tool access. Escalates to Sonnet after 6 rounds without an answer (preserves prior context — no reset).
2-15s
SONNET
Sonnet 4.5
Complex queries: ranking, multi-table joins, reasoning over many results.
5-25s
ANALYSIS
Haiku + Sonnet
Root-cause "why" queries. Two-phase: Haiku gathers evidence (12 rounds), Sonnet synthesises.
15-60s
Model selection: regex/keyword classifier on the user message. Cheap (<1ms). Tiered routing keeps costs down — most queries use Haiku and never escalate.
5
Retrieval
RAG pipeline · finding the right tables
User query Gemini embed (3072d) Qdrant L1 (sales_table) Score < 0.72? Qdrant L2 (pdass_3072) Top tables + procs

Intent classifier

Each query embedded once. Top-1 match in Query Cache tells Claude the business domain upfront.

Table-name guard

RAG-returned table names are validated against the live DB schema. Closest-match resolution corrects typos before they hit Claude.

Module knowledge

10 module docs (production, dispatch, OA, RM, job-work, etc.). Injected only when their tables match — keeps the system prompt slim.

6
Reasoning
Tool loop · how Claude answers
execute_sql_query
Run SELECT on SQL Server (compcode='DAS'). Returns rows + auto-attached sanity, totals, doctor_hints, and (when row_count>30) summarised aggregates.
get_table_doc
Fetch full doc for a table (summary + columns + sample queries). Cached by RAG context — rarely called when context is high-confidence.
get_table_sample
Peek at 5 rows of any table. Almost obsolete — sample rows for top tables are pre-injected into RAG context.
list_tables
Fallback when RAG returned nothing useful. Returns top 60 tables by row count.
get_kpi_definition
Look up a metric from the KPI registry.
Loop budget: max 10 rounds total. Haiku tier escalates to Sonnet at round 6 if still incomplete (without resetting context — Sonnet inherits everything Haiku has learned). Stuck-detector aborts after 3 consecutive SQL errors.
7
Speed & cost
Optimization layer · 5 architectural improvements
Avg rounds
2.2 ↓ 34%
Avg latency
11s ↓ 48%
Avg input tokens
5.1K ↓ 49%
Cache-warm latency
3.5s ↓ 83%

Auto-sanity merge step 1

Sanity check runs server-side inside execute_sql_query. Eliminates the wasted round Claude used to spend calling a separate sanity tool.

Saves~1 round per query

Result summarisation step 2

SQL results > 30 rows are auto-compacted to 5 sample rows + per-column aggregates (sum/avg/min/max/distinct). Claude gets the whole picture without the token bloat.

Token cutup to 90% on data-heavy queries

Pre-computed anchors step 3

8 anchor values (latest invoice/dispatch/production/OA/RM dates, today, active customer count) refreshed hourly and injected into every RAG context. Eliminates "find latest date" exploration round.

Sample rows in RAG step 4

For the top-2 high-confidence RAG tables, one real sample row is pre-injected into the context. Claude knows the exact column format without calling get_table_sample.

Semantic result cache step 5

Successful query→response upserted into Qdrant result_cache_v1. New queries with cosine ≥ 0.92 hit the cache → 0 rounds, ~3s end-to-end. TTL 30 min.

Hit rate~30% on dashboard-style traffic

Query doctor always-on

9 pattern-matching diagnostics. When SQL fails or returns 0 rows from a known anti-pattern, an actionable hint is appended to the tool result.

Patternsdespadvicehdr.custcode NULL · partno wrong table · invtype location · L/Cr units · 6 more
8
Memory & learning
Self-healing RAG · facts and metadata patches

Investigation agent

Background daemon. Triggered when a query fails or hits the round cap. Re-explores the DB with Haiku, finds the root cause.

save_learned_fact

Persists a discovery to learned_facts.json. 27 facts learned to date.

patch_table_metadata

Updates schema_kb.json AND patches Qdrant vectors. The next query gets correct context from round 1.

Conditional injection

Facts only injected when the matching tables appear in RAG. Keeps system prompt slim while honouring all learnings.

The loop: fail → investigate → save fact + patch metadata → next similar query gets the fix automatically. No human in the loop needed for known data quirks.
9
Frontend
Portal & dashboards · Vercel-style design system
/portal
Hub chooser · User vs Admin entry
/portal/user
Business user landing · KPI Dashboard, Schedule Adherence, Anomaly Hub, What-If Studio, Digital Twin (external), WhatsApp
/portal/admin
Engineering landing · Monitoring (5), Configuration (3), Reference (3) — with search and live activity feed
/kpi-dashboard
Live business KPIs · period filters · 4 charts · top customers/products tables · AI briefing
/schedule-adherence
OA fulfillment dashboard · data-quality audit · 4 hero cards · monthly fulfillment chart
/kpi-admin
KPI registry editor · test/save/delete inline · run-all today
/portal/admin/logs
Filterable query feed · tier/escalation/slow filters · click for trace
/portal/admin/facts
Auto-discovered schema knowledge · sorted newest first
/portal/admin/stats
System counters · tier breakdown bars · refreshes every 15s
/portal/admin/analyze
Haiku-generated improvement insights · 5min cached
/health
Live status with auto-pulse · content-negotiates JSON for API consumers
/dashboard
Live recent-queries feed · expandable trace viewer · 8s auto-refresh
/architecture
User-facing 10-step flow explanation
Design system: Inter + JetBrains Mono fonts · single blue accent (#0070f3) · Lucide-style stroke SVG icons (no emojis) · 1px-gap grid pattern for hairline dividers · Vercel-quality polish. Full spec at intelify-design-system.md.
10
Visibility
Observability · logs, traces, and metrics

Per-query log

Every query writes to SQLite + Google Sheets: tier, rounds, tokens, latency, cost, RAG tables, escalation, full response.

Round-by-round trace

Every Claude round + tool call + result captured to query_traces. Replayable from any log row in the admin UI.

Anthropic prompt cache

System prompt + tools share one cache key. ~85% read hit rate. Cache write/read tokens logged per query.

Benchmark suite

15-query baseline lives at bench_run.py. Used to validate every architectural change kept accuracy at 100%.

11
Code map
Key files on the server
api.py
Single-file FastAPI app. ~10K lines. Source of truth, lives at /opt/ppc-bot/api.py on the server.
schema_kb.json
RAG knowledge base. Per-table summaries, columns, sample queries, stale/useless lists.
learned_facts.json
Auto-discovered domain facts (currently 27).
kpi_registry.json
KPI metric definitions (SQL + label + format). Editable via /kpi-admin.
query_log.db
SQLite — query log + traces + KPI snapshots.
openssl_legacy.cnf
OpenSSL 3.x compatibility shim required for the SQL Server SSL handshake.
intelify-design-system.md
Portable design spec. Copy-paste into any Claude session to keep dashboards consistent.