System Overview · Intelify

Intelify · system architecture

Every layer of the platform — from the moment a WhatsApp message arrives to when an answer is delivered — explained for engineers and operators. Updated to reflect the current production state including the optimization layer, self-healing RAG, semantic result cache, and Vercel-style portal.

Version v3.2 Last updated May 2026 Host srv985450.hstgr.cloud Stack FastAPI · Claude · Qdrant · MS SQL Server

User-facing flow→ Admin tools→ Live health→ Performance stats→

Top of the stack

Request flow · WhatsApp → Answer

WhatsApp → Meta Webhook → Traefik :443 → FastAPI :8000 → Tier Router → RAG / Cache → Claude Loop → SQL Server → Response

end-to-end p50 ≈ 11s · cache hit ≈ 3s

Foundation

Infrastructure · servers, services, networking

Hostinger VPS live

Single-node KVM virtual machine. Ubuntu 24.04. All services run here.

IP72.60.98.153

Domainsrv985450.hstgr.cloud

Disk48 GB

Traefik (Docker)

Reverse proxy. Terminates HTTPS via Let's Encrypt, routes by hostname to backends.

Containerroot-traefik-1

Ports80, 443

Routesbot.* → :8000 · n8n.* → :5678

FastAPI (uvicorn)

The Intelify app. Single Python process, async, manages all bot logic and serves all dashboards.

Serviceppc-bot.service (systemd)

Code/opt/ppc-bot/api.py

Workers1 · port 8000

n8n workflows docker

Workflow automation. Holds the WhatsApp webhook receiver and forwards messages to FastAPI.

Containerroot-n8n-1

URLn8n.srv985450.hstgr.cloud

Data layer

Sources of truth · databases & vector stores

MS SQL Server prod

The PPC ERP database. Read-only access from Intelify via ODBC + OpenSSL legacy compat.

Host103.113.96.114:1433

DBppc · compcode='DAS'

Tables682 (incl. 23 trusted sales tables)

Qdrant Cloud

Vector database. Holds 6 collections used for RAG, intent classification, and result caching.

Regionus-east4 (GCP)

EmbeddingsGemini · 3072-dim

Local SQLite

Per-query log + traces. Drives all dashboards, monitoring, and the live activity feed.

Path/opt/ppc-bot/query_log.db

Tablesquery_log, query_traces, kpi_snapshots

sales_table (L1)

23 high-trust sales/dispatch/OA tables. Always searched first.

pdass_3072 (L2)

258 broader tables (production, RM, job-work). Searched only if L1 confidence is low.

sales_query (L1)

46 stored procedures / canonical queries.

sample_SQL (L2)

253 broader procedures.

KPIquery

KPI definitions for the registry.

Query Cache

Intent classification + SQL template cache.

result_cache_v1 (new)

Semantic result cache. Successful queries upserted with TTL=30min. Threshold cosine ≥ 0.92.

AI layer

Models · 4 tiers, picked per query

SIMPLE

Haiku 4.5

Greetings, thanks, "hello". No tools. Direct response.

~1s

HAIKU

Haiku 4.5

Default tier. Has full tool access. Escalates to Sonnet after 6 rounds without an answer (preserves prior context — no reset).

2-15s

SONNET

Sonnet 4.5

Complex queries: ranking, multi-table joins, reasoning over many results.

5-25s

ANALYSIS

Haiku + Sonnet

Root-cause "why" queries. Two-phase: Haiku gathers evidence (12 rounds), Sonnet synthesises.

15-60s

Model selection: regex/keyword classifier on the user message. Cheap (<1ms). Tiered routing keeps costs down — most queries use Haiku and never escalate.

Retrieval

RAG pipeline · finding the right tables

User query → Gemini embed (3072d) → Qdrant L1 (sales_table) → Score < 0.72? → Qdrant L2 (pdass_3072) → Top tables + procs

Intent classifier

Each query embedded once. Top-1 match in Query Cache tells Claude the business domain upfront.

Table-name guard

RAG-returned table names are validated against the live DB schema. Closest-match resolution corrects typos before they hit Claude.

Module knowledge

10 module docs (production, dispatch, OA, RM, job-work, etc.). Injected only when their tables match — keeps the system prompt slim.

Reasoning

Tool loop · how Claude answers

execute_sql_query

Run SELECT on SQL Server (compcode='DAS'). Returns rows + auto-attached sanity, totals, doctor_hints, and (when row_count>30) summarised aggregates.

get_table_doc

Fetch full doc for a table (summary + columns + sample queries). Cached by RAG context — rarely called when context is high-confidence.

get_table_sample

Peek at 5 rows of any table. Almost obsolete — sample rows for top tables are pre-injected into RAG context.

list_tables

Fallback when RAG returned nothing useful. Returns top 60 tables by row count.

get_kpi_definition

Look up a metric from the KPI registry.

Loop budget: max 10 rounds total. Haiku tier escalates to Sonnet at round 6 if still incomplete (without resetting context — Sonnet inherits everything Haiku has learned). Stuck-detector aborts after 3 consecutive SQL errors.

Speed & cost

Optimization layer · 5 architectural improvements

Avg rounds

2.2 ↓ 34%

Avg latency

11s ↓ 48%

Avg input tokens

5.1K ↓ 49%

Cache-warm latency

3.5s ↓ 83%

Auto-sanity merge step 1

Sanity check runs server-side inside execute_sql_query. Eliminates the wasted round Claude used to spend calling a separate sanity tool.

Saves~1 round per query

Result summarisation step 2

SQL results > 30 rows are auto-compacted to 5 sample rows + per-column aggregates (sum/avg/min/max/distinct). Claude gets the whole picture without the token bloat.

Token cutup to 90% on data-heavy queries

Pre-computed anchors step 3

8 anchor values (latest invoice/dispatch/production/OA/RM dates, today, active customer count) refreshed hourly and injected into every RAG context. Eliminates "find latest date" exploration round.

Sample rows in RAG step 4

For the top-2 high-confidence RAG tables, one real sample row is pre-injected into the context. Claude knows the exact column format without calling get_table_sample.

Semantic result cache step 5

Successful query→response upserted into Qdrant result_cache_v1. New queries with cosine ≥ 0.92 hit the cache → 0 rounds, ~3s end-to-end. TTL 30 min.

Hit rate~30% on dashboard-style traffic

Query doctor always-on

9 pattern-matching diagnostics. When SQL fails or returns 0 rows from a known anti-pattern, an actionable hint is appended to the tool result.

Patternsdespadvicehdr.custcode NULL · partno wrong table · invtype location · L/Cr units · 6 more

Memory & learning

Self-healing RAG · facts and metadata patches

Investigation agent

Background daemon. Triggered when a query fails or hits the round cap. Re-explores the DB with Haiku, finds the root cause.

save_learned_fact

Persists a discovery to learned_facts.json. 27 facts learned to date.

patch_table_metadata

Updates schema_kb.json AND patches Qdrant vectors. The next query gets correct context from round 1.

Conditional injection

Facts only injected when the matching tables appear in RAG. Keeps system prompt slim while honouring all learnings.

The loop: fail → investigate → save fact + patch metadata → next similar query gets the fix automatically. No human in the loop needed for known data quirks.

Frontend

Portal & dashboards · Vercel-style design system

/portal

Hub chooser · User vs Admin entry

/portal/user

Business user landing · KPI Dashboard, Schedule Adherence, Anomaly Hub, What-If Studio, Digital Twin (external), WhatsApp

/portal/admin

Engineering landing · Monitoring (5), Configuration (3), Reference (3) — with search and live activity feed

/kpi-dashboard

Live business KPIs · period filters · 4 charts · top customers/products tables · AI briefing

/schedule-adherence

OA fulfillment dashboard · data-quality audit · 4 hero cards · monthly fulfillment chart

/kpi-admin

KPI registry editor · test/save/delete inline · run-all today

/portal/admin/logs

Filterable query feed · tier/escalation/slow filters · click for trace

/portal/admin/facts

Auto-discovered schema knowledge · sorted newest first

/portal/admin/stats

System counters · tier breakdown bars · refreshes every 15s

/portal/admin/analyze

Haiku-generated improvement insights · 5min cached

/health

Live status with auto-pulse · content-negotiates JSON for API consumers

/dashboard

Live recent-queries feed · expandable trace viewer · 8s auto-refresh

/architecture

User-facing 10-step flow explanation

Design system: Inter + JetBrains Mono fonts · single blue accent (#0070f3) · Lucide-style stroke SVG icons (no emojis) · 1px-gap grid pattern for hairline dividers · Vercel-quality polish. Full spec at intelify-design-system.md.

Visibility

Observability · logs, traces, and metrics

Per-query log

Every query writes to SQLite + Google Sheets: tier, rounds, tokens, latency, cost, RAG tables, escalation, full response.

Round-by-round trace

Every Claude round + tool call + result captured to query_traces. Replayable from any log row in the admin UI.

Anthropic prompt cache

System prompt + tools share one cache key. ~85% read hit rate. Cache write/read tokens logged per query.

Benchmark suite

15-query baseline lives at bench_run.py. Used to validate every architectural change kept accuracy at 100%.

Code map

Key files on the server

api.py

Single-file FastAPI app. ~10K lines. Source of truth, lives at /opt/ppc-bot/api.py on the server.

schema_kb.json

RAG knowledge base. Per-table summaries, columns, sample queries, stale/useless lists.

learned_facts.json

Auto-discovered domain facts (currently 27).

kpi_registry.json

KPI metric definitions (SQL + label + format). Editable via /kpi-admin.

query_log.db

SQLite — query log + traces + KPI snapshots.

openssl_legacy.cnf

OpenSSL 3.x compatibility shim required for the SQL Server SSL handshake.

intelify-design-system.md

Portable design spec. Copy-paste into any Claude session to keep dashboards consistent.