v3.0.0 · April 9, 2026

Kian learns to think

Vallit · 4 min read

A chatbot that answers instantly feels fast. A chatbot that answers well feels like a colleague. This release chooses the second one.

Until now, every user message hit one OpenAI call and came back. That worked — Kian was already solid — but it was a black box. No structured analysis, no quality gate, no way for the AI to catch its own mistakes before shipping them to a user who is deciding whether to book a call. So we rebuilt the core.

Kian is now a context-adaptive multi-agent system. A fast Classifier sees the message first and routes it to the right pipeline depth — a simple dashboard question skips the heavy machinery, a complex consulting query gets the full treatment. Behind the Classifier sit five specialised agents: a Safety checker for prompt injection and PII, a Planner that turns intent into strategy, a Knowledge agent that retrieves and ranks the right sources, the Responder that writes the answer, and a Reviewer that scores it against five weighted criteria before anyone sees it. If the Reviewer catches a weak answer, it regenerates with specific feedback — one retry, then the better of the two ships.

Everything the agents do is visible. Real-time progress stages stream into the widget while Kian is working. After the answer arrives, a Reasoning toggle expands to show the full trail — classification, intent, strategy, sources, quality score, latency, and cost. The same trail is available in the admin dashboard for every conversation, so you can see exactly why Kian said what it said.


What changed
AI

Six-agent pipeline with adaptive routing

Classifier → Safety → Planner → Knowledge → Responder → Reviewer. Simple questions get a fast path, complex ones get the full analysis.

The Classifier decides complexity in ~300ms using gpt-4o-mini. Simple greetings skip straight to retrieval + generation (~2s). Moderate questions add a Planner and Reviewer (~3-4s). Complex queries — pricing, frustrated users, multi-topic — run the full pipeline including safety checks (~4-5s). WTM gets the SIE-Form and seminar recommendations; the Vallit dashboard gets the DU-Form and app help; dynamic companies inherit their own persona from the config table.
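The routing logic above can be sketched as a simple lookup from complexity tier to agent list. This is an illustrative sketch — the type names, agent identifiers, and `routeFor` function are assumptions, not Kian's actual code:

```typescript
// Illustrative routing table: complexity tier → agents that run.
// Names are assumptions for the sketch, not the real implementation.
type Complexity = "simple" | "moderate" | "complex";
type Agent = "safety" | "planner" | "knowledge" | "responder" | "reviewer";

// Simple: retrieval + generation only. Moderate: add planning and review.
// Complex: the full pipeline, including the safety check.
const PIPELINES: Record<Complexity, Agent[]> = {
  simple: ["knowledge", "responder"],
  moderate: ["planner", "knowledge", "responder", "reviewer"],
  complex: ["safety", "planner", "knowledge", "responder", "reviewer"],
};

function routeFor(complexity: Complexity): Agent[] {
  return PIPELINES[complexity];
}
```

The point of the table shape is that the fast path is a strict subset of the full pipeline, so a simple question never pays for agents it doesn't need.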


AI

Quality gate with self-correction

Every moderate and complex response is scored against five criteria before it ships. Failures trigger a retry with specific feedback.

The Reviewer scores relevance (30%), accuracy (25%), tone (15%), actionability (15%), and conciseness (15%). Weighted threshold is 7.0/10. Critical issues — hallucinated seminar names, wrong formality, forbidden contact phrases, PII — auto-fail regardless of score. On retry, the Reviewer's feedback gets injected as a system directive so the second attempt knows exactly what to fix. We keep the higher-scoring version and ship that.
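The gate described above boils down to a weighted sum plus a hard veto. A minimal sketch, assuming a flat score object — the field and function names are illustrative, not Kian's schema:

```typescript
// Sketch of the weighted quality gate. Weights and the 7.0 threshold
// come from the post; the shapes and names are assumptions.
interface ReviewScores {
  relevance: number;     // each criterion scored 0–10
  accuracy: number;
  tone: number;
  actionability: number;
  conciseness: number;
}

const WEIGHTS: Record<keyof ReviewScores, number> = {
  relevance: 0.30,
  accuracy: 0.25,
  tone: 0.15,
  actionability: 0.15,
  conciseness: 0.15,
};

const THRESHOLD = 7.0;

function weightedScore(s: ReviewScores): number {
  return (Object.keys(WEIGHTS) as (keyof ReviewScores)[])
    .reduce((sum, k) => sum + s[k] * WEIGHTS[k], 0);
}

// Critical issues (hallucinated seminar names, PII, …) veto the answer
// regardless of how well it scored.
function passes(s: ReviewScores, criticalIssues: string[]): boolean {
  if (criticalIssues.length > 0) return false;
  return weightedScore(s) >= THRESHOLD;
}
```

Splitting the veto out of the score matters: an answer that leaks PII but reads beautifully should never squeak past on averages.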


Widget

Visible reasoning in the widget

A pulsing progress stage shows which agent is working. A Reasoning toggle on every answer expands the full trail.

Real progress events stream via Server-Sent Events — no more fake skeleton bars cycling through staged labels. The widget shows the actual current agent in real time: Analysiere, Plane, Durchsuche, Formuliere, Prüfe (Analysing, Planning, Searching, Drafting, Checking). After the response arrives, a Reasoning toggle opens a structured trail with icons, latency per stage, knowledge sources with similarity scores, quality score, and total cost. The visual language is deliberate — shimmer on the active stage, greyed out for completed ones, green/red dots for pass/fail.
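Under the hood, an SSE stream is just text frames separated by blank lines. A minimal parser sketch — the event names (`progress`, `final`) are assumptions about the payload, not the documented wire format:

```typescript
// Minimal SSE frame parser, roughly what a widget does after reading
// the response body. Event names in the example are illustrative.
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSseChunk(chunk: string): SseEvent[] {
  return chunk
    .split("\n\n")                            // frames end with a blank line
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      let event = "message";                  // SSE default event name
      let data = "";
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        if (line.startsWith("data:")) data += line.slice(5).trim();
      }
      return { event, data: JSON.parse(data) };
    });
}
```

In the browser, `EventSource` or a streamed `fetch` handles this framing for you; the sketch just shows why a `progress` event can update the stage label the instant it arrives.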


Platform

Per-message reasoning in the admin dashboard

Every conversation detail page now shows the full agent trail for every assistant message.

Admins see classification, intent, strategy, retrieved sources, model, tokens, quality score, and cost per message. Knowledge retrieval shows method (vector vs keyword vs full-table) and average similarity so knowledge gaps become obvious. The retry flag surfaces when self-correction triggered. This is the same view you'd expect from a professional AI observability tool — but native to your workspace.


Infra

Agent performance telemetry

A new agent_performance table records per-agent latency, tokens, and outputs for every message.

One row per agent per message. Indexed by session, agent name, complexity, and time — ready to power aggregate analytics like latency percentiles, classifier accuracy, retry rates, and cost breakdowns by context. Fire-and-forget writes so telemetry never blocks a user-facing response.
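The fire-and-forget pattern mentioned above is worth making concrete: the response path kicks off the write and moves on, and a failed write is logged rather than surfaced. A sketch under assumed names — the row shape mirrors the columns described, `insertRow` stands in for the real database call:

```typescript
// Fire-and-forget telemetry write: the caller never awaits it, so a
// slow or failed insert can't delay a user-facing response.
// All names here are illustrative stand-ins.
interface AgentPerformanceRow {
  sessionId: string;
  agentName: string;
  complexity: "simple" | "moderate" | "complex";
  latencyMs: number;
  tokens: number;
  createdAt: string;
}

async function insertRow(row: AgentPerformanceRow): Promise<void> {
  // stand-in for the real agent_performance insert
}

function recordAgentRun(row: AgentPerformanceRow): void {
  // `void` marks the promise as intentionally unawaited; the catch
  // guarantees a rejection is logged instead of crashing the process.
  void insertRow(row).catch((err) =>
    console.error("telemetry write failed", err)
  );
}
```

The trade-off is deliberate: a dropped telemetry row is an acceptable loss, a slow chat response is not.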


Infra

SSE streaming with JSON fallback

The chat endpoint now streams progress events over Server-Sent Events. Non-SSE clients get a blocking JSON response.

The API route checks the Accept header — text/event-stream triggers the streaming path with live progress, reasoning, and final response events. Anything else gets the classic JSON body with the full reasoning trail embedded. No breaking changes for existing integrations, but new widgets get the real-time experience for free.
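The negotiation step above is a one-line header check. A hedged sketch, assuming the route only distinguishes two response modes — the function names are illustrative:

```typescript
// Content negotiation sketch: SSE for clients that ask for it,
// blocking JSON for everyone else. Names are illustrative.
function wantsSse(acceptHeader: string | null): boolean {
  return acceptHeader?.includes("text/event-stream") ?? false;
}

function negotiate(acceptHeader: string | null): "sse" | "json" {
  return wantsSse(acceptHeader) ? "sse" : "json";
}
```

Because JSON is the default rather than the exception, a client that sends no `Accept` header at all keeps working exactly as before — which is what makes the change non-breaking.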