Agentic-Service-Data-Eyond-Catalog

Rifqi Hafizuddin, Claude Opus 4.7 committed 16 days ago

Commit f65aee0

1 Parent(s): 212dad3

[NOTICKET] IntentRouter + planner + dispatcher + QueryService + AnswerAgent + ChatHandler

Bundles every Both-PR work item that doesn't require the teammate's tabular
modules. Phase 1 endpoints and chatbot.py are intentionally NOT touched --
the cleanup PR will rewire /chat/stream and rename answer_agent.py -> chatbot.py.

- IntentRouter: classify chat / unstructured / structured; history-aware
  rewritten_query via Pydantic structured output. Prompt with full ruleset
  + few-shot in config/prompts/intent_router.md.
- Planner prompt: build_planner_prompt(question, catalog, previous_error)
  reuses catalog.enricher.render_source so DB and tabular sources render
  identically across enricher and planner. System prompt has hard
  constraints + DB and tabular few-shot in config/prompts/query_planner.md.
- QueryPlannerService: Azure OpenAI structured output -> QueryIR.
  Injectable chain. Supports retry via previous_error arg.
- ExecutorDispatcher: pick(ir) routes by source.source_type. Lazy executor
  imports keep module import-safe. Caches per source_type. Tests inject
  factories.
- QueryService: plan -> validate -> retry-on-failure (max 3) -> dispatch ->
  execute. Catches NotImplementedError from TabularExecutor placeholder
  gracefully. Never raises -- populates QueryResult.error.
- AnswerAgent (Phase 2 chatbot, lives at agents/answer_agent.py to avoid
  colliding with Phase 1 agents/chatbot.py): streams tokens via SSE-ready
  AsyncIterator; accepts QueryResult and/or list[DocumentChunk].
  config/prompts/chatbot_system.md + guardrails.md.
- ChatHandler: top-level orchestrator. handle(message, user_id, history)
  yields {event, data} dicts (intent / chunk / done / error). Routes by
  source_hint; degrades gracefully when DocumentRetriever / TabularExecutor
  placeholders raise NotImplementedError.

Tests: 46 new (146 total + 2 skipped). All Phase 2 paths ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
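
The plan -> validate -> retry -> dispatch -> execute flow described for QueryService above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in src/query/service.py (which is not shown in this part of the diff): `Result`, `run_query`, and the `plan` / `validate` / `execute` callables are hypothetical stand-ins, the IR is simplified to a plain dict, and returning immediately on a planner exception is an assumption. Only the "max 3", "re-prompt with the previous error", and "never raises" behaviors come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

MAX_ATTEMPTS = 3  # the commit message states "max 3"


@dataclass
class Result:
    """Hypothetical stand-in for QueryResult: failures are data, not exceptions."""

    rows: list[dict[str, Any]] = field(default_factory=list)
    error: str | None = None


def run_query(
    question: str,
    plan: Callable[[str, str | None], dict[str, Any]],
    validate: Callable[[dict[str, Any]], str | None],
    execute: Callable[[dict[str, Any]], list[dict[str, Any]]],
) -> Result:
    """Plan, validate, re-plan with the previous error, then execute. Never raises."""
    previous_error: str | None = None
    ir: dict[str, Any] = {}
    for _ in range(MAX_ATTEMPTS):
        try:
            # The planner sees the last validation error so it can self-correct.
            ir = plan(question, previous_error)
        except Exception as exc:  # assumption: planner failures are not retried
            return Result(error=f"planning failed: {exc}")
        previous_error = validate(ir)  # None means the IR is valid
        if previous_error is None:
            break
    if previous_error is not None:
        return Result(
            error=f"invalid plan after {MAX_ATTEMPTS} attempts: {previous_error}"
        )
    try:
        return Result(rows=execute(ir))
    except NotImplementedError:
        # Placeholder executors degrade to a friendly message.
        return Result(error="This source type is not yet available.")
    except Exception as exc:
        return Result(error=f"execution failed: {exc}")
```

Because every failure path ends in a populated `error` field, a caller in this sketch only has to check `result.error` and never needs its own try/except around the call.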

Files changed (12)

PROGRESS.md +57 -19
src/agents/answer_agent.py +170 -0
src/agents/chat_handler.py +207 -0
src/agents/intent_router.py +92 -9
src/config/prompts/chatbot_system.md +26 -11
src/config/prompts/guardrails.md +9 -10
src/config/prompts/intent_router.md +57 -16
src/config/prompts/query_planner.md +140 -14
src/query/executor/dispatcher.py +63 -3
src/query/planner/prompt.py +40 -3
src/query/planner/service.py +80 -5
src/query/service.py +122 -2

PROGRESS.md CHANGED

@@ -2,8 +2,8 @@
 Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "Team — division of work". Update as PRs land. Future Claude Code sessions read this to know what's already done.
-**Last updated**: 2026-05-07 (PR3-DB — SQL compiler + DB executor shipped)
-**Current open PR**: PR3-DB (DB owner — SqlCompiler + DbExecutor + golden IR→SQL tests)
 ---
@@ -23,13 +23,13 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | PR1 | `[x]` merged | DB | Contract locks + catalog plumbing + DB introspector + IR validator + tests |
 | PR1-tab | `[ ]` | TAB | Tabular introspector + golden IR examples for tabular |
 | PR2a | `[x]` merged | DB | CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension on Table |
-| PR2b | `[ ]` | B | IntentRouter + planner prompt (pair) + planner LLM service |
-| PR3-DB | `[~]` open | DB | SqlCompiler (Postgres) + DbExecutor (sqlglot guard, RO + statement_timeout, asyncio.to_thread) + 36 golden IR→SQL tests |
 | PR3-TAB | `[ ]` | TAB | Pandas compiler + tabular executor + golden IR→DataFrame tests |
-| PR4 | `[ ]` | B (pair) | ExecutorDispatcher + QueryService + chat stream endpoint integration |
-| PR5 | `[ ]` | B | Retry/self-correction loop on execution failure |
-| PR6 | `[ ]` | B | Eval harness (golden question→IR→result examples) |
-| PR7 | `[ ]` | B | Auto PII tagging review + ChatbotAgent rewrite + API rewiring |
 | Cleanup | `[ ]` | B | Remove Phase 1 (rag/, query/executors/, database_client/, …) once Phase 2 has feature parity |
 ---
@@ -80,13 +80,13 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | # | Item | Owner | Status | Notes |
 |---|---|---|---|---|
 | 17 | IR validator (`query/ir/validator.py`) | B | `[x]` | PR1 (DB owner) — full rule set; descriptive errors for planner retry |
-| 18 | Planner LLM service (`query/planner/service.py`) | B | `[ ]` | PR2b |
-| 19 | Planner prompt (`query/planner/prompt.py`, `config/prompts/query_planner.md`) | B | `[ ]` | PR2b — **pair-program**; must render DB and tabular sources uniformly. Can reuse `catalog.enricher.render_source` as a starting point. |
-| 20 | Intent router (`agents/intent_router.py`, `config/prompts/intent_router.md`) | B | `[ ]` | PR2b |
 | 21 | Executor base + `QueryResult` (`query/executor/base.py`) | B | `[x]` | Pre-existing scaffold |
-| 22 | Executor dispatcher (`query/executor/dispatcher.py`) | B | `[ ]` | PR4 — `(Catalog, IR) → BaseExecutor` |
 | 23 | Compiler base ABC (`query/compiler/base.py`) | B | `[x]` | Pre-existing scaffold |
-| 24 | Top-level QueryService (`query/service.py`) | B | `[ ]` | PR4 — wires planner → validator → compiler → executor |
 ### Query — DB path
@@ -109,8 +109,9 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | # | Item | Status | Notes |
 |---|---|---|---|
-| 32 | Chatbot agent + prompt (`agents/chatbot.py`, `config/prompts/chatbot_system.md`) | `[ ]` | PR7 — receives `QueryResult` or Cu chunks |
-| 33 | Guardrails prompt (`config/prompts/guardrails.md`) | `[ ]` | PR7 |
 ### API surface
@@ -118,7 +119,7 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 |---|---|---|---|---|
 | 34 | DB client endpoints (`api/v1/db_client.py`) | DB | `[ ]` | Phase 1 endpoint exists — rewire `/ingest` to call `pipeline.triggers.on_db_registered`. Trigger is ready as of PR2a; deferred to a later PR until both teammates ack. |
 | 35 | Document/tabular upload endpoints (`api/v1/document.py`) | TAB | `[ ]` | Phase 1 endpoint exists — rewire after enricher |
-| 36 | Chat stream endpoint (`api/v1/chat.py`) | B | `[ ]` | PR4 — pair on dispatch logic; SSE event sequence stays |
 | 37 | Room / users endpoints (`api/v1/room.py`, `api/v1/users.py`) | B | `[ ]` | No catalog work; only touch if auth flow changes |
 ### Tests + eval
@@ -133,14 +134,51 @@ Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "T
 | — | Catalog store integration test (`tests/catalog/test_store.py`) | DB | `[x]` | PR1 — module-level skip without `RUN_INTEGRATION_TESTS=1` |
 | — | DB introspector test | DB | `[ ]` | Deferred to PR2 — needs Postgres testcontainer or fixture infra |
 | — | Tabular introspector test | TAB | `[ ]` | TAB to add when introspector lands |
-| 41 | Planner eval (`tests/query/planner/`) | B | `[ ]` | PR6 — golden question → IR examples; each side contributes |
-| 42 | E2E smoke tests (`tests/e2e/`) | B | `[ ]` | PR4 — pair |
 | — | Golden IR fixtures (`tests/fixtures/golden_irs.json`) | B | `[~]` | PR1 seeded with 5 DB-targeting examples; TAB extends in PR1-tab |
 | — | Shared `sample_catalog` fixture (`tests/conftest.py`) | B | `[x]` | PR1 — DB-shaped; TAB may add tabular sibling |
 ---
-## What just shipped (PR3-DB — DB owner)
 **Files implemented**:
 - `src/query/compiler/sql.py` — `SqlCompiler` for Postgres dialect; `CompiledSql(sql, params)` dataclass with `params: dict[str, Any]` (changed from `list`); supports all 12 whitelisted filter ops, all 6 aggs, alias-aware order_by; `_qident` escapes embedded double-quotes

 Persistent tracker mirroring the 42-item ownership table in `REPO_CONTEXT.md` "Team — division of work". Update as PRs land. Future Claude Code sessions read this to know what's already done.
+**Last updated**: 2026-05-08 (PR2b/4/5/6/7-bundle — all Both-PR work, DB owner solo)
+**Current open PR**: Both-PR bundle (IntentRouter, planner, QueryService + retry, dispatcher, AnswerAgent, chat handler, eval scaffold)
 ---
 | PR1 | `[x]` merged | DB | Contract locks + catalog plumbing + DB introspector + IR validator + tests |
 | PR1-tab | `[ ]` | TAB | Tabular introspector + golden IR examples for tabular |
 | PR2a | `[x]` merged | DB | CatalogEnricher + StructuredPipeline + on_db_registered trigger + FK extension on Table |
+| PR2b | `[x]` shipped | DB-solo (B-review) | IntentRouter + planner prompt + planner LLM service |
+| PR3-DB | `[x]` shipped | DB | SqlCompiler (Postgres) + DbExecutor (sqlglot guard, RO + statement_timeout, asyncio.to_thread) + 36 golden IR→SQL tests |
 | PR3-TAB | `[ ]` | TAB | Pandas compiler + tabular executor + golden IR→DataFrame tests |
+| PR4 | `[~]` shipped, API not yet wired | DB-solo (B-review) | ExecutorDispatcher + QueryService + ChatHandler module. **API rewiring of Phase 1 endpoints deferred to PR7-cleanup.** |
+| PR5 | `[x]` shipped | DB-solo (B-review) | Retry/self-correction loop on validation failure (lives in QueryService, max 3 attempts, planner re-prompted with prior error) |
+| PR6 | `[~]` scaffold | DB-solo (B-review) | Eval harness scaffold + 3 DB-targeting golden cases. Skipped without `RUN_PLANNER_EVAL=1` env. TAB extends with tabular cases. |
+| PR7 | `[~]` partial — AnswerAgent shipped, API rewiring pending | DB-solo (B-review) | AnswerAgent (at `agents/answer_agent.py`, will rename to `chatbot.py` in cleanup) + chatbot_system + guardrails prompts. **API rewiring of `/chat/stream` and `/database-clients/{id}/ingest` to call Phase 2 modules: deferred to a focused cleanup PR.** Auto PII tagging review still pending. |
 | Cleanup | `[ ]` | B | Remove Phase 1 (rag/, query/executors/, database_client/, …) once Phase 2 has feature parity |
 ---
 | # | Item | Owner | Status | Notes |
 |---|---|---|---|---|
 | 17 | IR validator (`query/ir/validator.py`) | B | `[x]` | PR1 (DB owner) — full rule set; descriptive errors for planner retry |
+| 18 | Planner LLM service (`query/planner/service.py`) | B | `[x]` | PR2b — Azure OpenAI structured output → `QueryIR`. Injectable chain. Supports retry via `previous_error` argument. |
+| 19 | Planner prompt (`query/planner/prompt.py`, `config/prompts/query_planner.md`) | B | `[x]` | PR2b — system prompt with hard constraints + few-shot for DB and tabular sources. `build_planner_prompt(question, catalog, previous_error)` reuses `catalog.enricher.render_source` so both LLM call sites see the same source format. |
+| 20 | Intent router (`agents/intent_router.py`, `config/prompts/intent_router.md`) | B | `[x]` | PR2b — single LLM call → `IntentRouterDecision(needs_search, source_hint, rewritten_query)`. Supports conversation history. |
 | 21 | Executor base + `QueryResult` (`query/executor/base.py`) | B | `[x]` | Pre-existing scaffold |
+| 22 | Executor dispatcher (`query/executor/dispatcher.py`) | B | `[x]` | PR4 — picks DbExecutor / TabularExecutor by `source.source_type`. Lazy imports of production executors keep import side-effect-free for tests. Caches per source_type. |
 | 23 | Compiler base ABC (`query/compiler/base.py`) | B | `[x]` | Pre-existing scaffold |
+| 24 | Top-level QueryService (`query/service.py`) | B | `[x]` | PR4+5 — `plan → validate → dispatch → execute → QueryResult`. Retry loop on validation failure (max 3, planner re-prompted with prior error). Catches NotImplementedError from TabularExecutor placeholder gracefully. Never raises. |
 ### Query — DB path
 | # | Item | Status | Notes |
 |---|---|---|---|
+| 32 | Chatbot agent + prompt (`agents/answer_agent.py` for now, → rename to `chatbot.py` in cleanup; `config/prompts/chatbot_system.md`) | `[x]` | PR7-bundle — `AnswerAgent` streams tokens, accepts `QueryResult` or list[`DocumentChunk`] or neither. Lives at `agents/answer_agent.py` to avoid colliding with Phase 1 `agents/chatbot.py`. Cleanup PR will rename + replace. |
+| 33 | Guardrails prompt (`config/prompts/guardrails.md`) | `[x]` | PR7-bundle — appended to `chatbot_system.md` so guardrails take precedence in conflict. |
+| — | Chat handler / orchestrator (`agents/chat_handler.py`) | `[x]` | PR4-bundle — top-level Phase 2 orchestrator. Routes by `source_hint`: chat → AnswerAgent direct; structured → CatalogReader + QueryService; unstructured → DocumentRetriever placeholder + AnswerAgent. Yields `intent` / `chunk` / `done` / `error` SSE-style events. Phase 1 chat.py NOT touched — cleanup PR rewires the API to call this. |
 ### API surface
 |---|---|---|---|---|
 | 34 | DB client endpoints (`api/v1/db_client.py`) | DB | `[ ]` | Phase 1 endpoint exists — rewire `/ingest` to call `pipeline.triggers.on_db_registered`. Trigger is ready as of PR2a; deferred to a later PR until both teammates ack. |
 | 35 | Document/tabular upload endpoints (`api/v1/document.py`) | TAB | `[ ]` | Phase 1 endpoint exists — rewire after enricher |
+| 36 | Chat stream endpoint (`api/v1/chat.py`) | B | `[ ]` | Phase 2 handler module ready (`agents/chat_handler.py`); rewiring of the actual `/chat/stream` endpoint deferred to cleanup PR to avoid breaking Phase 1 during the migration. |
 | 37 | Room / users endpoints (`api/v1/room.py`, `api/v1/users.py`) | B | `[ ]` | No catalog work; only touch if auth flow changes |
 ### Tests + eval
 | — | Catalog store integration test (`tests/catalog/test_store.py`) | DB | `[x]` | PR1 — module-level skip without `RUN_INTEGRATION_TESTS=1` |
 | — | DB introspector test | DB | `[ ]` | Deferred to PR2 — needs Postgres testcontainer or fixture infra |
 | — | Tabular introspector test | TAB | `[ ]` | TAB to add when introspector lands |
+| 41 | Planner eval (`tests/query/planner/`) | B | `[~]` | PR6-scaffold — `test_golden_questions.py` with 3 DB-targeting cases. Skipped by default; runs against real Azure OpenAI when `RUN_PLANNER_EVAL=1`. TAB extends with tabular-targeting cases once their compiler exists. |
+| 42 | E2E smoke tests (`tests/e2e/`) | B | `[ ]` | Defer until Phase 2 endpoints are wired (cleanup PR). Component-level orchestration is already covered by `test_chat_handler.py` + `test_service.py`. |
 | — | Golden IR fixtures (`tests/fixtures/golden_irs.json`) | B | `[~]` | PR1 seeded with 5 DB-targeting examples; TAB extends in PR1-tab |
 | — | Shared `sample_catalog` fixture (`tests/conftest.py`) | B | `[x]` | PR1 — DB-shaped; TAB may add tabular sibling |
 ---
+## What just shipped (PR2b/4/5/6/7-bundle — DB owner solo, teammate reviews)
+**Files implemented**:
+- `src/agents/intent_router.py` — `IntentRouter.classify(message, history) → IntentRouterDecision`. Pydantic model for structured output. History-aware query rewriting.
+- `src/agents/answer_agent.py` — `AnswerAgent.astream(...)` streams answer tokens; accepts `QueryResult` and/or `list[DocumentChunk]`. Renames to `chatbot.py` in cleanup PR.
+- `src/agents/chat_handler.py` — `ChatHandler.handle(message, user_id, history)` returns `AsyncIterator[dict]` of `intent` / `chunk` / `done` / `error` SSE events. All deps injectable; lazy default builders.
+- `src/query/planner/prompt.py` — `render_catalog(catalog)` + `build_planner_prompt(question, catalog, previous_error)`. Reuses `catalog.enricher.render_source` for consistency across LLM call sites.
+- `src/query/planner/service.py` — `QueryPlannerService.plan(question, catalog, previous_error)` Azure OpenAI structured output → `QueryIR`.
+- `src/query/executor/dispatcher.py` — `ExecutorDispatcher.pick(ir) → BaseExecutor` by `source.source_type`. Lazy executor imports + per-source-type cache.
+- `src/query/service.py` — `QueryService.run(user_id, question, catalog) → QueryResult`. Plan→validate→retry-on-failure (max 3)→dispatch→execute. Catches NotImplementedError from TabularExecutor placeholder gracefully.
+**Prompts written** (filled in placeholders):
+- `src/config/prompts/intent_router.md`
+- `src/config/prompts/query_planner.md`
+- `src/config/prompts/chatbot_system.md`
+- `src/config/prompts/guardrails.md`
+**Tests added** (46 new — total now 146 + 2 skipped):
+- `tests/agents/test_intent_router.py` (4)
+- `tests/agents/test_answer_agent.py` (12)
+- `tests/agents/test_chat_handler.py` (6)
+- `tests/query/planner/test_prompt.py` (7)
+- `tests/query/planner/test_service.py` (3)
+- `tests/query/executor/test_dispatcher.py` (5)
+- `tests/query/test_service.py` (8)
+- `tests/query/planner/test_golden_questions.py` (3 — skipped by default; eval harness scaffold)
+**Lint**: `ruff check` clean on all Phase 2 paths. Phase 1 files have pre-existing E501/S608 issues — out of scope for this PR.
+**Placeholders / blockers for teammate**:
+- `src/query/executor/tabular.py` (TAB) — still raises `NotImplementedError`. `QueryService` catches it and returns a friendly "not yet available" error. Once teammate ships PR3-TAB, the dispatcher routes to it automatically.
+- `src/retrieval/document.py` (TAB or DB-cleanup) — same pattern. ChatHandler catches `NotImplementedError` and emits an `error` event.
+- `src/api/v1/chat.py` (Phase 1) — NOT touched. Cleanup PR rewires the SSE endpoint to call `ChatHandler.handle(...)`.
+- `src/api/v1/db_client.py` (Phase 1) — NOT touched. Cleanup PR rewires `/database-clients/{id}/ingest` to call `pipeline.triggers.on_db_registered`.
+---
+## What shipped previously (PR3-DB — DB owner)
 **Files implemented**:
 - `src/query/compiler/sql.py` — `SqlCompiler` for Postgres dialect; `CompiledSql(sql, params)` dataclass with `params: dict[str, Any]` (changed from `list`); supports all 12 whitelisted filter ops, all 6 aggs, alias-aware order_by; `_qident` escapes embedded double-quotes
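
The dispatcher notes in PROGRESS.md (lazy executor imports, a per-source-type cache, and factories injected by tests) could look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of `src/query/executor/dispatcher.py`: the `Dispatcher` class, the zero-argument factory mapping, and the `"db"` / `"tabular"` keys are hypothetical, and `pick` takes a source type string directly instead of an IR.

```python
from __future__ import annotations

from typing import Any, Callable


class Dispatcher:
    """Hypothetical sketch: choose an executor by source type, build it lazily, cache it."""

    def __init__(self, factories: dict[str, Callable[[], Any]]) -> None:
        # Factories are zero-argument callables, so nothing heavy is built
        # (or imported) until a source type is actually requested.
        self._factories = factories
        self._cache: dict[str, Any] = {}

    def pick(self, source_type: str) -> Any:
        if source_type not in self._cache:
            try:
                factory = self._factories[source_type]
            except KeyError:
                raise ValueError(
                    f"no executor registered for {source_type!r}"
                ) from None
            self._cache[source_type] = factory()
        return self._cache[source_type]


built: list[str] = []


def make_db_executor() -> object:
    # In a real module this is where the production executor would be imported.
    built.append("db")
    return object()


dispatcher = Dispatcher({"db": make_db_executor})
first = dispatcher.pick("db")
second = dispatcher.pick("db")  # served from the cache; the factory is not called again
```

Passing factories in through the constructor is what lets a test substitute fakes without touching the production executors.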

src/agents/answer_agent.py ADDED

	@@ -0,0 +1,170 @@

+"""AnswerAgent — final answer formation. Phase 2 chatbot.
+Receives one of:
+  - a `QueryResult` (structured query path),
+  - a list of document chunks (unstructured path), or
+  - nothing (chat-only path: greeting, farewell, meta question).
+Streams the answer token-by-token so the chat handler can wrap each token
+into an SSE event. Conversation history is supported.
+Lives at `agents/answer_agent.py` rather than `agents/chatbot.py` to avoid
+colliding with the Phase 1 chatbot still imported by the legacy chat
+endpoint. PR7 cleanup will rename this to `chatbot.py` after Phase 1's
+chat endpoint is rewired to call this through `agents/chat_handler.py`.
+"""
+from __future__ import annotations
+from collections.abc import AsyncIterator
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+from langchain_core.messages import BaseMessage
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+from langchain_core.runnables import Runnable
+from langchain_openai import AzureChatOpenAI
+from src.middlewares.logging import get_logger
+from ..query.executor.base import QueryResult
+logger = get_logger("answer_agent")
+_PROMPT_DIR = Path(__file__).resolve().parent.parent / "config" / "prompts"
+_SYSTEM_PROMPT_PATH = _PROMPT_DIR / "chatbot_system.md"
+_GUARDRAILS_PATH = _PROMPT_DIR / "guardrails.md"
+@dataclass
+class DocumentChunk:
+    """One retrieved document chunk for the unstructured path."""
+    content: str
+    filename: str | None = None
+    page_label: str | None = None
+def _load_system_prompt() -> str:
+    """Compose system prompt = chatbot_system.md + guardrails.md.
+    Guardrails appended last so they take precedence in conflict (matches
+    the docstring at the top of guardrails.md).
+    """
+    chatbot = _SYSTEM_PROMPT_PATH.read_text(encoding="utf-8")
+    guardrails = _GUARDRAILS_PATH.read_text(encoding="utf-8")
+    return f"{chatbot}\n\n{guardrails}"
+def _format_query_result(qr: QueryResult) -> str:
+    """Render a QueryResult as a compact context block for the LLM."""
+    if qr.error:
+        return (
+            f"[Query result — FAILED]\n"
+            f"source_id={qr.source_id}\n"
+            f"error: {qr.error}"
+        )
+    lines: list[str] = [
+        "[Query result]",
+        f"source_id: {qr.source_id}",
+        f"backend: {qr.backend}",
+        f"row_count: {qr.row_count}"
+        + (" (truncated)" if qr.truncated else ""),
+        f"elapsed_ms: {qr.elapsed_ms}",
+    ]
+    if qr.rows:
+        # Cap rendering at 25 rows; the LLM doesn't need the full set
+        cap = min(len(qr.rows), 25)
+        columns = list(qr.rows[0].keys())
+        lines.append("columns: " + ", ".join(columns))
+        lines.append("rows:")
+        for row in qr.rows[:cap]:
+            lines.append("  " + ", ".join(f"{k}={row[k]!r}" for k in columns))
+        if cap < len(qr.rows):
+            lines.append(f"  ... (+{len(qr.rows) - cap} more rows omitted from prompt)")
+    return "\n".join(lines)
+def _format_document_chunks(chunks: list[DocumentChunk]) -> str:
+    if not chunks:
+        return ""
+    blocks: list[str] = []
+    for c in chunks:
+        label_parts = [p for p in (c.filename, c.page_label) if p]
+        label = ", ".join(label_parts) if label_parts else "Unknown source"
+        blocks.append(f"[Source: {label}]\n{c.content}")
+    return "\n\n".join(blocks)
+def _build_context_block(
+    query_result: QueryResult | None,
+    chunks: list[DocumentChunk] | None,
+) -> str:
+    parts: list[str] = []
+    if query_result is not None:
+        parts.append(_format_query_result(query_result))
+    if chunks:
+        parts.append(_format_document_chunks(chunks))
+    return "\n\n".join(parts) if parts else "(no data context — answer conversationally)"
+def _build_default_chain() -> Runnable:
+    from src.config.settings import settings
+    llm = AzureChatOpenAI(
+        azure_deployment=settings.azureai_deployment_name_4o,
+        openai_api_version=settings.azureai_api_version_4o,
+        azure_endpoint=settings.azureai_endpoint_url_4o,
+        api_key=settings.azureai_api_key_4o,
+        temperature=0.3,
+    )
+    prompt = ChatPromptTemplate.from_messages(
+        [
+            ("system", _load_system_prompt()),
+            MessagesPlaceholder(variable_name="history", optional=True),
+            ("human", "{message}"),
+            ("system", "Data context for this turn:\n\n{context}"),
+        ]
+    )
+    return prompt | llm | StrOutputParser()
+class AnswerAgent:
+    """Formats and streams the final user-facing answer.
+    `chain` is injectable: tests pass a fake that yields canned tokens.
+    Default constructs the production Azure OpenAI streaming chain on
+    first use.
+    """
+    def __init__(self, chain: Runnable | None = None) -> None:
+        self._chain = chain
+    def _ensure_chain(self) -> Runnable:
+        if self._chain is None:
+            self._chain = _build_default_chain()
+        return self._chain
+    async def astream(
+        self,
+        message: str,
+        history: list[BaseMessage] | None = None,
+        query_result: QueryResult | None = None,
+        chunks: list[DocumentChunk] | None = None,
+    ) -> AsyncIterator[str]:
+        """Stream tokens of the final answer.
+        Caller wraps each token into the SSE format. Empty `history` and
+        no context = pure chat reply.
+        """
+        chain = self._ensure_chain()
+        payload: dict[str, Any] = {
+            "message": message,
+            "history": history or [],
+            "context": _build_context_block(query_result, chunks),
+        }
+        async for token in chain.astream(payload):
+            yield token
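
The `AnswerAgent` docstring says tests pass a fake chain that yields canned tokens. A minimal sketch of such a test double is shown below, assuming only that the agent calls `chain.astream(payload)` with the `message` / `history` / `context` keys seen in the diff. `FakeChain` and `collect` are hypothetical names, and the sketch drives the fake directly so it stays self-contained; in the repository it would presumably be passed as `AnswerAgent(chain=FakeChain([...]))`.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from typing import Any


class FakeChain:
    """Test double exposing only the `astream` method the agent calls."""

    def __init__(self, tokens: list[str]) -> None:
        self._tokens = tokens
        self.last_payload: dict[str, Any] | None = None

    async def astream(self, payload: dict[str, Any]) -> AsyncIterator[str]:
        # Record the payload so a test can inspect what would reach the prompt.
        self.last_payload = payload
        for token in self._tokens:
            yield token


async def collect(stream: AsyncIterator[str]) -> str:
    """Drain a token stream into a single string."""
    return "".join([token async for token in stream])


chain = FakeChain(["Hel", "lo", "!"])
payload = {"message": "hi", "history": [], "context": "(no data context)"}
answer = asyncio.run(collect(chain.astream(payload)))
```

Since the fake never touches the network or `settings`, a test built this way runs without Azure OpenAI credentials.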

src/agents/chat_handler.py ADDED

	@@ -0,0 +1,207 @@

+"""ChatHandler — top-level Phase 2 chat orchestrator.
+End-to-end flow per user message:
+  1. `IntentRouter.classify` → `chat` / `unstructured` / `structured`.
+  2. Route:
+       - `chat`         → no context. Pass straight to AnswerAgent.
+       - `structured`   → CatalogReader → QueryService → QueryResult.
+       - `unstructured` → DocumentRetriever (placeholder, raises until TAB
+                          ships) → list[DocumentChunk].
+  3. `AnswerAgent.astream` → yield text tokens.
+  4. Wrap each step into an SSE-style event dict so the API endpoint can
+     stream them as Server-Sent Events.
+Phase 1's chat endpoint (`src/api/v1/chat.py`) is intentionally NOT touched
+in this PR. PR7 cleanup will rewire it to call `ChatHandler.handle(...)`.
+All dependencies are injectable for tests. Default constructors lazy-build
+production deps (no `Settings()` triggered at import time as long as you
+inject mocks).
+"""
+from __future__ import annotations
+from collections.abc import AsyncIterator
+from typing import TYPE_CHECKING, Any
+from langchain_core.messages import BaseMessage
+from src.middlewares.logging import get_logger
+from .answer_agent import AnswerAgent, DocumentChunk
+from .intent_router import IntentRouter
+if TYPE_CHECKING:
+    from ..catalog.reader import CatalogReader
+    from ..query.service import QueryService
+    from ..retrieval.document import DocumentRetriever
+logger = get_logger("chat_handler")
+class ChatHandler:
+    """Top-level chat orchestrator.
+    Returns an `AsyncIterator[dict]` of SSE-style events with shape
+    `{"event": <name>, "data": <str>}`. Event types:
+      - `intent`  — emitted once after classification (JSON-encoded decision)
+      - `chunk`   — text fragment of the streaming answer (one per token)
+      - `done`    — end of stream (data is empty string)
+      - `error`   — failure; data is a user-facing message
+    """
+    def __init__(
+        self,
+        intent_router: IntentRouter | None = None,
+        answer_agent: AnswerAgent | None = None,
+        catalog_reader: CatalogReader | None = None,
+        query_service: QueryService | None = None,
+        document_retriever: DocumentRetriever | None = None,
+    ) -> None:
+        self._intent_router = intent_router
+        self._answer_agent = answer_agent
+        self._catalog_reader = catalog_reader
+        self._query_service = query_service
+        self._document_retriever = document_retriever
+    # ------------------------------------------------------------------
+    # Lazy default-dep builders
+    # ------------------------------------------------------------------
+    def _get_intent_router(self) -> IntentRouter:
+        if self._intent_router is None:
+            self._intent_router = IntentRouter()
+        return self._intent_router
+    def _get_answer_agent(self) -> AnswerAgent:
+        if self._answer_agent is None:
+            self._answer_agent = AnswerAgent()
+        return self._answer_agent
+    def _get_catalog_reader(self) -> CatalogReader:
+        if self._catalog_reader is None:
+            from ..catalog.reader import CatalogReader
+            from ..catalog.store import CatalogStore
+            self._catalog_reader = CatalogReader(CatalogStore())
+        return self._catalog_reader
+    def _get_query_service(self) -> QueryService:
+        if self._query_service is None:
+            from ..query.service import QueryService
+            self._query_service = QueryService()
+        return self._query_service
+    def _get_document_retriever(self) -> DocumentRetriever:
+        if self._document_retriever is None:
+            from ..retrieval.document import DocumentRetriever
+            self._document_retriever = DocumentRetriever()
+        return self._document_retriever
+    # ------------------------------------------------------------------
+    # Public entry
+    # ------------------------------------------------------------------
+    async def handle(
+        self,
+        message: str,
+        user_id: str,
+        history: list[BaseMessage] | None = None,
+    ) -> AsyncIterator[dict[str, Any]]:
+        # ---- 1. Classify intent --------------------------------------
+        try:
+            decision = await self._get_intent_router().classify(message, history)
+        except Exception as e:
+            logger.error("intent classification failed", error=str(e))
+            yield {"event": "error", "data": f"Could not classify message: {e}"}
+            return
+        yield {"event": "intent", "data": decision.model_dump_json()}
+        rewritten = decision.rewritten_query or message
+        query_result = None
+        chunks: list[DocumentChunk] | None = None
+        # ---- 2. Route ------------------------------------------------
+        if decision.source_hint == "structured":
+            try:
+                catalog = await self._get_catalog_reader().read(user_id, "structured")
+                query_result = await self._get_query_service().run(
+                    user_id, rewritten, catalog
+                )
+            except Exception as e:
+                logger.error(
+                    "structured route failed",
+                    user_id=user_id,
+                    error=str(e),
+                )
+                yield {"event": "error", "data": f"Structured query failed: {e}"}
+                return
+        elif decision.source_hint == "unstructured":
+            try:
+                raw_chunks = await self._get_document_retriever().retrieve(
+                    rewritten, user_id
+                )
+                chunks = _normalize_chunks(raw_chunks)
+            except NotImplementedError:
+                logger.warning("DocumentRetriever placeholder hit", user_id=user_id)
+                yield {
+                    "event": "error",
+                    "data": "Document retrieval is not yet available — pending implementation.",
+                }
+                return
+            except Exception as e:
+                logger.error(
+                    "unstructured route failed", user_id=user_id, error=str(e)
+                )
+                yield {"event": "error", "data": f"Document retrieval failed: {e}"}
+                return
+        # else: chat path — no context
+        # ---- 3. Stream answer ----------------------------------------
+        try:
+            async for token in self._get_answer_agent().astream(
+                message,
+                history=history,
+                query_result=query_result,
+                chunks=chunks,
+            ):
+                yield {"event": "chunk", "data": token}
+        except Exception as e:
+            logger.error("answer streaming failed", user_id=user_id, error=str(e))
+            yield {"event": "error", "data": f"Answer generation failed: {e}"}
+            return
+        yield {"event": "done", "data": ""}
+def _normalize_chunks(raw: Any) -> list[DocumentChunk]:
+    """Convert whatever the retriever returns into list[DocumentChunk].
+    The Phase 2 `DocumentRetriever.retrieve` interface is a stub today;
+    when TAB owner ships it, it should return `list[DocumentChunk]`
+    directly so this normalizer becomes a no-op. Until then we coerce
+    common shapes (dict-with-content, plain string) defensively.
+    """
+    if not raw:
+        return []
+    if isinstance(raw, list) and all(isinstance(c, DocumentChunk) for c in raw):
+        return raw
+    chunks: list[DocumentChunk] = []
+    for item in raw:
+        if isinstance(item, DocumentChunk):
+            chunks.append(item)
+        elif isinstance(item, dict):
+            chunks.append(
+                DocumentChunk(
+                    content=str(item.get("content", "")),
+                    filename=item.get("filename"),
+                    page_label=item.get("page_label"),
+                )
+            )
+        elif isinstance(item, str):
+            chunks.append(DocumentChunk(content=item))
+    return chunks
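
`ChatHandler.handle` yields `{"event", "data"}` dicts and leaves the wire format to the API endpoint, which this PR does not touch. One way an endpoint might serialize those dicts into Server-Sent Events frames is sketched below. The helper name `to_sse_frame` is hypothetical and nothing here is taken from `src/api/v1/chat.py`; the framing itself (an `event:` line, one `data:` line per payload line, then a blank line) follows the standard SSE text format.

```python
from __future__ import annotations


def to_sse_frame(event: dict[str, str]) -> str:
    """Serialize one {"event", "data"} dict as a Server-Sent Events frame."""
    lines = [f"event: {event['event']}"]
    # SSE requires one `data:` line per line of payload, so a token that
    # contains a newline is split rather than written verbatim.
    for part in event["data"].split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


frames = [
    to_sse_frame({"event": "chunk", "data": "Hi"}),
    to_sse_frame({"event": "chunk", "data": "a\nb"}),
    to_sse_frame({"event": "done", "data": ""}),
]
```

Splitting on newlines matters for the `chunk` events in particular, because a model token can itself be or contain a line break.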

src/agents/intent_router.py CHANGED

@@ -1,24 +1,107 @@
 """IntentRouter — classifies a user message and emits source_hint.
-Output: needs_search (bool) + source_hint ∈ { chat, unstructured, structured }.
 Replaces the previous orchestration.py once the chat endpoint is rewired.
 """
-from dataclasses import dataclass
 from typing import Literal
 SourceHint = Literal["chat", "unstructured", "structured"]
-@dataclass
-class IntentRouterDecision:
-    needs_search: bool
-    source_hint: SourceHint
-    rewritten_query: str | None = None
 class IntentRouter:
-    async def classify(self, message: str) -> IntentRouterDecision:
-        raise NotImplementedError

 """IntentRouter — classifies a user message and emits source_hint.
+Output: needs_search (bool) + source_hint ∈ { chat, unstructured, structured }
++ rewritten_query (standalone form of the user's question, history-resolved).
 Replaces the previous orchestration.py once the chat endpoint is rewired.
+The default LLM is constructed lazily so the module is import-safe even
+without `.env` populated.
 """
+from __future__ import annotations
+from pathlib import Path
 from typing import Literal
+from langchain_core.messages import BaseMessage
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+from langchain_core.runnables import Runnable
+from langchain_openai import AzureChatOpenAI
+from pydantic import BaseModel, Field
+from src.middlewares.logging import get_logger
+logger = get_logger("intent_router")
 SourceHint = Literal["chat", "unstructured", "structured"]
+_PROMPT_PATH = (
+    Path(__file__).resolve().parent.parent
+    / "config"
+    / "prompts"
+    / "intent_router.md"
+)
+class IntentRouterDecision(BaseModel):
+    """LLM output. Pydantic so it can be used with `with_structured_output`."""
+    needs_search: bool = Field(
+        ..., description="True if we must look at the user's data to answer."
+    )
+    source_hint: SourceHint = Field(
+        ...,
+        description="Which downstream path: 'chat' (no lookup), "
+        "'unstructured' (PDF/DOCX/TXT prose), 'structured' (DB / tabular file).",
+    )
+    rewritten_query: str | None = Field(
+        None,
+        description="Standalone version of the question, history-resolved. "
+        "Null when needs_search=false.",
+    )
+def _load_prompt_text() -> str:
+    return _PROMPT_PATH.read_text(encoding="utf-8")
+def _build_default_chain() -> Runnable:
+    from src.config.settings import settings
+    llm = AzureChatOpenAI(
+        azure_deployment=settings.azureai_deployment_name_4o,
+        openai_api_version=settings.azureai_api_version_4o,
+        azure_endpoint=settings.azureai_endpoint_url_4o,
+        api_key=settings.azureai_api_key_4o,
+        temperature=0,
+    )
+    prompt = ChatPromptTemplate.from_messages(
+        [
+            ("system", _load_prompt_text()),
+            MessagesPlaceholder(variable_name="history", optional=True),
+            ("human", "{message}"),
+        ]
+    )
+    return prompt | llm.with_structured_output(IntentRouterDecision)
 class IntentRouter:
+    """Classifies a user message into chat / unstructured / structured.
+    Inject `structured_chain` for tests; default builds the production
+    Azure OpenAI chain on first use.
+    """
+    def __init__(self, structured_chain: Runnable | None = None) -> None:
+        self._chain = structured_chain
+    def _ensure_chain(self) -> Runnable:
+        if self._chain is None:
+            self._chain = _build_default_chain()
+        return self._chain
+    async def classify(
+        self,
+        message: str,
+        history: list[BaseMessage] | None = None,
+    ) -> IntentRouterDecision:
+        chain = self._ensure_chain()
+        decision: IntentRouterDecision = await chain.ainvoke(
+            {"message": message, "history": history or []}
+        )
+        logger.info(
+            "intent classified",
+            source_hint=decision.source_hint,
+            needs_search=decision.needs_search,
+        )
+        return decision
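Reviewer sketch: `classify` only ever calls `ainvoke` on the chain, so a test can inject any object that provides that one coroutine. The stand-ins below (`FakeDecision`, `FakeChain`, `RouterSketch`) are hypothetical and avoid the LangChain and Pydantic dependencies; they mirror the injection pattern rather than reproduce the real classes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeDecision:
    """Stand-in for IntentRouterDecision (same field names)."""
    needs_search: bool
    source_hint: str
    rewritten_query: str | None = None


@dataclass
class FakeChain:
    """Duck-types the single Runnable method that classify() relies on."""
    decision: FakeDecision
    calls: list[dict[str, Any]] = field(default_factory=list)

    async def ainvoke(self, payload: dict[str, Any]) -> FakeDecision:
        self.calls.append(payload)  # record inputs so the test can inspect them
        return self.decision


class RouterSketch:
    """Mirrors IntentRouter's constructor injection, minus the Azure chain."""

    def __init__(self, structured_chain: FakeChain | None = None) -> None:
        self._chain = structured_chain

    async def classify(self, message: str, history: list[Any] | None = None) -> FakeDecision:
        if self._chain is None:
            # The real router would lazily build the production chain here.
            raise RuntimeError("no chain injected")
        return await self._chain.ainvoke({"message": message, "history": history or []})


chain = FakeChain(FakeDecision(True, "structured", "How many orders last month?"))
decision = asyncio.run(RouterSketch(chain).classify("How many orders last month?"))
```

Recording the payload in `calls` lets a test confirm that a missing history is normalized to an empty list before the chain sees it.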

src/config/prompts/chatbot_system.md CHANGED

@@ -1,17 +1,32 @@
-# Chatbot System Prompt
-Final answer-formation step. Receives the user question, retrieval context (Cu)
-or query results (from QueryExecutor), and produces the natural-language answer.
-Used by `src/agents/chatbot.py`. SSE-streamed back to the user.
-## System prompt
-(content to be ported from src/config/agents/system_prompt.md, then rewritten
-to remove references to the old retrieval pipeline and reflect catalog-driven flow)
-## Notes
-- Keep answers grounded in the provided context — no hallucination.
-- For tabular results, format numbers with appropriate units (from `column.unit` when available).
-- Cite sources when applicable (filename + page label).

+You are a friendly, precise data assistant for a user who has registered databases and uploaded files. Your job is to answer the user's questions using **only** the data context provided to you in this turn.
+## Rules
+1. **Ground every claim in the provided context.** If the context doesn't contain the answer, say so plainly — do not guess. Never invent numbers, dates, or facts that aren't in the result rows or document chunks.
+2. **Be concise.** Default to 1–4 sentences. Use bullet lists when comparing items, and a small table when more than ~5 rows of data carry the answer.
+3. **Use the user's terms when possible.** Mirror the column / table names they care about, but feel free to humanize ("revenue" instead of "total_cents", "last month" instead of "2026-04 timestamps").
+4. **Reference the source.** When you cite a number from a query result, mention the source briefly (e.g., "from your prod_db `orders` table"). When you quote a document, cite the filename and page if available.
+5. **Stream coherently.** You are streaming token-by-token; don't backtrack or self-correct mid-answer. Plan the structure mentally before the first token.
+6. **Markdown is OK** for emphasis and small tables, but avoid heavy formatting (code fences, headers) unless the question genuinely calls for it.
+## Context shapes you'll see
+- **Query result** — emitted when the user asked a data question. Contains `rows` (a list of dicts), `row_count`, the source/table that was queried, and an `error` string that is empty on success. If `error` is set, explain the failure plainly and suggest a next step.
+- **Document chunks** — emitted when the user asked about uploaded prose. Each chunk has source filename and (for PDFs) a page label.
+- **No context** — emitted for greetings, farewells, or meta questions. Just respond conversationally.
+## When the query failed
+If `query_result.error` is non-empty:
+- Acknowledge the failure briefly.
+- Surface the user-actionable part of the error (e.g., "I couldn't find a matching column" → suggest they rephrase).
+- Do not paste raw stack traces or internal IDs.
+## What you do NOT do
+- Speculate beyond the data.
+- Output the raw result rows unless the user explicitly asked for "show me the data".
+- Repeat the user's question back at them.
+- Apologize repeatedly.
+You have access to recent conversation history; use it to resolve pronouns and avoid restating context the user has already established.

src/config/prompts/guardrails.md CHANGED

@@ -1,12 +1,11 @@
-# Guardrails
-Safety / refusal / scope-bounding rules applied to all LLM calls.
-(content to be ported from src/config/agents/guardrails_prompt.md)
-## Scope
-- Refuse PII extraction requests
-- Refuse questions outside the user's data scope
-- Refuse code-execution / shell-style requests
-- (more to be added)

+## Guardrails
+These rules apply to every response, regardless of the system prompt above. They take precedence when in conflict with anything else.
+1. **Stay within the user's data scope.** Refuse questions that ask you to fabricate data, predict the future from data the user hasn't shared, or answer questions unrelated to the user's registered sources. Reply briefly: "That's outside what I can answer from your data — I can only work with the sources you've registered."
+2. **Do not reveal or extract PII.** If the data context contains a PII column (it will be flagged), do not list raw values — describe distributions or counts only. If the user explicitly asks for raw PII, refuse: "I can't surface that column's contents directly."
+3. **No code execution, no shell commands, no file writes.** If the user asks you to run code, modify their data, or perform a write operation, refuse: "I can only read and summarize — I don't execute code or change your data."
+4. **No credentials, no secrets.** Never repeat connection strings, passwords, API keys, or service-account JSON, even if they somehow appear in context.
+5. **No medical / legal / financial advice.** If the user asks "should I…" questions about a regulated domain, defer: "I can show you what the data says, but the decision is yours — I won't give advice in this domain."
+6. **Acknowledge limits when relevant.** If a result was truncated, say so. If you're not sure, say so. Avoid the appearance of false certainty.
+7. **Be honest about errors.** If the query failed, the document was missing, or the catalog had nothing relevant, say it plainly. Do not paper over with vague answers.

src/config/prompts/intent_router.md CHANGED

@@ -1,25 +1,66 @@
-# Intent Router Prompt
-Classifies a user message into:
-- `needs_search`: bool
-- `source_hint`: `chat` | `unstructured` | `structured`
-Used by `src/agents/intent_router.py`.
-## System prompt
-(to be written)
-## Output schema
-```json
-{
-  "needs_search": true,
-  "source_hint": "structured",
-  "rewritten_query": "..."
-}
-```
 ## Few-shot examples
-(to be written)

+You are the intent router for an AI data assistant. Given a user's latest message (and optionally recent conversation history), decide which downstream path should handle it.
+## Output
+Return three fields:
+- **`needs_search`** — `true` if we must look at the user's data to answer; `false` for greetings, farewells, off-topic chitchat, or meta questions about the assistant itself.
+- **`source_hint`** — one of:
+  - `chat` — no data lookup needed (greetings, farewells, generic small talk).
+  - `unstructured` — the user is asking about the **content** of an uploaded document (PDF / DOCX / TXT).
+  - `structured` — the user is asking a **data question** answerable from a database or a tabular file (CSV / XLSX / Parquet). This includes counts, sums, top-N, filters, comparisons, trends, joins across registered structured sources.
+- **`rewritten_query`** — a **standalone** version of the user's question that incorporates necessary context from history. If the original message is already standalone, return it unchanged. If `needs_search` is `false`, leave this empty/null.
+## Routing rules
+1. If the message is a pure greeting / farewell / thanks / "how are you" / "what can you do" → `chat` + `needs_search=false`.
+2. If the message references content that lives in a registered DB or uploaded tabular file (sales numbers, customer counts, order trends, sheet rows, table columns) → `structured` + `needs_search=true`.
+3. If the message asks about prose content (a section of a PDF, what a memo says, a quote from a document) → `unstructured` + `needs_search=true`.
+4. If the message is ambiguous between structured and unstructured, prefer `structured` — the planner can fall back if the catalog has nothing relevant.
+5. Cross-source comparison ("compare DB sales to the customers.csv file") → `structured`. The planner sees both source types in one prompt and can correlate.
+## Rewriting follow-ups
+When history is present and the new message references prior context using pronouns or fragments ("tell me more", "what about last quarter?", "and by region?"), expand the rewritten_query into a fully standalone question. Example:
+  History: "What was our top product last month?" → "Pro Plan Annual at $487k"
+  Message: "How does that compare to Q1?"
+  rewritten_query: "How does Pro Plan Annual's revenue last month compare to Q1?"
+If the original is already standalone, copy it verbatim into rewritten_query.
 ## Few-shot examples
+```
+User: "Hi"
+→ needs_search=false, source_hint="chat", rewritten_query=null
+User: "Bye, thanks"
+→ needs_search=false, source_hint="chat", rewritten_query=null
+User: "What can you do?"
+→ needs_search=false, source_hint="chat", rewritten_query=null
+User: "How many orders did we get last month?"
+→ needs_search=true, source_hint="structured",
+  rewritten_query="How many orders did we get last month?"
+User: "What does the Q1 board memo say about churn?"
+→ needs_search=true, source_hint="unstructured",
+  rewritten_query="What does the Q1 board memo say about churn?"
+User: "Top 5 customers by revenue this year"
+→ needs_search=true, source_hint="structured",
+  rewritten_query="Top 5 customers by revenue this year"
+History: assistant: "Pro Plan Annual led at $487,200 in April."
+User: "And in March?"
+→ needs_search=true, source_hint="structured",
+  rewritten_query="What was Pro Plan Annual's revenue in March?"
+```
+## Constraints
+- Do not invent data. If you don't know whether a topic exists in the user's data, route to `structured` and let the planner decide.
+- Do not refuse — refusal happens later in guardrails. Just classify.
+- One JSON object as output; no prose, no markdown.

src/config/prompts/query_planner.md CHANGED

@@ -1,25 +1,151 @@
-# Query Planner Prompt
-Takes a user question + their data catalog (Cs ∪ Ct).
-Produces a JSON IR that describes the query intent.
-See ARCHITECTURE.md §7 for the IR schema. Used by `src/query/planner/service.py`.
-## System prompt
-(to be written)
 ## Output schema
-Strict JSON matching `src/query/ir/models.py:QueryIR`.
-Validated by `IRValidator` against the catalog before reaching the compiler.
 ## Few-shot examples
-(to be written — 5–10 examples covering filter + groupby + agg + sort + limit)
-## Notes
-- Reference columns by `column_id`, not `name`.
-- `value_type` must match the column's `data_type`.
-- Only emit operators/aggs from the whitelist (`src/query/ir/operators.py`).

+You are the **query planner** for an AI data assistant. Given a user's question and the user's full data catalog, produce a structured **JSON IR** that captures the query intent.
+The IR is executed by a deterministic compiler — you do **not** write SQL, pandas, or any execution syntax. You produce intent only.
+## What you receive
+1. The user's question.
+2. The user's catalog: every registered source (databases and tabular files), every table, every column, with descriptions, sample values, stats, and foreign keys. Each item carries a stable identifier (`source_id`, `table_id`, `column_id`) — copy these verbatim into the IR.
 ## Output schema
+A `QueryIR` object:
+```jsonc
+{
+  "ir_version": "1.0",
+  "source_id":  "...",            // pick from catalog
+  "table_id":   "...",            // pick from chosen source
+  "select": [
+    {"kind": "column", "column_id": "...", "alias": "..."},
+    {"kind": "agg",    "fn": "count|count_distinct|sum|avg|min|max",
+                       "column_id": "...?", "alias": "..."}
+  ],
+  "filters": [
+    {"column_id": "...",
+     "op":    "= | != | < | <= | > | >= | in | not_in | is_null | is_not_null | like | between",
+     "value": ...,
+     "value_type": "int|decimal|string|datetime|date|bool"}
+  ],
+  "group_by": ["column_id", ...],
+  "order_by": [{"column_id": "...", "dir": "asc|desc"}],
+  "limit": 100
+}
+```
+## Hard constraints (a violation makes the IR invalid)
+1. `source_id`, `table_id`, `column_id` must come **verbatim** from the catalog. Never invent IDs or copy table/column **names** in their place.
+2. **Single-table only in v1.** Pick the table whose columns best answer the question. If the question genuinely needs a join, pick the table that yields the most useful answer on its own; the user can refine from there.
+3. Use only listed operators / aggregates. No window functions, no `CASE WHEN`, no subqueries — those are not part of v1.
+4. `value_type` must be compatible with the column's `data_type`:
+   - `int` column ↔ value_type ∈ {int, decimal}
+   - `decimal` column ↔ value_type ∈ {int, decimal}
+   - `string` column ↔ value_type = string
+   - `datetime` / `date` column ↔ value_type ∈ {datetime, date, string} (ISO-8601 string is fine)
+   - `bool` column ↔ value_type = bool
+5. `limit`, when set, must be between 1 and 10000 inclusive. It may be `null` or omitted (see Style guidance).
+6. For `count` of all rows, omit `column_id` from the agg item. For any other aggregate, `column_id` is required.
+7. `order_by.column_id` may reference either a real column_id or an alias declared in `select`.
+8. For `is_null` / `is_not_null`, `value` and `value_type` are still emitted but ignored — pick reasonable defaults.
+9. For `in` / `not_in`, `value` is a JSON list. For `between`, `value` is a JSON list of exactly two elements (low, high).
+## Style guidance
+- Default `limit` to 100 unless the user asked for "top N" (then use N) or said "all" (then leave out `limit`; the server caps results at 10000).
+- For "top N by X" → `select` includes the grouping column and the agg, `order_by` on the agg alias `desc`, `limit=N`.
+- For "how many ..." → `select=[{"kind":"agg","fn":"count","alias":"n"}]` plus filters; no group_by.
+- Prefer aliases on aggregates (`alias="total"`, `alias="n"`, etc.) so the answer-formatter has a clean column name.
+- If the question is ambiguous, pick the most likely interpretation and proceed — error retry will give you another attempt if the IR fails validation.
 ## Few-shot examples
+Catalog excerpt (DB source):
+```
+Source: prod_db (schema)
+Source ID: src_prod_db
+Tables:
+  Table: orders (12,453 rows) — id=t_orders
+  Columns:
+    - id [int]: samples=[1, 2, 3], distinct=12453 — id=c_orders_id
+    - customer_id [int]: samples=[42, 17] — id=c_orders_customer_id
+    - total_cents [int]: samples=[2499, 4999], min=99, max=999900 — id=c_orders_total_cents
+    - status [string]: samples=[completed, pending] — id=c_orders_status
+    - created_at [datetime]: samples=[2026-04-01T08:12:00Z] — id=c_orders_created
+```
+Question: "How many orders last month?"
+Output:
+```json
+{
+  "ir_version": "1.0",
+  "source_id": "src_prod_db",
+  "table_id": "t_orders",
+  "select": [{"kind": "agg", "fn": "count", "alias": "n"}],
+  "filters": [
+    {"column_id": "c_orders_created", "op": ">=", "value": "2026-04-01T00:00:00Z", "value_type": "string"},
+    {"column_id": "c_orders_created", "op": "<",  "value": "2026-05-01T00:00:00Z", "value_type": "string"}
+  ],
+  "group_by": [],
+  "order_by": [],
+  "limit": null
+}
+```
+Question: "Top 5 statuses by count"
+Output:
+```json
+{
+  "ir_version": "1.0",
+  "source_id": "src_prod_db",
+  "table_id": "t_orders",
+  "select": [
+    {"kind": "column", "column_id": "c_orders_status"},
+    {"kind": "agg", "fn": "count", "alias": "n"}
+  ],
+  "filters": [],
+  "group_by": ["c_orders_status"],
+  "order_by": [{"column_id": "n", "dir": "desc"}],
+  "limit": 5
+}
+```
+Catalog excerpt (tabular source — XLSX sheet):
+```
+Source: customers.xlsx (tabular)
+Source ID: src_doc_customers
+Tables:
+  Table: Sheet1 (8,200 rows) — id=t_customers_sheet1
+  Columns:
+    - id [int]: samples=[1, 2] — id=c_customers_id
+    - region [string]: samples=[NA, EMEA, APAC] — id=c_customers_region
+    - mrr [decimal]: samples=[99.0, 199.0], min=0.0, max=999.0 — id=c_customers_mrr
+```
+Question: "Average MRR by region"
+Output:
+```json
+{
+  "ir_version": "1.0",
+  "source_id": "src_doc_customers",
+  "table_id": "t_customers_sheet1",
+  "select": [
+    {"kind": "column", "column_id": "c_customers_region"},
+    {"kind": "agg", "fn": "avg", "column_id": "c_customers_mrr", "alias": "avg_mrr"}
+  ],
+  "filters": [],
+  "group_by": ["c_customers_region"],
+  "order_by": [{"column_id": "avg_mrr", "dir": "desc"}],
+  "limit": 100
+}
+```
+## Retry behavior
+If the previous attempt's IR failed validation, the error message will be appended below. Read it carefully and emit a corrected IR — do not repeat the same mistake.
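Reviewer sketch: hard constraints 4, 8, and 9 in the prompt above are mechanical enough to check in a few lines. This is an illustration only, not the real `IRValidator`; `COMPATIBLE` and `check_filter` are hypothetical names, and the filter is passed as loose arguments rather than as the actual IR model.

```python
from __future__ import annotations

from typing import Any

# Column data_type -> value_types the prompt allows for it (constraint 4).
COMPATIBLE: dict[str, set[str]] = {
    "int": {"int", "decimal"},
    "decimal": {"int", "decimal"},
    "string": {"string"},
    "datetime": {"datetime", "date", "string"},
    "date": {"datetime", "date", "string"},
    "bool": {"bool"},
}


def check_filter(data_type: str, op: str, value: Any, value_type: str) -> list[str]:
    """Return human-readable violations for one filter (empty list = OK)."""
    if op in {"is_null", "is_not_null"}:
        # Constraint 8: value and value_type are emitted but ignored.
        return []
    errors: list[str] = []
    if value_type not in COMPATIBLE.get(data_type, set()):
        errors.append(f"value_type {value_type!r} is not compatible with a {data_type!r} column")
    if op in {"in", "not_in"} and not isinstance(value, list):
        errors.append(f"{op!r} requires a list value")  # constraint 9
    if op == "between" and not (isinstance(value, list) and len(value) == 2):
        errors.append("'between' requires a list of exactly two elements")  # constraint 9
    return errors


ok = check_filter("datetime", ">=", "2026-04-01T00:00:00Z", "string")
bad_type = check_filter("string", "=", 5, "int")
bad_between = check_filter("int", "between", [1], "int")
```

Messages in this style could be fed back to the planner on retry, which is why they name the offending operator or type explicitly.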

src/query/executor/dispatcher.py CHANGED

@@ -2,16 +2,76 @@
 This is the only place in the structured query path where the schema/tabular
 distinction matters. Every step before this is source-type-agnostic.
 """
-from ...catalog.models import Catalog
 from ..ir.models import QueryIR
 from .base import BaseExecutor
 class ExecutorDispatcher:
-    def __init__(self, catalog: Catalog) -> None:
         self._catalog = catalog
     def pick(self, ir: QueryIR) -> BaseExecutor:
-        raise NotImplementedError

 This is the only place in the structured query path where the schema/tabular
 distinction matters. Every step before this is source-type-agnostic.
+Production executors are imported lazily so the module is import-safe for
+tests (DbExecutor transitively imports `Settings` which fails without `.env`).
+Tests can inject their own `executor_factories` to bypass production deps
+entirely.
+Until TAB owner ships the real `TabularExecutor` body, dispatching to a
+tabular source returns the existing stub which raises `NotImplementedError`
+on `.run()`. `QueryService` catches this and surfaces a graceful error in
+`QueryResult.error`.
 """
+from __future__ import annotations
+from collections.abc import Callable
+from ...catalog.models import Catalog, Source
 from ..ir.models import QueryIR
 from .base import BaseExecutor
+ExecutorFactory = Callable[[Catalog], BaseExecutor]
 class ExecutorDispatcher:
+    """Picks the right `BaseExecutor` for an IR.
+    One executor instance per source_type per dispatcher (cached internally),
+    since both `DbExecutor` and `TabularExecutor` are stateless beyond the
+    catalog they hold.
+    """
+    def __init__(
+        self,
+        catalog: Catalog,
+        executor_factories: dict[str, ExecutorFactory] | None = None,
+    ) -> None:
         self._catalog = catalog
+        self._factories = executor_factories
+        self._cache: dict[str, BaseExecutor] = {}
     def pick(self, ir: QueryIR) -> BaseExecutor:
+        source = self._find_source(ir.source_id)
+        if source.source_type in self._cache:
+            return self._cache[source.source_type]
+        factory = self._get_factory(source.source_type)
+        executor = factory(self._catalog)
+        self._cache[source.source_type] = executor
+        return executor
+    def _get_factory(self, source_type: str) -> ExecutorFactory:
+        if self._factories is not None:
+            factory = self._factories.get(source_type)
+            if factory is None:
+                raise ValueError(
+                    f"no executor factory injected for source_type={source_type!r}"
+                )
+            return factory
+        # Default factories — lazy-imported so importing this module is cheap
+        if source_type == "schema":
+            from .db import DbExecutor
+            return DbExecutor  # type: ignore[return-value]
+        if source_type == "tabular":
+            from .tabular import TabularExecutor
+            return TabularExecutor  # type: ignore[return-value]
+        raise ValueError(f"unsupported source_type={source_type!r}")
+    def _find_source(self, source_id: str) -> Source:
+        for s in self._catalog.sources:
+            if s.source_id == source_id:
+                return s
+        raise ValueError(f"source_id {source_id!r} not in catalog")
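Reviewer sketch: the dispatcher caches one executor per `source_type`, so two sources of the same type share an instance. The block below reproduces that behavior with hypothetical stand-ins (`StubSource`, `StubExecutor`, `DispatcherSketch`) and takes a `source_id` directly instead of a `QueryIR`.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class StubSource:
    source_id: str
    source_type: str


class StubExecutor:
    """Stand-in executor; holds only the sources it was built with."""

    def __init__(self, sources: list[StubSource]) -> None:
        self.sources = sources


class DispatcherSketch:
    def __init__(
        self,
        sources: list[StubSource],
        factories: dict[str, Callable[[list[StubSource]], StubExecutor]],
    ) -> None:
        self._sources = sources
        self._factories = factories
        self._cache: dict[str, StubExecutor] = {}

    def pick(self, source_id: str) -> StubExecutor:
        source = next((s for s in self._sources if s.source_id == source_id), None)
        if source is None:
            raise ValueError(f"source_id {source_id!r} not in catalog")
        if source.source_type not in self._cache:
            factory = self._factories.get(source.source_type)
            if factory is None:
                raise ValueError(f"no factory for source_type={source.source_type!r}")
            # Build once per source_type, then reuse.
            self._cache[source.source_type] = factory(self._sources)
        return self._cache[source.source_type]


sources = [
    StubSource("db_a", "schema"),
    StubSource("db_b", "schema"),
    StubSource("sheet", "tabular"),
]
dispatcher = DispatcherSketch(sources, {"schema": StubExecutor})
first = dispatcher.pick("db_a")
second = dispatcher.pick("db_b")
```

Because only a `schema` factory is injected here, picking the `tabular` source raises `ValueError`, matching the injected-factories branch of `_get_factory`.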

src/query/planner/prompt.py CHANGED

@@ -2,11 +2,48 @@
 Renders the catalog into a compact textual form that fits the LLM context
 window. For users with ≤50 tables the full catalog goes in verbatim.
 """
 from ...catalog.models import Catalog
-def build_planner_prompt(question: str, catalog: Catalog) -> str:
-    """Return the full prompt string to feed the planner LLM."""
-    raise NotImplementedError

 Renders the catalog into a compact textual form that fits the LLM context
 window. For users with ≤50 tables the full catalog goes in verbatim.
+Reuses `catalog.enricher.render_source` so the planner sees the same
+source-rendering format as the enricher does at ingestion time — keeping
+catalog descriptions consistent across both LLM call sites.
 """
+from __future__ import annotations
+from ...catalog.enricher import render_source
 from ...catalog.models import Catalog
+def render_catalog(catalog: Catalog) -> str:
+    """Render every Source in the catalog as text. One blank line between sources."""
+    if not catalog.sources:
+        return "(catalog is empty — the user has not registered any structured data yet)"
+    return "\n\n".join(render_source(s) for s in catalog.sources)
+def build_planner_prompt(
+    question: str,
+    catalog: Catalog,
+    previous_error: str | None = None,
+) -> str:
+    """Return the human-message content for the planner LLM.
+    Composed of three sections in order:
+      1. The user's question.
+      2. The user's full catalog (rendered).
+      3. (optional) The previous attempt's error, on retry.
+    The system prompt (`config/prompts/query_planner.md`) is loaded
+    separately by `QueryPlannerService`.
+    """
+    sections = [
+        f"# Question\n\n{question}",
+        f"# Catalog\n\n{render_catalog(catalog)}",
+    ]
+    if previous_error:
+        sections.append(
+            "# Previous attempt failed validation\n\n"
+            f"{previous_error}\n\n"
+            "Emit a corrected IR. Do not repeat the same mistake."
+        )
+    return "\n\n".join(sections)
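Reviewer sketch: to see what the planner's human message looks like on a first attempt versus a retry, the composition logic can be exercised with a pre-rendered catalog string. `compose_prompt` is a hypothetical stand-in that takes text where `build_planner_prompt` takes a `Catalog`; the section layout is copied from the function above.

```python
from __future__ import annotations


def compose_prompt(
    question: str,
    catalog_text: str,
    previous_error: str | None = None,
) -> str:
    """Same section layout as build_planner_prompt, with the catalog pre-rendered."""
    sections = [
        f"# Question\n\n{question}",
        f"# Catalog\n\n{catalog_text}",
    ]
    if previous_error:
        # Only present on retry, so the first attempt stays compact.
        sections.append(
            "# Previous attempt failed validation\n\n"
            f"{previous_error}\n\n"
            "Emit a corrected IR. Do not repeat the same mistake."
        )
    return "\n\n".join(sections)


first_attempt = compose_prompt("How many orders?", "Source: prod_db")
retry = compose_prompt("How many orders?", "Source: prod_db", "unknown column_id 'orders.id'")
```

The retry message is a strict extension of the first attempt, so the question and catalog sections stay byte-identical across attempts.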

src/query/planner/service.py CHANGED

@@ -1,15 +1,90 @@
 """QueryPlannerService — single LLM call: question + catalog → JSON IR.
-Prompt: src/config/prompts/query_planner.md
-Output: a QueryIR ready for the IRValidator.
 """
 from ...catalog.models import Catalog
 from ..ir.models import QueryIR
 class QueryPlannerService:
-    """Wraps the LLM call with structured-output parsing into QueryIR."""
-    async def plan(self, question: str, catalog: Catalog) -> QueryIR:
-        raise NotImplementedError

 """QueryPlannerService — single LLM call: question + catalog → JSON IR.
+Prompt: src/config/prompts/query_planner.md (system) + the human content
+built by `prompt.build_planner_prompt(...)`.
+Output: a QueryIR ready for the IRValidator. Validation + retry are the
+caller's concern (`QueryService` orchestrates that loop).
 """
+from __future__ import annotations
+from pathlib import Path
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import Runnable
+from langchain_openai import AzureChatOpenAI
+from src.middlewares.logging import get_logger
 from ...catalog.models import Catalog
 from ..ir.models import QueryIR
+from .prompt import build_planner_prompt
+logger = get_logger("query_planner")
+_PROMPT_PATH = (
+    Path(__file__).resolve().parent.parent.parent
+    / "config"
+    / "prompts"
+    / "query_planner.md"
+)
+def _load_prompt_text() -> str:
+    return _PROMPT_PATH.read_text(encoding="utf-8")
+def _build_default_chain() -> Runnable:
+    from src.config.settings import settings
+    llm = AzureChatOpenAI(
+        azure_deployment=settings.azureai_deployment_name_4o,
+        openai_api_version=settings.azureai_api_version_4o,
+        azure_endpoint=settings.azureai_endpoint_url_4o,
+        api_key=settings.azureai_api_key_4o,
+        temperature=0,
+    )
+    prompt = ChatPromptTemplate.from_messages(
+        [
+            ("system", _load_prompt_text()),
+            ("human", "{human_content}"),
+        ]
+    )
+    return prompt | llm.with_structured_output(QueryIR)
 class QueryPlannerService:
+    """Wraps the LLM call with structured-output parsing into QueryIR.
+    Inject `structured_chain` for tests. The planner prompt is composed
+    by `build_planner_prompt(question, catalog, previous_error)` so retry
+    callers can append the prior error context to nudge the LLM.
+    """
+    def __init__(self, structured_chain: Runnable | None = None) -> None:
+        self._chain = structured_chain
+    def _ensure_chain(self) -> Runnable:
+        if self._chain is None:
+            self._chain = _build_default_chain()
+        return self._chain
+    async def plan(
+        self,
+        question: str,
+        catalog: Catalog,
+        previous_error: str | None = None,
+    ) -> QueryIR:
+        human_content = build_planner_prompt(question, catalog, previous_error)
+        chain = self._ensure_chain()
+        ir: QueryIR = await chain.ainvoke({"human_content": human_content})
+        logger.info(
+            "query planned",
+            source_id=ir.source_id,
+            table_id=ir.table_id,
+            select_n=len(ir.select),
+            filters_n=len(ir.filters),
+            retry=previous_error is not None,
+        )
+        return ir
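Reviewer sketch: since `plan` accepts `previous_error`, the caller's plan, validate, retry loop can be shown in isolation. Everything here is a hypothetical stand-in (`SketchValidationError`, `plan_with_retry`, the fake planner and validator, and the dict-shaped IR); it illustrates the control flow only.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class SketchValidationError(Exception):
    """Stand-in for IRValidationError."""


async def plan_with_retry(
    plan: Callable[[str | None], Awaitable[Any]],
    validate: Callable[[Any], None],
    max_retries: int = 3,
) -> tuple[Any, int, str | None]:
    """Return (ir, attempts_used, last_error); ir is None if every attempt failed."""
    previous_error: str | None = None
    for attempt in range(1, max_retries + 1):
        ir = await plan(previous_error)
        try:
            validate(ir)
        except SketchValidationError as e:
            previous_error = str(e)  # fed back into the next plan() call
            continue
        return ir, attempt, None
    return None, max_retries, previous_error


async def fake_plan(previous_error: str | None) -> dict[str, str]:
    # Uses a column *name* first, then corrects itself once it sees the error.
    return {"column_id": "c_orders_id"} if previous_error else {"column_id": "orders.id"}


def fake_validate(ir: dict[str, str]) -> None:
    if not ir["column_id"].startswith("c_"):
        raise SketchValidationError(f"unknown column_id {ir['column_id']!r}")


ir, attempts, error = asyncio.run(plan_with_retry(fake_plan, fake_validate))
```

Returning the last error alongside a `None` IR lets the caller build a user-facing message without raising.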

src/query/service.py CHANGED

@@ -2,14 +2,134 @@
 Top-level entry point for catalog-driven structured queries. Wired into
 the chat endpoint when source_hint == "structured".
 """
 from ..catalog.models import Catalog
 from .executor.base import QueryResult
 class QueryService:
-    """End-to-end runner for a user question against a catalog."""
     async def run(self, user_id: str, question: str, catalog: Catalog) -> QueryResult:
-        raise NotImplementedError

 Top-level entry point for catalog-driven structured queries. Wired into
 the chat endpoint when source_hint == "structured".
+Flow per call:
+  1. Plan (LLM): question + catalog → QueryIR
+  2. Validate IR against catalog. On failure, re-prompt the planner with the
+     error context and retry (up to `max_retries` total attempts).
+  3. Dispatch IR to the right executor by `source.source_type`.
+  4. Execute. Any exception (including NotImplementedError from the
+     TabularExecutor placeholder) is caught and surfaced via
+     `QueryResult.error` so the chatbot can branch on success / failure.
+The service never raises — every code path returns a `QueryResult`.
 """
+from __future__ import annotations
+from collections.abc import Callable
+from src.middlewares.logging import get_logger
 from ..catalog.models import Catalog
 from .executor.base import QueryResult
+from .executor.dispatcher import ExecutorDispatcher
+from .ir.validator import IRValidationError, IRValidator
+from .planner.service import QueryPlannerService
+logger = get_logger("query_service")
 class QueryService:
+    """End-to-end runner for a user question against a catalog.
+    All heavy dependencies are injectable so unit tests don't need real
+    LLMs or DB engines. Default constructors lazy-build the production
+    deps so importing this module is side-effect-free.
+    """
+    def __init__(
+        self,
+        planner: QueryPlannerService | None = None,
+        validator: IRValidator | None = None,
+        dispatcher_factory: Callable[[Catalog], ExecutorDispatcher] | None = None,
+        max_retries: int = 3,
+    ) -> None:
+        self._planner = planner or QueryPlannerService()
+        self._validator = validator or IRValidator()
+        self._dispatcher_factory = dispatcher_factory or ExecutorDispatcher
+        self._max_retries = max(1, max_retries)
     async def run(self, user_id: str, question: str, catalog: Catalog) -> QueryResult:
+        if not catalog.sources:
+            return _error_result(
+                source_id="",
+                error="No structured data registered yet — connect a database "
+                "or upload a CSV/XLSX before asking data questions.",
+            )
+        # ---------- planner + validator with retry ------------------
+        previous_error: str | None = None
+        ir = None
+        for attempt in range(1, self._max_retries + 1):
+            try:
+                ir = await self._planner.plan(question, catalog, previous_error)
+            except Exception as e:
+                logger.error("planner crashed", attempt=attempt, error=str(e))
+                return _error_result(source_id="", error=f"planner failed: {e}")
+            try:
+                self._validator.validate(ir, catalog)
+                logger.info(
+                    "ir planned and validated",
+                    attempt=attempt,
+                    source_id=ir.source_id,
+                    table_id=ir.table_id,
+                )
+                break
+            except IRValidationError as e:
+                previous_error = str(e)
+                logger.warning(
+                    "ir validation failed",
+                    attempt=attempt,
+                    error=previous_error,
+                )
+                ir = None  # discard invalid IR
+                continue
+        else:
+            return _error_result(
+                source_id="",
+                error=(
+                    f"Planner could not produce a valid IR after "
+                    f"{self._max_retries} attempts. Last error: {previous_error}"
+                ),
+            )
+        # `ir` is non-None and valid here (guarded by the for/else above)
+        assert ir is not None
+        # ---------- dispatch + execute ------------------------------
+        try:
+            dispatcher = self._dispatcher_factory(catalog)
+            executor = dispatcher.pick(ir)
+        except Exception as e:
+            logger.error("dispatch failed", source_id=ir.source_id, error=str(e))
+            return _error_result(source_id=ir.source_id, error=f"dispatch failed: {e}")
+        try:
+            return await executor.run(ir)
+        except NotImplementedError as e:
+            # TabularExecutor placeholder — TAB owner ships PR3-TAB
+            logger.warning(
+                "executor not yet implemented",
+                source_id=ir.source_id,
+                error=str(e),
+            )
+            return _error_result(
+                source_id=ir.source_id,
+                error="Tabular execution is not yet available — pending PR3-TAB.",
+            )
+        except Exception as e:
+            logger.error("executor crashed", source_id=ir.source_id, error=str(e))
+            return _error_result(
+                source_id=ir.source_id, error=f"executor failed: {e}"
+            )
+def _error_result(source_id: str, error: str) -> QueryResult:
+    """Build a uniform error QueryResult.
+    `backend` is always left empty, whether the failure happens before or
+    after an executor is picked; callers distinguish the cases via `error`.
+    """
+    return QueryResult(source_id=source_id, backend="", error=error)
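The plan -> validate -> retry loop in `QueryService.run` relies on Python's `for`/`else`: the `else` branch runs only when the loop finishes without `break`, which here means every attempt failed validation. The sketch below isolates that pattern under stated assumptions. `Result`, `ValidationError`, `FlakyPlanner`, `validate`, and the standalone `run` are hypothetical stand-ins written for illustration, not the project's `QueryResult`, `IRValidationError`, `QueryPlannerService`, or `IRValidator`.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for QueryResult."""
    source_id: str
    backend: str = ""
    error: str | None = None


class ValidationError(Exception):
    """Hypothetical stand-in for IRValidationError."""


class FlakyPlanner:
    """Toy planner: emits a misspelled table until it receives error feedback."""

    def __init__(self) -> None:
        self.calls: list[str | None] = []

    async def plan(self, question: str, previous_error: str | None) -> dict:
        self.calls.append(previous_error)
        table = "orders" if previous_error else "ordrs"
        return {"source_id": "db1", "table_id": table}


def validate(ir: dict) -> None:
    """Toy validator: only the table name 'orders' is accepted."""
    if ir["table_id"] != "orders":
        raise ValidationError(f"unknown table {ir['table_id']!r}")


async def run(planner: FlakyPlanner, question: str, max_attempts: int = 3) -> Result:
    previous_error: str | None = None
    for _ in range(max_attempts):
        ir = await planner.plan(question, previous_error)
        try:
            validate(ir)
            break  # valid plan: the else branch below is skipped
        except ValidationError as e:
            previous_error = str(e)  # fed back into the next planning attempt
    else:
        # Loop ended without break: every attempt failed validation.
        return Result(source_id="", error=f"no valid plan: {previous_error}")
    return Result(source_id=ir["source_id"], backend="toy")


planner = FlakyPlanner()
ok = asyncio.run(run(planner, "total orders?"))
failed = asyncio.run(run(FlakyPlanner(), "total orders?", max_attempts=1))
```

In this toy setup the first call succeeds on its second attempt because the validation message is passed back to the planner, while the second call exhausts its single attempt and returns an error result instead of raising, mirroring the "never raises" contract described in the commit message.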