Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)

- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
2026-06-29 16:20:53 +02:00
commit 62728b2200
89 changed files with 11992 additions and 0 deletions
--- a/docs/agents/api-agent.md
+++ b/docs/agents/api-agent.md
@@ -0,0 +1,17 @@
+# API Agent
+
+Owns the Fastify backend.
+
+## Focus
+
+- Design HTTP/SSE contracts first.
+- Persist all critical state in SQLite.
+- Validate ownership with `session_id + user_id`.
+- Emit JSON logs.
+- Keep `/healthz` and `/readyz` simple.
+
+## Do not
+
+- Do not keep sessions in memory.
+- Do not expose real webhook URLs to clients.
+- Do not execute webhooks without explicit confirmation.
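The confirmation rule and the JSON-log rule in `api-agent.md` could be sketched roughly as follows. This is a minimal illustration, not the code in `apps/api`: the request shape, the `confirmed` flag, the function names, and the log field names are all assumptions.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface ExecuteWebhookRequest {
  webhookId: string;
  sessionId: string;
  userId: string;
  confirmed?: boolean;
}

type ExecutionDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Allow execution only when the client sent an explicit `confirmed: true`.
// A missing or falsy flag is treated as "not confirmed".
function decideExecution(req: ExecuteWebhookRequest): ExecutionDecision {
  if (req.confirmed !== true) {
    return { allowed: false, reason: "confirmation_required" };
  }
  return { allowed: true };
}

// Build one JSON log line. Only the webhook id is logged, never its URL.
function toLogLine(
  req: ExecuteWebhookRequest,
  decision: ExecutionDecision,
  now: Date = new Date(),
): string {
  return JSON.stringify({
    time: now.toISOString(),
    event: "webhook_execute_decision",
    webhook_id: req.webhookId,
    session_id: req.sessionId,
    user_id: req.userId,
    allowed: decision.allowed,
    reason: decision.allowed ? null : decision.reason,
  });
}
```

Treating anything other than a literal `true` as unconfirmed keeps the default safe when a client omits the field.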
--- a/docs/agents/pi-adapter-agent.md
+++ b/docs/agents/pi-adapter-agent.md
@@ -0,0 +1,14 @@
+# PI Adapter Agent
+
+Owns the isolation layer around the `pi.dev` / LLM provider runtime.
+
+## Focus
+
+- Expose a stable contract to the backend.
+- Support OpenAI-compatible providers.
+- Return a structured response: `answer`, `recommended_actions`, `internal_docs`.
+
+## Do not
+
+- Do not mix backend HTTP rules with model logic.
+- Do not let the model execute tools directly in Phase 1.
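One way to picture the structured response named in `pi-adapter-agent.md` is a small validator that turns raw model output into the three fields. Only the field names `answer`, `recommended_actions`, and `internal_docs` come from the document. The element types (arrays of string ids), the JSON encoding, and the function names are assumptions made for this sketch.

```typescript
// Field names come from the contract above; element shapes are assumptions.
interface AdapterResponse {
  answer: string;
  recommended_actions: string[]; // assumed: webhook ids from the catalog
  internal_docs: string[]; // assumed: knowledge-base document ids
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse raw model output into the structured contract.
// Returns null instead of throwing so the caller decides how to record the failure.
function parseAdapterResponse(raw: string): AdapterResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { answer, recommended_actions, internal_docs } = data as Record<string, unknown>;
  if (
    typeof answer !== "string" ||
    !isStringArray(recommended_actions) ||
    !isStringArray(internal_docs)
  ) {
    return null;
  }
  // Copy only the known fields so extra model output never leaks to the backend.
  return { answer, recommended_actions, internal_docs };
}
```

Because the result only carries identifiers, the model can recommend an action but cannot run one, which is in line with the Phase 1 rule above.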
--- a/docs/agents/security-reliability-agent.md
+++ b/docs/agents/security-reliability-agent.md
@@ -0,0 +1,11 @@
+# Security & Reliability Agent
+
+Owns the review of isolation, audit, and execution rules.
+
+## Checklist
+
+- Every message query filters by `session_id` AND `user_id`.
+- Every webhook validates roles before being shown and before being executed.
+- Every execution is recorded in `webhook_runs`.
+- The frontend never receives real webhook URLs.
+- No critical state lives only in memory.
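Two checklist items in `security-reliability-agent.md`, the role check before showing and before executing, and never sending real URLs to the frontend, might look like this in outline. The catalog entry shape, the `allowedRoles` field, the example URLs, and the function names are hypothetical, since the YAML schema is not shown here. The role names `admin` and `webhook-runner` are taken from the commit message.

```typescript
// Hypothetical catalog entry; the real YAML schema is not shown in this commit view.
interface WebhookEntry {
  id: string;
  title: string;
  url: string; // real target, must stay on the backend
  allowedRoles: string[];
}

// What the frontend is allowed to see: there is no `url` field at all.
interface PublicWebhook {
  id: string;
  title: string;
}

function canUse(entry: WebhookEntry, userRoles: string[]): boolean {
  return entry.allowedRoles.some((role) => userRoles.indexOf(role) !== -1);
}

// Role check before showing: filter first, then copy only the public fields.
function listVisibleWebhooks(catalog: WebhookEntry[], userRoles: string[]): PublicWebhook[] {
  return catalog
    .filter((entry) => canUse(entry, userRoles))
    .map(({ id, title }) => ({ id, title }));
}

// Role check before executing: repeated on the backend, never trusted from the client.
function resolveForExecution(
  catalog: WebhookEntry[],
  webhookId: string,
  userRoles: string[],
): WebhookEntry | null {
  for (const entry of catalog) {
    if (entry.id === webhookId) {
      return canUse(entry, userRoles) ? entry : null;
    }
  }
  return null;
}
```

Building the public view by copying named fields, rather than deleting `url` from the entry, means a field added to the catalog later stays private until someone exposes it on purpose.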
--- a/docs/agents/web-agent.md
+++ b/docs/agents/web-agent.md
@@ -0,0 +1,15 @@
+# Web Agent
+
+Owns the React + Vite UI.
+
+## Focus
+
+- Three-column layout: sessions, chat, right panel.
+- Consume SSE from `/api/chat/stream`.
+- Show recommended actions without auto-executing them.
+- Rebuild state from the API; local memory is never the source of truth.
+
+## Do not
+
+- Do not call webhooks directly from the browser.
+- Do not store tokens or secrets in the frontend.
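Consuming SSE from `/api/chat/stream`, as `web-agent.md` asks, mostly comes down to splitting the text stream into frames. The sketch below is a simplified incremental parser for the standard `event:` and `data:` fields. The class name and the `token` event name used in the usage note are assumptions, and the actual payloads sent by the endpoint are not shown in this commit view.

```typescript
interface SseEvent {
  event: string; // "message" when the frame carries no `event:` field
  data: string;
}

// Incremental parser: feed it decoded text chunks, get back completed events.
// An unfinished trailing frame stays buffered until the next chunk arrives.
class SseParser {
  private buffer = "";

  push(chunk: string): SseEvent[] {
    // Normalize CRLF after concatenation so a split "\r" + "\n" is still handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const frames = this.buffer.split("\n\n");
    // The last piece is either empty or an incomplete frame.
    this.buffer = frames.pop() ?? "";

    const events: SseEvent[] = [];
    for (const frame of frames) {
      let event = "message";
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        const colon = line.indexOf(":");
        if (colon === 0) continue; // comment or keep-alive line
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.charAt(0) === " ") value = value.slice(1); // one optional space
        if (field === "event") event = value;
        else if (field === "data") dataLines.push(value);
      }
      // Frames without any data line are not dispatched.
      if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
    }
    return events;
  }
}
```

For example, pushing `"event: token\ndata: Hel"` returns no events, and a following `"lo\n\n"` returns one `token` event with data `Hello`. The parser holds only the in-flight frame, so the session list and history are still rebuilt from the API after a reload.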
--- a/docs/observabilidad-prompt.md
+++ b/docs/observabilidad-prompt.md
@@ -0,0 +1,2 @@
+=== PROMPT ===
+=== RESPONSE ===
--- a/docs/product-definition.md
+++ b/docs/product-definition.md
@@ -0,0 +1,11 @@
+# Short definition
+
+`SIC — Super Incident Commander` is a multi-session web interface for consulting a centralized `pi.dev` engine, with persistent history, simple search over internal documentation, and webhook recommendations that are only executed from the backend after explicit user confirmation.
+
+## Target user
+
+Small team, up to 5 concurrent users.
+
+## Successful MVP
+
+A user opens the UI, creates or resumes a session, asks a question, receives a streamed response, sees related documentation, gets recommended actions, and can execute a confirmed webhook. Everything is persisted and auditable.
--- a/docs/reliable-history.md
+++ b/docs/reliable-history.md
@@ -0,0 +1,57 @@
+# Reliable History
+
+## Goal
+
+Guarantee that the chat history is reconstructible, isolated by user, and consistent even if the backend restarts.
+
+## Mandatory rules
+
+1. Persist the user message before calling the LLM.
+2. Persist the assistant response when the stream finishes.
+3. If the LLM fails, record the failure in metadata or as a controlled error message.
+4. Do not keep critical conversational state in memory.
+5. All session and message queries must filter by `session_id` AND `user_id`.
+6. Webhooks must be audited even when they fail.
+
+## Base tables
+
+```sql
+CREATE TABLE IF NOT EXISTS chat_sessions (
+  id TEXT PRIMARY KEY,
+  user_id TEXT NOT NULL,
+  title TEXT,
+  created_at TEXT NOT NULL,
+  updated_at TEXT NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS chat_messages (
+  id TEXT PRIMARY KEY,
+  session_id TEXT NOT NULL,
+  user_id TEXT NOT NULL,
+  role TEXT NOT NULL,
+  content TEXT NOT NULL,
+  metadata TEXT,
+  created_at TEXT NOT NULL,
+  FOREIGN KEY (session_id) REFERENCES chat_sessions(id)
+);
+
+CREATE TABLE IF NOT EXISTS webhook_runs (
+  id TEXT PRIMARY KEY,
+  webhook_id TEXT NOT NULL,
+  user_id TEXT NOT NULL,
+  session_id TEXT NOT NULL,
+  status TEXT NOT NULL,
+  request_payload TEXT,
+  response_status INTEGER,
+  created_at TEXT NOT NULL
+);
+```
+
+## Security invariant
+
+```sql
+WHERE session_id = ?
+AND user_id = ?
+```
+
+A session or message query without this filter is a design error.
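Mandatory rules 1 to 3 and the security invariant in `reliable-history.md` can be sketched together as a single chat turn. This is an illustration under stated assumptions, not the handler in `apps/api`: `MessageStore`, `StreamModel`, `handleTurn`, and `historyQuery` are hypothetical names. The model call is written synchronously to keep the sketch short, and a real store would issue an `INSERT` into `chat_messages`.

```typescript
interface MessageRow {
  session_id: string;
  user_id: string;
  role: "user" | "assistant";
  content: string;
  metadata: string | null; // JSON text, matching the TEXT column
}

// Hypothetical persistence port; a real one would INSERT into chat_messages.
interface MessageStore {
  insert(row: MessageRow): void;
}

// Hypothetical model port: calls onChunk per streamed chunk, throws on failure.
type StreamModel = (prompt: string, onChunk: (chunk: string) => void) => void;

function handleTurn(
  store: MessageStore,
  model: StreamModel,
  sessionId: string,
  userId: string,
  prompt: string,
): void {
  const base = { session_id: sessionId, user_id: userId };

  // Rule 1: the user message is stored before the model is called.
  store.insert({ ...base, role: "user", content: prompt, metadata: null });

  let answer = "";
  try {
    model(prompt, (chunk) => {
      answer += chunk;
    });
  } catch (error) {
    // Rule 3: a controlled error message, with the cause kept in metadata.
    store.insert({
      ...base,
      role: "assistant",
      content: "The model request failed.",
      metadata: JSON.stringify({ error: String(error), partial: answer }),
    });
    return;
  }

  // Rule 2: the assistant message is stored once the stream has finished.
  store.insert({ ...base, role: "assistant", content: answer, metadata: null });
}

// Rule 5 and the security invariant: history reads always bind both values.
function historyQuery(sessionId: string, userId: string): { sql: string; params: string[] } {
  return {
    sql:
      "SELECT role, content, metadata, created_at FROM chat_messages " +
      "WHERE session_id = ? AND user_id = ? ORDER BY created_at",
    params: [sessionId, userId],
  };
}
```

Because the user row is written before the model is called, the question survives a crash mid-stream and the turn can be shown or retried after a restart. When the filter is applied to `chat_sessions`, the session key is the `id` column, so the equivalent clause is `WHERE id = ? AND user_id = ?`.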