Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)

- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
2026-06-29 16:20:53 +02:00
commit 62728b2200
89 changed files with 11992 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,273 @@
+# SIC — Super Incident Commander
+
+Lightweight web harness to use a centralized `pi.dev` engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.
+
+## MVP scope
+
+- Expected ceiling: 5 concurrent users.
+- Frontend: React + Vite.
+- Backend: Node.js + Fastify.
+- Initial persistence: SQLite.
+- LLM: OpenAI-compatible endpoint via `pi-adapter`.
+- Default LLM provider: MiniMax OpenAI-compatible.
+- Configuration: YAML + environment variables.
+- Initial deploy: Docker Compose.
+
+## Reliability principle
+
+Nothing critical lives only in memory. Sessions, messages, and webhook audit records are rebuilt from SQLite.
+
+Every conversation read/write must respect:
+
+```sql
+WHERE session_id = ?
+AND user_id = ?
+```
+
+## Structure
+
+```text
+apps/
+  api/                  # Fastify API, SSE, sessions, webhooks, docs
+  web/                  # React + Vite UI
+packages/
+  shared/               # Shared types
+  pi-adapter/           # pi.dev / OpenAI-compatible adapter
+config/                 # YAML for models, webhooks and docs
+knowledge/              # Internal Markdown documentation
+deploy/                 # Docker Compose and future manifests
+docs/                   # Definitions, reliable history and agents
+scripts/                # End-to-end smoke test + mock LLM
+```
+
+## API surface
+
+- `GET /healthz`
+- `GET /readyz`
+- `GET /api/version`
+- `GET /api/me`
+- `GET /api/sessions`
+- `POST /api/sessions`
+- `GET /api/sessions/:id`
+- `PATCH /api/sessions/:id`
+- `DELETE /api/sessions/:id`
+- `GET /api/docs/search?q=vpn`
+- `GET /api/docs/:id`
+- `GET /api/models`
+- `GET /api/webhooks`
+- `GET /api/webhook-runs?sessionId=...`
+- `POST /api/webhooks/:id/run`
+- `GET /api/skills`
+- `PATCH /api/sessions/:id/system-prompt` — set per-session context
+- `GET /metrics` — Prometheus text
+- `GET /api/metrics` — same as JSON
+- `POST /api/chat/stream`
+
+## Chat stream contract
+
+`POST /api/chat/stream` takes `sessionId`, `message` and optionally `model`.
+
+Reliability rules:
+
+1. Validate that the session belongs to the current user.
+2. Persist the `user` message before calling the LLM.
+3. If the session has no title yet, derive a short one from the first message.
+4. Validate the requested model against `config/models.yml`.
+5. Search relevant Markdown docs and role-allowed webhooks.
+6. Call the OpenAI-compatible endpoint via `pi-adapter`. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first `ok=true` response wins.
+7. Emit SSE events: `docs`, `token`, `actions`, `done`.
+8. Persist the `assistant` response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
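
A client consuming the stream in step 7 has to split the SSE text into frames and dispatch on the event name. The sketch below shows one way to do that. Only the four event names (`docs`, `token`, `actions`, `done`) come from the contract above; the `SseEvent` type, the helper names, and the `{"text": ...}` payload of `token` events are assumptions made for illustration.

```typescript
// Hypothetical event shape; only the event names come from the contract above.
type SseEvent = { event: string; data: string };

// Parse a complete SSE text buffer into events. Frames are separated by a
// blank line; each frame has an optional `event:` field and `data:` lines.
function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no `event:` field is present
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    // Frames without data (including the empty tail after the last blank line) are skipped.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Concatenate the text carried by `token` events (payload shape is assumed).
function collectAnswer(events: SseEvent[]): string {
  return events
    .filter((e) => e.event === "token")
    .map((e) => (JSON.parse(e.data) as { text: string }).text)
    .join("");
}

const sample =
  "event: docs\ndata: []\n\n" +
  'event: token\ndata: {"text":"Hel"}\n\n' +
  'event: token\ndata: {"text":"lo"}\n\n' +
  "event: actions\ndata: []\n\n" +
  "event: done\ndata: {}\n\n";
const parsed = parseSse(sample);
```

A real client would feed network chunks into a buffer and only parse complete frames; the sketch parses a finished buffer to keep the framing logic visible.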
+
+### Provider fallback
+
+Each model in `config/models.yml` can declare `fallback: [other-id, ...]`. The chat route walks the chain when a model returns `ok=false` (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes `requested_model`, `fallback_attempts`, `fallback_chain`, and `fallback_failures` whenever the chain was actually used, so you can see what happened in the chat history.
+
+Override the chain globally with `LLM_FALLBACK_CHAIN`, a comma-separated list of model ids that are tried in order after the requested model. Leave it empty to use each model's YAML chain.
+
+Default chain today (from `config/models.yml`):
+
+- `fast` → no fallback (it IS the cheap path)
+- `balanced` → `mr-auto`
+- `reasoning` → no fallback
+- `mr-auto` → no fallback
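
The chain walk described in this section can be sketched as follows. This is an illustration under stated assumptions, not the route's actual code: `ModelConfig`, `Attempt`, `resolveChain`, and `walkChain` are hypothetical names, the model call is synchronous to keep the example short (a real call would be asynchronous), and dropping the requested model from the override list is an assumed detail.

```typescript
// Hypothetical types; the field name `fallback` mirrors config/models.yml.
type ModelConfig = { id: string; fallback?: string[] };
type Attempt = { ok: true; text: string } | { ok: false; reason: string };
type CallModel = (modelId: string) => Attempt; // may also throw on transport errors

// Build the ordered list of models to try. A non-empty override replaces the YAML chain.
function resolveChain(requested: string, models: ModelConfig[], override: string): string[] {
  const fromEnv = override.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  const fromYaml = models.find((m) => m.id === requested)?.fallback ?? [];
  const tail = fromEnv.length > 0 ? fromEnv : fromYaml;
  return [requested, ...tail.filter((id) => id !== requested)];
}

// Try each model in order; the first ok=true response wins, failures are recorded.
function walkChain(chain: string[], call: CallModel) {
  const failures: { model: string; reason: string }[] = [];
  for (const model of chain) {
    try {
      const result = call(model);
      if (result.ok) return { ok: true as const, model, text: result.text, failures };
      failures.push({ model, reason: result.reason }); // structured error
    } catch (err) {
      // transport error (5xx / 429 / network / timeout)
      failures.push({ model, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { ok: false as const, failures };
}
```

With the default chain above, `resolveChain("balanced", models, "")` yields `balanced` followed by `mr-auto`, and the recorded `failures` array plays the role of the failure trail persisted with the assistant message.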
+
+## MiniMax
+
+The project is wired to MiniMax via the official OpenAI-compatible endpoint:
+
+- Base URL: `https://api.minimax.io/v1`
+- Chat path used by the adapter: `/chat/completions`
+- Auth: `Authorization: Bearer <key>`
+
+Models configured in `config/models.yml`:
+
+- `fast` → `MiniMax-M2.7-highspeed`
+- `balanced` → `MiniMax-M2.7`
+- `reasoning` → `MiniMax-M3`
+
+To run locally, set the key:
+
+```bash
+export MINIMAX_API_KEY="your-key"
+export LLM_BASE_URL="https://api.minimax.io/v1"
+export LLM_API_KEY="$MINIMAX_API_KEY"
+export DEFAULT_MODEL="fast"
+```
+
+In Docker Compose you only need to export `MINIMAX_API_KEY`; the Compose file maps it to `LLM_API_KEY`.
+
+## UI MVP
+
+The React app already consumes the API through the Vite proxy:
+
+- Loads or creates a local session.
+- Loads `GET /api/models` and lets the user pick the model per message.
+- Lists persisted sessions and lets the user switch between them.
+- Lets the user rename and delete sessions, always through the API with per-user isolation.
+- Sends messages to `POST /api/chat/stream` and consumes SSE events.
+- Shows recommended documentation and lets the user open the full document via `GET /api/docs/:id`.
+- Shows suggested actions in the right panel.
+- Loads `GET /api/webhooks` to show public labels/descriptions for actions.
+- Executes webhooks only after user confirmation and always through the backend.
+- Shows execution audit per session from `GET /api/webhook-runs`, without exposing URLs or payload templates.
+- Can attach a development Bearer token to test `AUTH_MODE=keycloak`; reads from `localStorage` or `VITE_AUTH_TOKEN`.
+
+## Skills
+
+Skills are persona/behavior prompt fragments loaded from `config/skills.yml` and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.
+
+Each skill has: `id`, `name`, `description`, `enabled`, `prompt`. Skills with `enabled: true` are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with `enabled: false` are kept in the file but inactive. The frontend can list them via `GET /api/skills` (no prompt text is exposed publicly — only id, name, description, enabled).
+
+Edit `config/skills.yml` and restart the API to change the active skill set. The default file ships with `sre-on-call` and `blameless-postmortem` enabled; `security-incident` is shipped disabled as a reference.
+
+The env var `SKILLS_CONFIG_PATH` overrides the default config path (`../../config/skills.yml` relative to `cwd`).
+
+## Per-session context
+
+Every session has an optional `system_prompt` field. When set, it is sent with every chat turn as a system message, placed after the base identity prompt and skill prompts and before the docs/actions context. Use it to pin the incident id, on-call name, or runbook references that shouldn't drift across the conversation.
+
+- **Frontend**: each session row has a small circle button (`○` empty, `●` set). Click it to open a modal editor with Save and Clear.
+- **API**: `PATCH /api/sessions/:id/system-prompt` with `{ "system_prompt": "..." }`. Send `null` or empty string to clear.
+- **Limit**: 8000 characters.
+- **Persistence**: stored in `chat_sessions.system_prompt`; same `WHERE id = ? AND user_id = ?` ownership rule as every other session operation.
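
The ordering and limit rules above can be illustrated with a small sketch. The function names are hypothetical, and rejecting an over-long prompt (rather than truncating it) is an assumption; the README only states the 8000-character limit and that `null` or an empty string clears the field.

```typescript
const SYSTEM_PROMPT_MAX_CHARS = 8000; // limit stated above

// null or empty string clears the prompt; anything over the limit is rejected (assumed behavior).
function normalizeSystemPrompt(input: string | null): string | null {
  if (input === null || input === "") return null;
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    throw new RangeError("system_prompt exceeds 8000 characters");
  }
  return input;
}

// Assemble system content in the documented order:
// base identity prompt, enabled skill prompts, per-session prompt, docs/actions context.
function buildSystemMessages(
  base: string,
  skillPrompts: string[],
  sessionPrompt: string | null,
  context: string,
): string[] {
  const parts = [base, ...skillPrompts];
  if (sessionPrompt !== null) parts.push(sessionPrompt);
  parts.push(context);
  return parts;
}
```

For example, a session pinned to `"Incident INC-1"` with one enabled skill produces four parts, with the session prompt in third position.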
+
+## Observability
+
+Two endpoints surface API metrics:
+
+- `GET /metrics`: Prometheus text exposition (counter / summary), scraper-friendly. `/metrics` is the default path Prometheus scrapes.
+- `GET /api/metrics`: same data as JSON for humans and the smoke test. Shape:
+
+  ```json
+  {
+    "started_at": "2026-06-29T12:00:00.000Z",
+    "uptime_seconds": 1234,
+    "totals": { "requests": 5678, "errors_5xx": 0 },
+    "routes": [
+      {
+        "route": "/api/chat/stream",
+        "method": "POST",
+        "count": 42,
+        "avg_ms": 1230,
+        "p95_ms": 4500,
+        "max_ms": 8000,
+        "status_buckets": { "200_299": 42 }
+      }
+    ],
+    "recent": [
+      {
+        "route": "/api/sessions/:id",
+        "method": "DELETE",
+        "status": 204,
+        "durationMs": 4,
+        "timestamp": 1782727300000
+      }
+    ]
+  }
+  ```
+
+Routes are aggregated by route **template** (e.g. `/api/sessions/:id`), not by raw URL, so `/api/sessions/abc` and `/api/sessions/def` share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. Metrics are held in memory only, so counters reset on restart; that's the expected behavior for a 5-user MVP.
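
One possible reading of the "fixed-size streaming reservoir" is classic reservoir sampling (Algorithm R) with a nearest-rank percentile. The sketch below assumes that reading; the class name, the injectable random source, and the nearest-rank definition of p95 are illustrative choices, and the project may use a different scheme such as a ring buffer.

```typescript
// Bounded latency sample: keeps at most `capacity` values no matter how many are added.
class LatencyReservoir {
  private samples: number[] = [];
  private seen = 0;
  private readonly capacity: number;
  private readonly random: () => number;

  constructor(capacity = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  add(ms: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(ms);
      return;
    }
    // Algorithm R: the new value replaces a random slot with probability capacity / seen.
    const slot = Math.floor(this.random() * this.seen);
    if (slot < this.capacity) this.samples[slot] = ms;
  }

  // Nearest-rank 95th percentile over the retained samples; null when empty.
  p95(): number | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((95 * sorted.length) / 100);
    return sorted[rank - 1] ?? null;
  }

  size(): number {
    return this.samples.length;
  }
}
```

Until the reservoir fills, the percentile is exact; afterwards it is an estimate over a uniform sample of everything seen, which is the trade-off that keeps memory bounded.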
+
+## Auth
+
+The backend supports two modes:
+
+- `AUTH_MODE=local`: dev mode, uses `local-user` with roles `admin` and `webhook-runner`.
+- `AUTH_MODE=keycloak`: validates `Authorization: Bearer <token>` against the remote JWKS published by `OIDC_ISSUER` and checks the token audience against `OIDC_AUDIENCE`.
+
+For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in `localStorage` and sent as `Authorization: Bearer <token>` on API and stream calls. Alternatively, Vite can receive `VITE_AUTH_TOKEN` to preconfigure it for the local environment.
+
+Claims used from Keycloak:
+
+- `sub` as `user.id`.
+- `preferred_username` and `email` for display.
+- Roles from `realm_access.roles` and `resource_access[OIDC_AUDIENCE].roles`.
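
The claim mapping listed above can be sketched as a pure function over an already-verified token payload. Signature and audience verification are out of scope here. The `KeycloakClaims` type and `userFromClaims` name are hypothetical, and both the display-name fallback order and the de-duplication of roles are assumptions.

```typescript
// Subset of the JWT payload used by the mapping; other claims are ignored.
type KeycloakClaims = {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] }>;
};

// Map verified claims to a user: `sub` becomes the id, roles are merged from
// realm_access.roles and resource_access[audience].roles.
function userFromClaims(claims: KeycloakClaims, audience: string) {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  return {
    id: claims.sub,
    // Assumed fallback order for display purposes.
    displayName: claims.preferred_username ?? claims.email ?? claims.sub,
    // Assumed: duplicates across the two sources are collapsed.
    roles: Array.from(new Set([...realmRoles, ...clientRoles])),
  };
}
```

A token with realm role `admin` and client roles `webhook-runner` and `admin` for the configured audience would map to the two distinct roles `admin` and `webhook-runner`.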
+
+## Basic hardening
+
+- `API_BODY_LIMIT_BYTES`: global Fastify body limit. Default: `1048576`.
+- `CHAT_MESSAGE_MAX_CHARS`: maximum length of a chat message, also applied to `lastUserMessage` on webhook runs. Default: `8000`.
+- `CORS_ALLOWED_ORIGINS`: comma-separated list of allowed origins. If unset, all origins are allowed (intended for development).
+- `LLM_TIMEOUT_MS`: OpenAI-compatible call timeout. Default: `30000`.
+- `WEBHOOK_TIMEOUT_MS`: backend-only webhook execution timeout. Default: `15000`.
+- `WEBHOOK_RETRY_MAX_ATTEMPTS`: retries per webhook on transient errors (5xx, 429, timeout, network). Default: `3`.
+- `WEBHOOK_RETRY_INITIAL_BACKOFF_MS`: initial backoff with exponential growth. Default: `500`.
+- `WEBHOOK_RETRY_MAX_BACKOFF_MS`: backoff cap. Default: `5000`.
+- `WEBHOOK_RUNS_RETENTION_DAYS`: age cutoff for `webhook_runs` rows. Runs older than this are purged on boot and on a timer. Default: `30`. Set to `0` to disable the age pass.
+- `WEBHOOK_RUNS_MAX_PER_USER`: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: `1000`. Set to `0` to disable the cap pass.
+- `WEBHOOK_AUDIT_PURGE_INTERVAL_MS`: how often the janitor runs while the API is up. Default: `3600000` (1 hour). Minimum: `60000` (1 minute).
+- `CHAT_RATE_LIMIT_PER_MINUTE`: per-user rate limit on `POST /api/chat/stream` (token-bucket refill rate). Default: `20`.
+- `CHAT_RATE_LIMIT_BURST`: per-user burst size. Default: `5`. Rejected calls return `429` with `retry-after` in seconds and `x-ratelimit-remaining: 0`.
+- The API adds basic defensive headers: `x-content-type-options`, `referrer-policy`, `x-frame-options`.
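
Two of the settings above describe small algorithms: capped exponential backoff for webhook retries and a token bucket for the chat rate limit. The sketch below illustrates both with the documented defaults. The names are hypothetical, doubling is an assumed growth factor, and the clock is passed in explicitly so the behavior is easy to follow; none of this is taken from the project's code.

```typescript
// Delay before each retry: initial * 2^n, capped at maxMs (doubling is assumed).
function backoffDelays(retries: number, initialMs = 500, maxMs = 5000): number[] {
  return Array.from({ length: retries }, (_, n) => Math.min(maxMs, initialMs * Math.pow(2, n)));
}

// Per-user token bucket: refills at `perMinute` tokens per minute, holds at most `burst`.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly burst: number;
  private readonly msPerToken: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.burst = burst;
    this.msPerToken = 60000 / perMinute;
    this.tokens = burst; // start full
    this.last = nowMs;
  }

  // Returns whether the call is allowed and, if not, the seconds to put in `retry-after`.
  take(nowMs: number): { allowed: boolean; retryAfterSeconds: number } {
    const refill = (nowMs - this.last) / this.msPerToken;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.last = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    const waitMs = (1 - this.tokens) * this.msPerToken;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}
```

With the defaults (`20` per minute, burst `5`), a user can send five messages back to back; the sixth is rejected with a 3-second wait, since one token refills every 3000 ms.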
+
+## End-to-end smoke test
+
+A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).
+
+### With a real LLM (MiniMax)
+
+```bash
+# Terminal 1: start the API and the web
+export LLM_BASE_URL=https://api.minimax.io/v1
+export LLM_API_KEY="$MINIMAX_API_KEY"
+export DEFAULT_MODEL=fast
+pnpm dev
+
+# Terminal 2: smoke test against http://localhost:3000
+pnpm smoke
+```
+
+### With the mock LLM (no key)
+
+```bash
+# Terminal 1: start the API and the web pointing at the mock
+pnpm mock:llm &
+export LLM_BASE_URL=http://127.0.0.1:4010/v1
+export LLM_API_KEY=dummy
+export DEFAULT_MODEL=fast
+pnpm dev
+
+# Terminal 2
+pnpm smoke
+
+# or in a single step, the script starts the mock internally:
+pnpm smoke:mock
+```
+
+Steps covered (in order):
+
+1. `/healthz`, `/readyz`
+2. `/api/me` (local auth)
+3. `/api/models`
+4. `/api/docs/search` + `/api/docs/:id`
+5. `/api/webhooks`
+6. `POST /api/sessions` + `GET /api/sessions`
+7. `POST /api/chat/stream` and SSE event parsing (`docs`, `token`, `actions`, `done`)
+8. `GET /api/sessions/:id` to confirm the assistant message was persisted
+9. `GET /api/webhook-runs?sessionId=...` to confirm audit listing
+10. `DELETE /api/sessions/:id` (cleanup)
+
+Optional flags:
+
+- `pnpm smoke --api-base http://localhost:4000` to point at a different API
+- `pnpm smoke:mock` (alias of `pnpm smoke --mock-llm`) starts the mock inside the script