# SIC — Super Incident Commander

Lightweight web harness for using a centralized `pi.dev` engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks that are executed only by the backend after explicit user confirmation.

## MVP scope

- Expected ceiling: 5 concurrent users.
- Frontend: React + Vite.
- Backend: Node.js + Fastify.
- Initial persistence: SQLite.
- LLM: OpenAI-compatible endpoint via `pi-adapter`.
- Default LLM provider: MiniMax (OpenAI-compatible).
- Configuration: YAML + environment variables.
- Initial deploy: Docker Compose.

## Reliability principle

Nothing critical lives only in memory. Sessions, messages, and the webhook audit trail are rebuilt from SQLite.

Every conversation read/write must be scoped by both session and user:

```sql
WHERE session_id = ? AND user_id = ?
```

## Structure

```text
apps/
  api/          # Fastify API, SSE, sessions, webhooks, docs
  web/          # React + Vite UI
packages/
  shared/       # Shared types
  pi-adapter/   # pi.dev / OpenAI-compatible adapter
config/         # YAML for models, webhooks, docs and skills
knowledge/      # Internal Markdown documentation
deploy/         # Docker Compose and future manifests
docs/           # Definitions, reliable history and agents
scripts/        # End-to-end smoke test + mock LLM
```

## API surface

- `GET /healthz`
- `GET /readyz`
- `GET /api/version`
- `GET /api/me`
- `GET /api/sessions`
- `POST /api/sessions`
- `GET /api/sessions/:id`
- `PATCH /api/sessions/:id`
- `DELETE /api/sessions/:id`
- `GET /api/docs/search?q=vpn`
- `GET /api/docs/:id`
- `GET /api/models`
- `GET /api/webhooks`
- `GET /api/webhook-runs?sessionId=...`
- `POST /api/webhooks/:id/run`
- `GET /api/skills`
- `PATCH /api/sessions/:id/system-prompt`: set per-session context
- `GET /metrics`: Prometheus text
- `GET /api/metrics`: same data as JSON
- `POST /api/chat/stream`

## Chat stream contract

`POST /api/chat/stream` takes `sessionId`, `message`, and an optional `model`.

Reliability rules:

1. Validate that the session belongs to the current user.
2. Persist the `user` message before calling the LLM.
3. If the session has no title yet, derive a short one from the first message.
4. Validate the requested model against `config/models.yml`.
5. Search relevant Markdown docs and role-allowed webhooks.
6. Call the OpenAI-compatible endpoint via `pi-adapter`. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first `ok=true` response wins.
7. Emit SSE events: `docs`, `token`, `actions`, `done`.
8. Persist the `assistant` response. If every model in the chain fails, persist a controlled message with error metadata and the full failure trail instead.

### Provider fallback

Each model in `config/models.yml` can declare `fallback: [other-id, ...]`. The chat route moves to the next model in the chain when the current one returns `ok=false` (`no_content`, `json_parse`, `schema`) or throws (5xx, 429, network error, timeout).

Whenever the chain was actually used, the persisted assistant metadata includes `requested_model`, `fallback_attempts`, `fallback_chain`, and `fallback_failures`, so the chat history shows what happened.

`LLM_FALLBACK_CHAIN` overrides the chain globally: a comma-separated list of model ids, tried in order right after the requested model. Leave it empty to use each model's YAML chain.
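The fallback rules above can be sketched in TypeScript. This is a minimal sketch, not the actual chat route: the `ModelConfig` and `AdapterResult` shapes and the `resolveChain`/`walkChain` helpers are hypothetical, and the adapter call is injected as a plain async function. It assumes that a non-empty `LLM_FALLBACK_CHAIN` replaces the YAML chain and that unknown or duplicate ids are skipped.

```typescript
// Hypothetical shapes; the real pi-adapter types may differ.
type ModelConfig = { id: string; fallback?: string[] };

type AdapterResult =
  | { ok: true; text: string }
  | { ok: false; reason: "no_content" | "json_parse" | "schema" };

type Failure = { model: string; error: string };

// Build the ordered list of model ids to try for one chat turn.
// Assumption: a non-empty override replaces the YAML chain; ids that are
// unknown or already in the chain (including the requested one) are skipped.
function resolveChain(
  requested: string,
  models: Map<string, ModelConfig>,
  override: string | undefined,
): string[] {
  const fromEnv = (override ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const tail = fromEnv.length > 0 ? fromEnv : models.get(requested)?.fallback ?? [];
  const chain = [requested];
  for (const id of tail) {
    if (!chain.includes(id) && models.has(id)) chain.push(id);
  }
  return chain;
}

// Try each model in order; the first ok=true response wins.
// Structured failures (ok=false) and thrown errors (5xx/429/network/timeout)
// both advance to the next model and are recorded in the failure trail.
async function walkChain(
  chain: string[],
  call: (modelId: string) => Promise<AdapterResult>,
): Promise<{ model: string | null; text: string | null; failures: Failure[] }> {
  const failures: Failure[] = [];
  for (const model of chain) {
    try {
      const res = await call(model);
      if (res.ok) return { model, text: res.text, failures };
      failures.push({ model, error: res.reason });
    } catch (err) {
      failures.push({ model, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { model: null, text: null, failures };
}
```

With the default YAML chain, `resolveChain("balanced", models, undefined)` yields `["balanced", "mr-auto"]`, while setting `LLM_FALLBACK_CHAIN=mr-auto` would make `fast` fall back to `mr-auto` as well. The returned `failures` array is the kind of trail that would end up in `fallback_failures`.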
Default chain today (from `config/models.yml`):

- `fast` → no fallback (it IS the cheap path)
- `balanced` → `mr-auto`
- `reasoning` → no fallback
- `mr-auto` → no fallback

## MiniMax

The project is wired to MiniMax via its official OpenAI-compatible endpoint:

- Base URL: `https://api.minimax.io/v1`
- Chat path used by the adapter: `/chat/completions`
- Auth: `Authorization: Bearer <token>`

Models configured in `config/models.yml`:

- `fast` → `MiniMax-M2.7-highspeed`
- `balanced` → `MiniMax-M2.7`
- `reasoning` → `MiniMax-M3`

To run locally, set the key:

```bash
export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"
```

In Docker Compose you only need to export `MINIMAX_API_KEY`; the compose file maps it to `LLM_API_KEY`.

## UI MVP

The React app already consumes the API through the Vite proxy. It:

- Loads or creates a local session.
- Loads `GET /api/models` and lets the user pick the model per message.
- Lists persisted sessions and lets the user switch between them.
- Lets the user rename and delete sessions, always through the API with per-user isolation.
- Sends messages to `POST /api/chat/stream` and consumes the SSE events.
- Shows recommended documentation and lets the user open the full document via `GET /api/docs/:id`.
- Shows suggested actions in the right panel.
- Loads `GET /api/webhooks` to show public labels and descriptions for actions.
- Executes webhooks only after user confirmation, and always through the backend.
- Shows the per-session execution audit from `GET /api/webhook-runs`, without exposing URLs or payload templates.
- Can attach a development Bearer token to test `AUTH_MODE=keycloak`, read from `localStorage` or `VITE_AUTH_TOKEN`.

## Skills

Skills are persona/behavior prompt fragments loaded from `config/skills.yml` and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions, and the backend still owns execution.
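A minimal sketch of how the system messages for one chat turn could be assembled, assuming the order described in this README (base identity prompt, enabled skills, the per-session `system_prompt`, then the docs/actions context) and one system message per fragment. The `Skill` fields mirror those listed for `config/skills.yml`; `buildSystemMessages` and `toPublicSkill` are hypothetical names, not the API's actual functions.

```typescript
// Hypothetical types mirroring the fields listed for config/skills.yml.
type Skill = { id: string; name: string; description: string; enabled: boolean; prompt: string };

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assemble system messages in the documented order:
// base identity, enabled skills, per-session system_prompt, docs/actions context.
// Assumption: each fragment becomes its own system message.
function buildSystemMessages(
  baseIdentity: string,
  skills: Skill[],
  sessionPrompt: string | null,
  docsContext: string,
): ChatMessage[] {
  const parts: string[] = [baseIdentity];
  for (const skill of skills) {
    if (skill.enabled) parts.push(skill.prompt); // disabled skills stay in the file, unused
  }
  if (sessionPrompt !== null && sessionPrompt.trim() !== "") parts.push(sessionPrompt);
  if (docsContext !== "") parts.push(docsContext);
  return parts.map((content): ChatMessage => ({ role: "system", content }));
}

// Public listing for GET /api/skills: the prompt text is deliberately omitted.
function toPublicSkill({ id, name, description, enabled }: Skill) {
  return { id, name, description, enabled };
}
```

Keeping `prompt` out of `toPublicSkill` by construction (destructuring only the public fields) mirrors the rule that `GET /api/skills` never returns prompt text.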
Each skill has `id`, `name`, `description`, `enabled`, and `prompt`. Skills with `enabled: true` are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with `enabled: false` stay in the file but are inactive. The frontend can list them via `GET /api/skills`; prompt text is never exposed, only `id`, `name`, `description`, and `enabled`.

Edit `config/skills.yml` and restart the API to change the active skill set. The default file ships with `sre-on-call` and `blameless-postmortem` enabled; `security-incident` ships disabled as a reference.

The env var `SKILLS_CONFIG_PATH` overrides the default config path (`../../config/skills.yml`, relative to `cwd`).

## Per-session context

Every session has an optional `system_prompt` field. When set, it is injected as a system message on every chat turn, after the base identity prompt and skill prompts and before the docs/actions context. Use it to pin the incident id, the on-call name, or runbook references that shouldn't drift across the conversation.

- **Frontend**: each session row has a small circle button (`○` empty, `●` set). Click it to open a modal editor with Save and Clear.
- **API**: `PATCH /api/sessions/:id/system-prompt` with `{ "system_prompt": "..." }`. Send `null` or an empty string to clear it.
- **Limit**: 8000 characters.
- **Persistence**: stored in `chat_sessions.system_prompt`, with the same `WHERE id = ? AND user_id = ?` ownership rule as every other session operation.

## Observability

Two endpoints expose API metrics:

- `GET /metrics`: Prometheus text exposition format (counter / summary), meant for scrapers. It is served on the API's own port at Prometheus's default metrics path, so a scrape job only needs the API host and port.
- `GET /api/metrics`: the same data as JSON, for humans and the smoke test.
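A sketch of the per-route bookkeeping that could sit behind both endpoints, assuming an in-memory registry keyed by HTTP method and route template. The `MetricsRegistry` class and the `sic_http_*` metric names are hypothetical. The reservoir uses classic Algorithm R with 200 slots, p95 is the nearest-rank value over the retained samples, and the Prometheus output follows the text exposition format: a `# TYPE` line per metric family, a `_total` counter, and a summary with a `quantile` label plus `_sum` and `_count` series.

```typescript
// Hypothetical in-memory registry; metric names and label layout are assumptions.
const RESERVOIR_SIZE = 200;

type RouteStats = {
  method: string;
  route: string; // route template such as "/api/sessions/:id", never the raw URL
  count: number;
  sumMs: number;
  maxMs: number;
  samples: number[]; // bounded reservoir used for p95
};

// Prometheus label values must escape backslash, double quote and newline.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class MetricsRegistry {
  private readonly routes = new Map<string, RouteStats>();

  record(method: string, route: string, durationMs: number): void {
    const key = `${method} ${route}`;
    let s = this.routes.get(key);
    if (s === undefined) {
      s = { method, route, count: 0, sumMs: 0, maxMs: 0, samples: [] };
      this.routes.set(key, s);
    }
    s.count += 1;
    s.sumMs += durationMs;
    s.maxMs = Math.max(s.maxMs, durationMs);
    // Algorithm R: the first RESERVOIR_SIZE observations fill the reservoir;
    // afterwards observation n replaces a random slot with probability size/n.
    if (s.samples.length < RESERVOIR_SIZE) {
      s.samples.push(durationMs);
    } else {
      const j = Math.floor(Math.random() * s.count);
      if (j < RESERVOIR_SIZE) s.samples[j] = durationMs;
    }
  }

  // Nearest-rank p95 over the retained samples (exact while count <= RESERVOIR_SIZE).
  p95(method: string, route: string): number {
    const s = this.routes.get(`${method} ${route}`);
    if (s === undefined || s.samples.length === 0) return 0;
    const sorted = [...s.samples].sort((a, b) => a - b);
    return sorted[Math.ceil(0.95 * sorted.length) - 1];
  }

  // Per-route entries in the style of the JSON "routes" array.
  toJsonRoutes() {
    return Array.from(this.routes.values()).map((s) => ({
      route: s.route,
      method: s.method,
      count: s.count,
      avg_ms: Math.round(s.sumMs / s.count),
      p95_ms: this.p95(s.method, s.route),
      max_ms: s.maxMs,
    }));
  }

  // Prometheus text exposition: one counter family and one summary family.
  toPrometheus(): string {
    const counters = ["# TYPE sic_http_requests_total counter"];
    const summaries = ["# TYPE sic_http_request_duration_ms summary"];
    for (const s of Array.from(this.routes.values())) {
      const labels = `method="${escapeLabel(s.method)}",route="${escapeLabel(s.route)}"`;
      counters.push(`sic_http_requests_total{${labels}} ${s.count}`);
      summaries.push(
        `sic_http_request_duration_ms{${labels},quantile="0.95"} ${this.p95(s.method, s.route)}`,
        `sic_http_request_duration_ms_sum{${labels}} ${s.sumMs}`,
        `sic_http_request_duration_ms_count{${labels}} ${s.count}`,
      );
    }
    return [...counters, ...summaries].join("\n") + "\n";
  }
}
```

A request hook would call `record()` once per response with the route template (not the raw URL) and the measured duration; `toPrometheus()` would back `GET /metrics`, and `toJsonRoutes()` would feed the `routes` array of the JSON endpoint described next.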
Shape of the `GET /api/metrics` response:

```json
{
  "started_at": "2026-06-29T12:00:00.000Z",
  "uptime_seconds": 1234,
  "totals": { "requests": 5678, "errors_5xx": 0 },
  "routes": [
    {
      "route": "/api/chat/stream",
      "method": "POST",
      "count": 42,
      "avg_ms": 1230,
      "p95_ms": 4500,
      "max_ms": 8000,
      "status_buckets": { "200_299": 42 }
    }
  ],
  "recent": [
    {
      "route": "/api/sessions/:id",
      "method": "DELETE",
      "status": 204,
      "durationMs": 4,
      "timestamp": 1782727300000
    }
  ]
}
```

Routes are aggregated by route **template** (e.g. `/api/sessions/:id`), not by raw URL, so `/api/sessions/abc` and `/api/sessions/def` share a bucket. p95 is computed from a fixed-size streaming reservoir (200 samples), so memory stays bounded under load. Metrics are in-memory only: counters reset on restart, which is the expected behavior for a 5-user MVP.

## Auth

The backend supports two modes:

- `AUTH_MODE=local`: dev mode; uses `local-user` with roles `admin` and `webhook-runner`.
- `AUTH_MODE=keycloak`: validates `Authorization: Bearer <token>` against the remote JWKS from `OIDC_ISSUER`, with `OIDC_AUDIENCE` as the expected audience.

For manual Keycloak testing, the UI lets you paste a JWT into the "Dev token" box. The token is stored in `localStorage` and sent as `Authorization: Bearer <token>` on API and stream calls. Alternatively, Vite can receive `VITE_AUTH_TOKEN` to preconfigure it for the local environment.

Claims used from Keycloak:

- `sub` as `user.id`.
- `preferred_username` and `email` for display.
- Roles from `realm_access.roles` and `resource_access[OIDC_AUDIENCE].roles`.

## Basic hardening

- `API_BODY_LIMIT_BYTES`: global Fastify body limit. Default: `1048576` (1 MiB).
- `CHAT_MESSAGE_MAX_CHARS`: maximum length of a chat message and of `lastUserMessage` in webhook runs. Default: `8000`.
- `CORS_ALLOWED_ORIGINS`: comma-separated list of allowed origins. If unset, all origins are allowed (intended for dev).
- `LLM_TIMEOUT_MS`: timeout for OpenAI-compatible calls. Default: `30000`.
- `WEBHOOK_TIMEOUT_MS`: timeout for backend-only webhook execution. Default: `15000`.
- `WEBHOOK_RETRY_MAX_ATTEMPTS`: retries per webhook on transient errors (5xx, 429, timeout, network). Default: `3`.
- `WEBHOOK_RETRY_INITIAL_BACKOFF_MS`: initial retry backoff, which then grows exponentially. Default: `500`.
- `WEBHOOK_RETRY_MAX_BACKOFF_MS`: backoff cap. Default: `5000`.
- `WEBHOOK_RUNS_RETENTION_DAYS`: age cutoff for `webhook_runs` rows. Runs older than this are purged on boot and on a timer. Default: `30`. Set to `0` to disable the age pass.
- `WEBHOOK_RUNS_MAX_PER_USER`: keep at most this many of the most recent runs per user; older runs beyond the cap are purged. Default: `1000`. Set to `0` to disable the cap pass.
- `WEBHOOK_AUDIT_PURGE_INTERVAL_MS`: how often the janitor (the purge job) runs while the API is up. Default: `3600000` (1 hour). Minimum: `60000` (1 minute).
- `CHAT_RATE_LIMIT_PER_MINUTE`: per-user rate limit on `POST /api/chat/stream` (token-bucket refill rate, in tokens per minute). Default: `20`.
- `CHAT_RATE_LIMIT_BURST`: per-user burst size (bucket capacity). Default: `5`. Rejected calls return `429` with `retry-after` (in seconds) and `x-ratelimit-remaining: 0`.
- The API adds basic defensive headers: `x-content-type-options`, `referrer-policy`, `x-frame-options`.

## End-to-end smoke test

A smoke script exercises the full API: health, auth, models, docs, webhooks, sessions, the SSE stream, message persistence, and audit.

### With a real LLM (MiniMax)

```bash
# Terminal 1: start the API and the web app
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2: run the smoke test against http://localhost:3000
pnpm smoke
```

### With the mock LLM (no key)

```bash
# Terminal 1: start the mock, then the API and the web app pointing at it
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2
pnpm smoke

# Or in a single step (the script starts the mock internally):
pnpm smoke:mock
```

Steps covered (in order):

1. `/healthz`, `/readyz`
2. `/api/me` (local auth)
3. `/api/models`
4. `/api/docs/search` + `/api/docs/:id`
5. `/api/webhooks`
6. `POST /api/sessions` + `GET /api/sessions`
7. `POST /api/chat/stream` and SSE event parsing (`docs`, `token`, `actions`, `done`)
8. `GET /api/sessions/:id` to confirm the assistant message was persisted
9. `GET /api/webhook-runs?sessionId=...` to confirm audit listing
10. `DELETE /api/sessions/:id` (cleanup)

Optional flags:

- `pnpm smoke --api-base http://localhost:4000` to point at a different API
- `pnpm smoke:mock` (alias of `pnpm smoke --mock-llm`) starts the mock inside the script
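Step 7 has to turn the raw `text/event-stream` body into named events. A minimal sketch of that parsing, assuming the body is fully buffered and every frame ends with a blank line; `parseSse` and `checkChatStream` are hypothetical helpers, not the script's actual code, and the check assumes a successful turn in which `done` arrives last.

```typescript
// Hypothetical helpers; the real smoke script may parse the stream differently.
type SseEvent = { event: string; data: string };

// Parse a fully buffered text/event-stream body. Frames are separated by a blank
// line; "event:" sets the type (default "message"), multiple "data:" lines are
// joined with "\n", lines starting with ":" are comments (e.g. keep-alives), and
// frames without any data line are not dispatched.
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of body.replace(/\r\n?/g, "\n").split("\n\n")) {
    let event = "message";
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Check a successful turn: docs, at least one token, actions, and done as the last event.
function checkChatStream(events: SseEvent[]): string[] {
  const names = events.map((e) => e.event);
  const problems: string[] = [];
  for (const required of ["docs", "token", "actions", "done"]) {
    if (!names.includes(required)) problems.push(`missing ${required} event`);
  }
  if (names.length > 0 && names[names.length - 1] !== "done") problems.push("done is not the last event");
  return problems;
}
```

Joining the `data` of all `token` events reconstructs the streamed assistant text, which is a convenient value to compare with the message read back in step 8.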