rikrdo 62728b2200 Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)
- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
2026-06-29 16:20:53 +02:00


# SIC — Super Incident Commander
Lightweight web harness to use a centralized `pi.dev` engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.
## MVP scope
- Expected ceiling: 5 concurrent users.
- Frontend: React + Vite.
- Backend: Node.js + Fastify.
- Initial persistence: SQLite.
- LLM: OpenAI-compatible endpoint via `pi-adapter`.
- Default LLM provider: MiniMax OpenAI-compatible.
- Configuration: YAML + environment variables.
- Initial deploy: Docker Compose.
## Reliability principle
Nothing critical lives only in memory. Sessions, messages, and webhook audit are rebuilt from SQLite.
Every conversation read/write must respect:
```sql
WHERE session_id = ?
AND user_id = ?
```
## Structure
```text
apps/
  api/          # Fastify API, SSE, sessions, webhooks, docs
  web/          # React + Vite UI
packages/
  shared/       # Shared types
  pi-adapter/   # pi.dev / OpenAI-compatible adapter
config/         # YAML for models, webhooks, skills and docs
knowledge/      # Internal Markdown documentation
deploy/         # Docker Compose and future manifests
docs/           # Definitions, reliable history and agents
scripts/        # End-to-end smoke test + mock LLM
```
## API surface
- `GET /healthz`
- `GET /readyz`
- `GET /api/version`
- `GET /api/me`
- `GET /api/sessions`
- `POST /api/sessions`
- `GET /api/sessions/:id`
- `PATCH /api/sessions/:id`
- `DELETE /api/sessions/:id`
- `GET /api/docs/search?q=vpn`
- `GET /api/docs/:id`
- `GET /api/models`
- `GET /api/webhooks`
- `GET /api/webhook-runs?sessionId=...`
- `POST /api/webhooks/:id/run`
- `GET /api/skills`
- `PATCH /api/sessions/:id/system-prompt` — set per-session context
- `GET /metrics` — Prometheus text
- `GET /api/metrics` — same as JSON
- `POST /api/chat/stream`
## Chat stream contract
`POST /api/chat/stream` takes `sessionId`, `message` and optionally `model`.
Reliability rules:
1. Validate that the session belongs to the current user.
2. Persist the `user` message before calling the LLM.
3. If the session has no title yet, derive a short one from the first message.
4. Validate the requested model against `config/models.yml`.
5. Search relevant Markdown docs and role-allowed webhooks.
6. Call the OpenAI-compatible endpoint via `pi-adapter`. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first `ok=true` response wins.
7. Emit SSE events: `docs`, `token`, `actions`, `done`.
8. Persist the `assistant` response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
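The ordering in steps 2 and 8 is what makes the history reliable: if the process dies mid-stream, the user's message is already on disk. A minimal sketch of that ordering, assuming a hypothetical `MessageStore` interface and `LlmCall` function (neither is the API's actual code). It omits the `docs` and `actions` events and emits the whole reply as a single `token` event.

```typescript
type Role = "user" | "assistant";

interface StoredMessage {
  sessionId: string;
  userId: string;
  role: Role;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical persistence interface; the real API writes to SQLite.
interface MessageStore {
  ownsSession(sessionId: string, userId: string): boolean;
  append(message: StoredMessage): void;
}

type LlmCall = (prompt: string) => Promise<string>;
type Emit = (event: "docs" | "token" | "actions" | "done", data: unknown) => void;

async function handleChatTurn(
  store: MessageStore,
  callLlm: LlmCall,
  emit: Emit,
  sessionId: string,
  userId: string,
  message: string,
): Promise<void> {
  // Step 1: refuse sessions owned by another user.
  if (!store.ownsSession(sessionId, userId)) {
    throw new Error("session not found");
  }
  // Step 2: the user message is durable before the LLM is contacted.
  store.append({ sessionId, userId, role: "user", content: message });

  let content: string;
  let metadata: Record<string, unknown> | undefined = undefined;
  try {
    content = await callLlm(message);
    emit("token", content); // a real stream emits many token events
  } catch (err) {
    // Step 8, failure path: persist a controlled message rather than nothing.
    content = "The assistant could not answer this turn.";
    metadata = { error: err instanceof Error ? err.message : String(err) };
  }
  // Step 8: the assistant reply is persisted only after the call finishes.
  store.append({ sessionId, userId, role: "assistant", content, metadata });
  emit("done", {});
}
```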
### Provider fallback
Each model in `config/models.yml` can declare `fallback: [other-id, ...]`. The chat route walks the chain when a model returns `ok=false` (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes `requested_model`, `fallback_attempts`, `fallback_chain`, and `fallback_failures` whenever the chain was actually used, so you can see what happened in the chat history.
Override the chain globally with `LLM_FALLBACK_CHAIN`: a comma-separated list of model ids, tried in order after the requested model. Leave it empty to use each model's YAML chain.
Default chain today (from `config/models.yml`):
- `fast` → no fallback (it IS the cheap path)
- `balanced` → `mr-auto`
- `reasoning` → no fallback
- `mr-auto` → no fallback
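The chain resolution and walk described above can be sketched as follows. `resolveChain`, `walkChain` and the `Attempt` shape are hypothetical names; skipping duplicate ids and ids missing from `models.yml` is an assumption, not documented behavior. The `failures` list corresponds to what the API records as `fallback_failures`.

```typescript
interface ModelConfig {
  id: string;
  fallback?: string[];
}

// ok=false covers structured failures: no_content / json_parse / schema.
type Attempt = { ok: true; text: string } | { ok: false; reason: string };
type CallModel = (modelId: string) => Promise<Attempt>;

interface Failure {
  model: string;
  reason: string;
}

type ChainResult =
  | { ok: true; model: string; text: string; failures: Failure[] }
  | { ok: false; failures: Failure[] };

// Builds [requested, ...fallbacks]. A non-empty override (the parsed value of
// LLM_FALLBACK_CHAIN) replaces the per-model YAML chain.
function resolveChain(
  requested: string,
  models: Map<string, ModelConfig>,
  override: string | undefined,
): string[] {
  const fromEnv = (override ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  const tail = fromEnv.length > 0 ? fromEnv : (models.get(requested)?.fallback ?? []);
  const chain = [requested];
  for (const id of tail) {
    // Assumption: ignore duplicates and ids not declared in models.yml.
    if (!chain.includes(id) && models.has(id)) chain.push(id);
  }
  return chain;
}

// Tries each model in order; the first ok=true response wins.
async function walkChain(chain: string[], call: CallModel): Promise<ChainResult> {
  const failures: Failure[] = [];
  for (const model of chain) {
    try {
      const result = await call(model);
      if (result.ok) return { ok: true, model, text: result.text, failures };
      failures.push({ model, reason: result.reason });
    } catch (err) {
      // Thrown errors: 5xx, 429, network, timeout.
      failures.push({ model, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { ok: false, failures };
}
```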
## MiniMax
The project is wired to MiniMax via the official OpenAI-compatible endpoint:
- Base URL: `https://api.minimax.io/v1`
- Chat path used by the adapter: `/chat/completions`
- Auth: `Authorization: Bearer <key>`
Models configured in `config/models.yml`:
- `fast` → `MiniMax-M2.7-highspeed`
- `balanced` → `MiniMax-M2.7`
- `reasoning` → `MiniMax-M3`
To run locally, set the key:
```bash
export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"
```
In Docker Compose you only need to export `MINIMAX_API_KEY`; the Compose file maps it to `LLM_API_KEY`.
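A sketch of the request shape the adapter sends, assuming the standard OpenAI Chat Completions body (`model`, `messages`). `buildChatRequest` is a hypothetical helper, not the adapter's actual export. Note that `model` carries the provider name (e.g. `MiniMax-M2.7-highspeed`), not the alias `fast`.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Joins LLM_BASE_URL with the chat path and adds Bearer auth.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The result can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body, signal: AbortSignal.timeout(30000) })`, where `30000` mirrors the `LLM_TIMEOUT_MS` default.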
## UI MVP
The React app already consumes the API through the Vite proxy:
- Loads or creates a local session.
- Loads `GET /api/models` and lets the user pick the model per message.
- Lists persisted sessions and lets the user switch between them.
- Lets the user rename and delete sessions, always through the API with per-user isolation.
- Sends messages to `POST /api/chat/stream` and consumes SSE events.
- Shows recommended documentation and lets the user open the full document via `GET /api/docs/:id`.
- Shows suggested actions in the right panel.
- Loads `GET /api/webhooks` to show public labels/descriptions for actions.
- Executes webhooks only after user confirmation and always through the backend.
- Shows execution audit per session from `GET /api/webhook-runs`, without exposing URLs or payload templates.
- Can attach a development Bearer token to test `AUTH_MODE=keycloak`; reads from `localStorage` or `VITE_AUTH_TOKEN`.
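Because `POST /api/chat/stream` is a POST, the browser's `EventSource` (GET-only) cannot consume it; a client has to read `response.body` and parse the event stream itself. A minimal incremental parser, assuming the server sends named events (`event: docs`, `event: token`, ...) with a `data:` payload. `SseParser` is a hypothetical name, and the parser handles only `\n` and `\r\n` line endings.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// feed() accepts arbitrary chunks and returns the events completed so far.
// Frames end with a blank line; multiple "data:" lines are joined with "\n".
class SseParser {
  private buffer = "";

  feed(chunk: string): SseEvent[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let event = "message"; // SSE default when no "event:" field is sent
      const data: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
        // comment lines (":...") and other fields are ignored in this sketch
      }
      if (data.length > 0) events.push({ event, data: data.join("\n") });
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Chunks split mid-frame stay buffered until the terminating blank line arrives, so decoded `TextDecoder` output can be fed directly.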
## Skills
Skills are persona/behavior prompt fragments loaded from `config/skills.yml` and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.
Each skill has: `id`, `name`, `description`, `enabled`, `prompt`. Skills with `enabled: true` are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with `enabled: false` are kept in the file but inactive. The frontend can list them via `GET /api/skills` (no prompt text is exposed publicly — only id, name, description, enabled).
Edit `config/skills.yml` and restart the API to change the active skill set. The default file ships with `sre-on-call` and `blameless-postmortem` enabled; `security-incident` is shipped disabled as a reference.
The env var `SKILLS_CONFIG_PATH` overrides the default config path (`../../config/skills.yml` relative to `cwd`).
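The split between what is injected and what is exposed can be sketched as two small functions over the parsed `config/skills.yml` entries. `toPublicSkills` and `activeSkillPrompts` are hypothetical names, and keeping file order for injection is an assumption.

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

type PublicSkill = Omit<Skill, "prompt">;

// Shape for GET /api/skills: the prompt text is never included.
function toPublicSkills(skills: Skill[]): PublicSkill[] {
  return skills.map(({ id, name, description, enabled }) => ({ id, name, description, enabled }));
}

// Prompt fragments of enabled skills, in file order (assumption).
function activeSkillPrompts(skills: Skill[]): string[] {
  return skills.filter((s) => s.enabled).map((s) => s.prompt.trim());
}
```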
## Per-session context
Every session has an optional `system_prompt` field. When set, it is added to every chat turn as a system message, after the base identity prompt and skill prompts and before the docs/actions context. Use it to pin the incident id, the on-call name, or runbook references that should not drift across the conversation.
- **Frontend**: each session row has a small circle button (`○` empty, `●` set). Click it to open a modal editor with Save and Clear.
- **API**: `PATCH /api/sessions/:id/system-prompt` with `{ "system_prompt": "..." }`. Send `null` or empty string to clear.
- **Limit**: 8000 characters.
- **Persistence**: stored in `chat_sessions.system_prompt`; same `WHERE id = ? AND user_id = ?` ownership rule as every other session operation.
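The validation and ordering rules above can be sketched as follows. The function names are hypothetical; treating a whitespace-only value as "clear" is an assumption, and this sketch emits one system message per part, whereas the API may concatenate them.

```typescript
const SYSTEM_PROMPT_MAX_CHARS = 8000;

// PATCH body handling: null or empty clears the field (whitespace-only also
// clears, an assumption); values over the limit are rejected.
function normalizeSystemPrompt(input: string | null): string | null {
  if (input === null || input.trim() === "") return null;
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    throw new RangeError(`system_prompt exceeds ${SYSTEM_PROMPT_MAX_CHARS} characters`);
  }
  return input;
}

interface SystemMessage {
  role: "system";
  content: string;
}

// Order: base identity, enabled skills, per-session context, docs/actions context.
function buildSystemMessages(
  basePrompt: string,
  skillPrompts: string[],
  sessionPrompt: string | null,
  docsAndActionsContext: string,
): SystemMessage[] {
  const parts = [basePrompt, ...skillPrompts, sessionPrompt ?? "", docsAndActionsContext];
  return parts
    .filter((part) => part.trim() !== "")
    .map((content): SystemMessage => ({ role: "system", content }));
}
```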
## Observability
Two endpoints surface API metrics:
- `GET /metrics` — Prometheus text exposition format (counters and summaries), scraper-friendly. `/metrics` is Prometheus's default scrape path, so a scrape job needs no `metrics_path` override.
- `GET /api/metrics` — same data as JSON for humans and the smoke test. Shape:
```json
{
  "started_at": "2026-06-29T12:00:00.000Z",
  "uptime_seconds": 1234,
  "totals": { "requests": 5678, "errors_5xx": 0 },
  "routes": [
    {
      "route": "/api/chat/stream",
      "method": "POST",
      "count": 42,
      "avg_ms": 1230,
      "p95_ms": 4500,
      "max_ms": 8000,
      "status_buckets": { "200_299": 42 }
    }
  ],
  "recent": [
    {
      "route": "/api/sessions/:id",
      "method": "DELETE",
      "status": 204,
      "durationMs": 4,
      "timestamp": 1782727300000
    }
  ]
}
```
Routes are aggregated by route **template** (e.g. `/api/sessions/:id`), not by raw URL, so `/api/sessions/abc` and `/api/sessions/def` share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. In-memory only — counters reset on restart; that's the expected behavior for a 5-user MVP.
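One way to build a fixed-size streaming reservoir is Algorithm R, which keeps a uniform random sample of at most 200 values from an unbounded stream; the README does not say whether the API uses this or simply keeps the most recent 200 samples. A sketch with a nearest-rank percentile; `Reservoir` is a hypothetical name and `random` is injectable so the behavior can be tested deterministically.

```typescript
// Algorithm R: memory stays O(capacity) no matter how many values arrive.
class Reservoir {
  private readonly samples: number[] = [];
  private seen = 0;
  private readonly capacity: number;
  private readonly random: () => number;

  constructor(capacity = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  get size(): number {
    return this.samples.length;
  }

  add(value: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
      return;
    }
    // Keep the new value with probability capacity / seen, replacing a random slot.
    const j = Math.floor(this.random() * this.seen);
    if (j < this.capacity) this.samples[j] = value;
  }

  // Nearest-rank percentile over the current sample; 0 when empty.
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
  }
}
```

A per-route map keyed by template (e.g. `"DELETE /api/sessions/:id"`) would hold one reservoir each, which is what keeps `/api/sessions/abc` and `/api/sessions/def` in the same bucket.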
## Auth
The backend supports two modes:
- `AUTH_MODE=local`: dev mode, uses `local-user` with roles `admin` and `webhook-runner`.
- `AUTH_MODE=keycloak`: validates `Authorization: Bearer <token>` with remote JWKS from `OIDC_ISSUER` and `OIDC_AUDIENCE`.
For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in `localStorage` and sent as `Authorization: Bearer <token>` on API and stream calls. Alternatively, Vite can receive `VITE_AUTH_TOKEN` to preconfigure it for the local environment.
Claims used from Keycloak:
- `sub` as `user.id`.
- `preferred_username` and `email` for display.
- Roles from `realm_access.roles` and `resource_access[OIDC_AUDIENCE].roles`.
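The claim mapping can be sketched as a pure function over an already-verified token payload. Verification itself (remote JWKS, issuer and audience checks) is out of scope here; the `jose` library's `createRemoteJWKSet` and `jwtVerify` are one option, though the README does not say which library the API uses. `userFromClaims` is a hypothetical name, and de-duplicating roles is an assumption.

```typescript
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] } | undefined>;
}

interface SicUser {
  id: string;
  username?: string;
  email?: string;
  roles: string[];
}

// Roles are the union of realm roles and the client roles for OIDC_AUDIENCE.
function userFromClaims(claims: KeycloakClaims, audience: string): SicUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  return {
    id: claims.sub,
    username: claims.preferred_username,
    email: claims.email,
    roles: Array.from(new Set([...realmRoles, ...clientRoles])),
  };
}
```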
## Basic hardening
- `API_BODY_LIMIT_BYTES`: global Fastify body limit. Default: `1048576`.
- `CHAT_MESSAGE_MAX_CHARS`: chat message and `lastUserMessage` limit on webhooks. Default: `8000`.
- `CORS_ALLOWED_ORIGINS`: comma-separated list of allowed origins. If unset, any origin is allowed, which is intended for development only.
- `LLM_TIMEOUT_MS`: OpenAI-compatible call timeout. Default: `30000`.
- `WEBHOOK_TIMEOUT_MS`: backend-only webhook execution timeout. Default: `15000`.
- `WEBHOOK_RETRY_MAX_ATTEMPTS`: retries per webhook on transient errors (5xx, 429, timeout, network). Default: `3`.
- `WEBHOOK_RETRY_INITIAL_BACKOFF_MS`: initial backoff with exponential growth. Default: `500`.
- `WEBHOOK_RETRY_MAX_BACKOFF_MS`: backoff cap. Default: `5000`.
- `WEBHOOK_RUNS_RETENTION_DAYS`: age cutoff for `webhook_runs` rows. Runs older than this are purged on boot and on a timer. Default: `30`. Set to `0` to disable the age pass.
- `WEBHOOK_RUNS_MAX_PER_USER`: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: `1000`. Set to `0` to disable the cap pass.
- `WEBHOOK_AUDIT_PURGE_INTERVAL_MS`: how often the janitor runs while the API is up. Default: `3600000` (1 hour). Minimum: `60000` (1 minute).
- `CHAT_RATE_LIMIT_PER_MINUTE`: per-user rate limit on `POST /api/chat/stream` (token-bucket refill rate). Default: `20`.
- `CHAT_RATE_LIMIT_BURST`: per-user burst size. Default: `5`. Rejected calls return `429` with `retry-after` in seconds and `x-ratelimit-remaining: 0`.
- The API adds basic defensive headers: `x-content-type-options`, `referrer-policy`, `x-frame-options`.
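Two of these knobs describe small algorithms. A sketch of both, with hypothetical names: `backoffDelays` assumes `WEBHOOK_RETRY_MAX_ATTEMPTS` counts total attempts (so `3` means two retries) and adds no jitter; `TokenBucket` refills continuously at `CHAT_RATE_LIMIT_PER_MINUTE` and caps at `CHAT_RATE_LIMIT_BURST`. Times are passed in explicitly so the logic is testable.

```typescript
// Delay before retry n (1-based) = min(initial * 2^(n-1), max).
function backoffDelays(maxAttempts: number, initialMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < maxAttempts; retry++) {
    delays.push(Math.min(initialMs * 2 ** (retry - 1), maxMs));
  }
  return delays;
}

// One bucket per user, e.g. in a Map<string, TokenBucket>.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly perMinute: number;
  private readonly burst: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.perMinute = perMinute;
    this.burst = burst;
    this.tokens = burst; // a new user starts with a full burst
    this.last = nowMs;
  }

  // Returns 0 when the call is allowed; otherwise the retry-after value in seconds.
  take(nowMs: number): number {
    const refill = ((nowMs - this.last) * this.perMinute) / 60000;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.last = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    const waitMs = ((1 - this.tokens) * 60000) / this.perMinute;
    return Math.ceil(waitMs / 1000);
  }
}
```

With the defaults (`20` per minute, burst `5`), five back-to-back calls pass, the sixth gets `retry-after: 3`, and one token returns every 3 seconds.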
## End-to-end smoke test
A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).
### With a real LLM (MiniMax)
```bash
# Terminal 1: start the API and the web
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev
# Terminal 2: smoke test against http://localhost:3000
pnpm smoke
```
### With the mock LLM (no key)
```bash
# Terminal 1: start the API and the web pointing at the mock
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev
# Terminal 2
pnpm smoke
# or in a single step, the script starts the mock internally:
pnpm smoke:mock
```
Steps covered (in order):
1. `/healthz`, `/readyz`
2. `/api/me` (local auth)
3. `/api/models`
4. `/api/docs/search` + `/api/docs/:id`
5. `/api/webhooks`
6. `POST /api/sessions` + `GET /api/sessions`
7. `POST /api/chat/stream` and SSE event parsing (`docs`, `token`, `actions`, `done`)
8. `GET /api/sessions/:id` to confirm the assistant message was persisted
9. `GET /api/webhook-runs?sessionId=...` to confirm audit listing
10. `DELETE /api/sessions/:id` (cleanup)
Optional flags:
- `pnpm smoke --api-base http://localhost:4000` to point at a different API
- `pnpm smoke:mock` (alias of `pnpm smoke --mock-llm`) starts the mock inside the script