# SIC — Super Incident Commander

Lightweight web harness to use a centralized `pi.dev` engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.

## MVP scope

- Expected ceiling: 5 concurrent users.
- Frontend: React + Vite.
- Backend: Node.js + Fastify.
- Initial persistence: SQLite.
- LLM: OpenAI-compatible endpoint via `pi-adapter`.
- Default LLM provider: MiniMax OpenAI-compatible.
- Configuration: YAML + environment variables.
- Initial deploy: Docker Compose.

## Reliability principle

Nothing critical lives only in memory. Sessions, messages, and webhook audit records can all be rebuilt from SQLite.

Every conversation read/write must respect:

```sql
WHERE session_id = ?
AND user_id = ?
```
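
As an illustration of the rule above, a minimal sketch of a helper that refuses to build a conversation query without both ids. The helper name `scopeToOwner` and the `messages` table are assumptions made for this example, not names taken from the codebase.

```typescript
// Hypothetical helper: every conversation query carries both ids.
interface ScopedQuery {
  sql: string;
  params: string[];
}

// `baseSql` is a statement without a WHERE clause (the table name below is assumed).
function scopeToOwner(baseSql: string, sessionId: string, userId: string): ScopedQuery {
  if (!sessionId || !userId) {
    throw new Error("sessionId and userId are both required");
  }
  return {
    sql: `${baseSql} WHERE session_id = ? AND user_id = ?`,
    params: [sessionId, userId],
  };
}

const scopedQuery = scopeToOwner("SELECT role, content FROM messages", "s-123", "local-user");
console.log(scopedQuery.sql, scopedQuery.params);
```

Routing every read and write through one helper like this makes a missing `user_id` filter a thrown error rather than a silent cross-user leak.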

## Structure

```text
apps/
  api/                  # Fastify API, SSE, sessions, webhooks, docs
  web/                  # React + Vite UI
packages/
  shared/               # Shared types
  pi-adapter/           # pi.dev / OpenAI-compatible adapter
config/                 # YAML for models, webhooks and docs
knowledge/              # Internal Markdown documentation
deploy/                 # Docker Compose and future manifests
docs/                   # Definitions, reliable history and agents
scripts/                # End-to-end smoke test + mock LLM
```

## API surface

- `GET /healthz`
- `GET /readyz`
- `GET /api/version`
- `GET /api/me`
- `GET /api/sessions`
- `POST /api/sessions`
- `GET /api/sessions/:id`
- `PATCH /api/sessions/:id`
- `DELETE /api/sessions/:id`
- `GET /api/docs/search?q=vpn`
- `GET /api/docs/:id`
- `GET /api/models`
- `GET /api/webhooks`
- `GET /api/webhook-runs?sessionId=...`
- `POST /api/webhooks/:id/run`
- `GET /api/skills`
- `PATCH /api/sessions/:id/system-prompt` — set per-session context
- `GET /metrics` — Prometheus text
- `GET /api/metrics` — same as JSON
- `POST /api/chat/stream`

## Chat stream contract

`POST /api/chat/stream` takes `sessionId`, `message` and optionally `model`.

Reliability rules:

1. Validate that the session belongs to the current user.
2. Persist the `user` message before calling the LLM.
3. If the session has no title yet, derive a short one from the first message.
4. Validate the requested model against `config/models.yml`.
5. Search relevant Markdown docs and role-allowed webhooks.
6. Call the OpenAI-compatible endpoint via `pi-adapter`. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first `ok=true` response wins.
7. Emit SSE events: `docs`, `token`, `actions`, `done`.
8. Persist the `assistant` response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
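
On the client side, the four event types from step 7 can be read with a small frame parser. This is a sketch that assumes standard SSE framing (`event:` and `data:` lines, frames separated by a blank line, `\n` line endings) and JSON payloads; the payload fields themselves are not specified here, so `data` is left as `unknown`.

```typescript
type ChatEventName = "docs" | "token" | "actions" | "done";

interface ChatEvent {
  event: ChatEventName;
  data: unknown; // payload shape is an assumption: parsed as JSON when present
}

const CHAT_EVENTS: ChatEventName[] = ["docs", "token", "actions", "done"];

function isChatEvent(value: string): value is ChatEventName {
  return (CHAT_EVENTS as string[]).includes(value);
}

// Extracts complete frames from `buffer` and returns any trailing partial frame in `rest`.
function parseSseFrames(buffer: string): { events: ChatEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: ChatEvent[] = [];
  for (const frame of frames) {
    let eventName = "message"; // SSE default when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    if (!isChatEvent(eventName)) continue; // ignore comments and unknown events
    const data = dataLines.length > 0 ? JSON.parse(dataLines.join("\n")) : null;
    events.push({ event: eventName, data });
  }
  return { events, rest };
}
```

A caller would append each network chunk to `rest`, call `parseSseFrames` again, and stop reading once a `done` event arrives.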

### Provider fallback

Each model in `config/models.yml` can declare `fallback: [other-id, ...]`. The chat route walks the chain when a model returns `ok=false` (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes `requested_model`, `fallback_attempts`, `fallback_chain`, and `fallback_failures` whenever the chain was actually used, so you can see what happened in the chat history.

Override the chain globally with `LLM_FALLBACK_CHAIN`, a comma-separated list of model ids that are tried in order after the requested model. Leave it empty to use each model's YAML chain.

Default chain today (from `config/models.yml`):

- `fast` → no fallback (it IS the cheap path)
- `balanced` → `mr-auto`
- `reasoning` → no fallback
- `mr-auto` → no fallback
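
The walk described above can be sketched as follows. This is an illustration, not the chat route's actual code: `walkFallbackChain` and `CallModel` are hypothetical names, the chain is treated as a flat list (requested model first, then its fallbacks), and `fallback_attempts` is assumed to count failed attempts.

```typescript
interface ModelResult {
  ok: boolean;
  content?: string;
  error?: string; // e.g. "no_content", "json_parse", "schema"
}

interface FallbackFailure {
  model: string;
  reason: string;
}

interface ChainOutcome {
  result: ModelResult | null; // null when every model in the chain failed
  metadata: {
    requested_model: string;
    fallback_chain: string[];
    fallback_attempts: number;
    fallback_failures: FallbackFailure[];
  };
}

// Injected so the sketch stays independent of any real adapter API.
type CallModel = (modelId: string) => Promise<ModelResult>;

async function walkFallbackChain(
  requested: string,
  fallbacks: string[],
  callModel: CallModel,
): Promise<ChainOutcome> {
  const chain = [requested, ...fallbacks];
  const failures: FallbackFailure[] = [];
  let result: ModelResult | null = null;
  for (const model of chain) {
    try {
      const response = await callModel(model);
      if (response.ok) {
        result = response; // first ok=true response wins
        break;
      }
      failures.push({ model, reason: response.error ?? "unknown" });
    } catch (err) {
      // Transport errors (5xx / 429 / network / timeout) also move to the next model.
      failures.push({ model, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return {
    result,
    metadata: {
      requested_model: requested,
      fallback_chain: chain,
      fallback_attempts: failures.length,
      fallback_failures: failures,
    },
  };
}
```

With the default configuration, `walkFallbackChain("balanced", ["mr-auto"], callModel)` would try `balanced` and then `mr-auto`. A caller would persist the metadata only when `fallback_failures` is non-empty, and a controlled error message when `result` is `null`.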

## MiniMax

The project is wired to MiniMax via the official OpenAI-compatible endpoint:

- Base URL: `https://api.minimax.io/v1`
- Chat path used by the adapter: `/chat/completions`
- Auth: `Authorization: Bearer <key>`

Models configured in `config/models.yml`:

- `fast` → `MiniMax-M2.7-highspeed`
- `balanced` → `MiniMax-M2.7`
- `reasoning` → `MiniMax-M3`

To run locally, set the key:

```bash
export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"
```

In Docker Compose you only need to export `MINIMAX_API_KEY`; the Compose file maps it to `LLM_API_KEY`.
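
For orientation, this is roughly what a request to the endpoint above looks like. The sketch only builds the request and does not send it; `buildChatRequest` is a hypothetical helper, and `stream: true` is an assumption about how the adapter calls the API rather than something confirmed here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds an OpenAI-compatible chat completion request for the given upstream model.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  upstreamModel: string,
  messages: ChatMessage[],
): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // `stream: true` is assumed; drop it for a single non-streamed response.
    body: JSON.stringify({ model: upstreamModel, messages, stream: true }),
  };
}

const chatRequest = buildChatRequest(
  "https://api.minimax.io/v1",
  "your-key",
  "MiniMax-M2.7-highspeed",
  [{ role: "user", content: "VPN is down for the Madrid office" }],
);
console.log(chatRequest.url);
```

The returned object has the same shape `fetch(chatRequest.url, chatRequest)` expects, which makes it easy to point the same code at the mock LLM by changing only the base URL.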

## UI MVP

The React app already consumes the API through the Vite proxy:

- Loads or creates a local session.
- Loads `GET /api/models` and lets the user pick the model per message.
- Lists persisted sessions and lets the user switch between them.
- Lets the user rename and delete sessions, always through the API with per-user isolation.
- Sends messages to `POST /api/chat/stream` and consumes SSE events.
- Shows recommended documentation and lets the user open the full document via `GET /api/docs/:id`.
- Shows suggested actions in the right panel.
- Loads `GET /api/webhooks` to show public labels/descriptions for actions.
- Executes webhooks only after user confirmation and always through the backend.
- Shows execution audit per session from `GET /api/webhook-runs`, without exposing URLs or payload templates.
- Can attach a development Bearer token to test `AUTH_MODE=keycloak`; the token is read from `localStorage` or `VITE_AUTH_TOKEN`.

## Skills

Skills are persona/behavior prompt fragments loaded from `config/skills.yml` and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.

Each skill has: `id`, `name`, `description`, `enabled`, `prompt`. Skills with `enabled: true` are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with `enabled: false` are kept in the file but inactive. The frontend can list them via `GET /api/skills` (no prompt text is exposed publicly — only id, name, description, enabled).

Edit `config/skills.yml` and restart the API to change the active skill set. The default file ships with `sre-on-call` and `blameless-postmortem` enabled; `security-incident` is shipped disabled as a reference.

The env var `SKILLS_CONFIG_PATH` overrides the default config path (`../../config/skills.yml` relative to `cwd`).
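
A minimal sketch of the two views of a skill described in this section: the prompts injected at chat time and the public projection returned by `GET /api/skills`. The function names and the sample prompt text are invented for illustration; only the field names come from the description above.

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

// Public shape: everything except the prompt text.
type PublicSkill = Omit<Skill, "prompt">;

// Prompts of enabled skills, in file order, ready to inject into the system prompt.
function activeSkillPrompts(skills: Skill[]): string[] {
  return skills.filter((skill) => skill.enabled).map((skill) => skill.prompt);
}

// Strips the prompt so it never leaves the backend.
function toPublicSkill(skill: Skill): PublicSkill {
  const { id, name, description, enabled } = skill;
  return { id, name, description, enabled };
}

// Sample data: ids match the shipped defaults, the prompt text is made up.
const sampleSkills: Skill[] = [
  { id: "sre-on-call", name: "SRE on call", description: "Triage first", enabled: true, prompt: "Act as the on-call SRE." },
  { id: "security-incident", name: "Security incident", description: "Reference only", enabled: false, prompt: "Follow the security runbook." },
];

console.log(activeSkillPrompts(sampleSkills));
console.log(sampleSkills.map(toPublicSkill));
```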

## Per-session context

Every session has an optional `system_prompt` field. When set, it is prepended to every chat turn as a system message (after the base identity prompt and skill prompts, before the docs/actions context). Use it to pin incident id, on-call name, or runbook references that shouldn't drift across the conversation.

- **Frontend**: each session row has a small circle button (`○` empty, `●` set). Click it to open a modal editor with Save and Clear.
- **API**: `PATCH /api/sessions/:id/system-prompt` with `{ "system_prompt": "..." }`. Send `null` or empty string to clear.
- **Limit**: 8000 characters.
- **Persistence**: stored in `chat_sessions.system_prompt`; same `WHERE id = ? AND user_id = ?` ownership rule as every other session operation.
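
The ordering and the clear/limit rules above can be sketched like this. The function names are hypothetical, and whether the backend sends separate system messages or one concatenated prompt is not stated here, so the sketch simply returns the parts in order.

```typescript
const SYSTEM_PROMPT_MAX_CHARS = 8000;

// Maps the PATCH payload to the stored value: null or "" clears the field.
function normalizeSystemPrompt(input: string | null): string | null {
  if (input === null || input === "") return null;
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    throw new Error(`system_prompt exceeds ${SYSTEM_PROMPT_MAX_CHARS} characters`);
  }
  return input;
}

// Order: base identity, skill prompts, per-session context, docs/actions context.
function buildSystemParts(
  baseIdentity: string,
  skillPrompts: string[],
  sessionPrompt: string | null,
  docsAndActions: string,
): string[] {
  const parts = [baseIdentity, ...skillPrompts];
  if (sessionPrompt !== null) parts.push(sessionPrompt);
  parts.push(docsAndActions);
  return parts;
}

const systemParts = buildSystemParts(
  "You are SIC.",
  ["Act as the on-call SRE."],
  normalizeSystemPrompt("Incident INC-42, on-call: Ana"),
  "Relevant docs: vpn.md",
);
console.log(systemParts);
```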

## Observability

Two endpoints surface API metrics:

- `GET /metrics`: Prometheus text exposition (counter / summary), scraper-friendly. The path matches Prometheus's default `metrics_path`, so a scrape job only needs the API host and port.
- `GET /api/metrics`: same data as JSON for humans and the smoke test. Shape:

  ```json
  {
    "started_at": "2026-06-29T12:00:00.000Z",
    "uptime_seconds": 1234,
    "totals": { "requests": 5678, "errors_5xx": 0 },
    "routes": [
      {
        "route": "/api/chat/stream",
        "method": "POST",
        "count": 42,
        "avg_ms": 1230,
        "p95_ms": 4500,
        "max_ms": 8000,
        "status_buckets": { "200_299": 42 }
      }
    ],
    "recent": [
      {
        "route": "/api/sessions/:id",
        "method": "DELETE",
        "status": 204,
        "durationMs": 4,
        "timestamp": 1782735600000
      }
    ]
  }
  ```

Routes are aggregated by route **template** (e.g. `/api/sessions/:id`), not by raw URL, so `/api/sessions/abc` and `/api/sessions/def` share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. In-memory only — counters reset on restart; that's the expected behavior for a 5-user MVP.
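
To show how a bounded p95 can work, here is a sketch using classic reservoir sampling (Algorithm R) with a nearest-rank percentile. It is one plausible reading of "fixed-size streaming reservoir"; the real collector may instead keep the most recent 200 samples, and the class name is an assumption.

```typescript
class LatencyReservoir {
  private readonly samples: number[] = [];
  private readonly capacity: number;
  private readonly random: () => number;
  private seen = 0;

  constructor(capacity: number = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  // Algorithm R: after the reservoir fills, item n replaces a random slot with probability capacity / n.
  add(durationMs: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(durationMs);
      return;
    }
    const slot = Math.floor(this.random() * this.seen);
    if (slot < this.capacity) this.samples[slot] = durationMs;
  }

  size(): number {
    return this.samples.length;
  }

  // Nearest-rank p95 over the retained samples; 0 when nothing was recorded.
  p95(): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((95 * sorted.length) / 100);
    return sorted[rank - 1] ?? 0;
  }
}

const reservoir = new LatencyReservoir();
for (let ms = 1; ms <= 100; ms += 1) reservoir.add(ms);
console.log(reservoir.p95()); // 95: fewer than 200 samples, so all are kept
```

One reservoir per route template keeps memory at roughly 200 numbers per route regardless of traffic, at the cost of the p95 being an estimate once more than 200 requests have been seen.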

## Auth

The backend supports two modes:

- `AUTH_MODE=local`: dev mode, uses `local-user` with roles `admin` and `webhook-runner`.
- `AUTH_MODE=keycloak`: validates `Authorization: Bearer <token>` against the remote JWKS of `OIDC_ISSUER` and checks the token audience against `OIDC_AUDIENCE`.

For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in `localStorage` and sent as `Authorization: Bearer <token>` on API and stream calls. Alternatively, Vite can receive `VITE_AUTH_TOKEN` to preconfigure it for the local environment.

Claims used from Keycloak:

- `sub` as `user.id`.
- `preferred_username` and `email` for display.
- Roles from `realm_access.roles` and `resource_access[OIDC_AUDIENCE].roles`.
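
The claim mapping above can be sketched as a pure function over an already-verified token payload. `userFromClaims` and the `AuthUser` shape are assumptions for illustration, as is the `sic-api` audience in the example; signature and audience validation are out of scope here.

```typescript
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] }>;
}

interface AuthUser {
  id: string;
  username: string | null;
  email: string | null;
  roles: string[];
}

// Merges realm roles with the client roles registered under the configured audience.
function userFromClaims(claims: KeycloakClaims, audience: string): AuthUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  const merged = realmRoles.concat(clientRoles);
  return {
    id: claims.sub,
    username: claims.preferred_username ?? null,
    email: claims.email ?? null,
    roles: merged.filter((role, index) => merged.indexOf(role) === index), // de-duplicate
  };
}

const exampleUser = userFromClaims(
  {
    sub: "1f0c-user",
    preferred_username: "ana",
    realm_access: { roles: ["admin"] },
    resource_access: { "sic-api": { roles: ["webhook-runner", "admin"] } },
  },
  "sic-api", // hypothetical OIDC_AUDIENCE value
);
console.log(exampleUser.roles);
```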

## Basic hardening

- `API_BODY_LIMIT_BYTES`: global Fastify body limit. Default: `1048576`.
- `CHAT_MESSAGE_MAX_CHARS`: maximum length of a chat message and of `lastUserMessage` on webhook runs. Default: `8000`.
- `CORS_ALLOWED_ORIGINS`: comma-separated list of allowed origins. If unset, every origin is allowed, which is meant for development only.
- `LLM_TIMEOUT_MS`: OpenAI-compatible call timeout. Default: `30000`.
- `WEBHOOK_TIMEOUT_MS`: backend-only webhook execution timeout. Default: `15000`.
- `WEBHOOK_RETRY_MAX_ATTEMPTS`: retries per webhook on transient errors (5xx, 429, timeout, network). Default: `3`.
- `WEBHOOK_RETRY_INITIAL_BACKOFF_MS`: initial backoff with exponential growth. Default: `500`.
- `WEBHOOK_RETRY_MAX_BACKOFF_MS`: backoff cap. Default: `5000`.
- `WEBHOOK_RUNS_RETENTION_DAYS`: age cutoff for `webhook_runs` rows. Runs older than this are purged on boot and on a timer. Default: `30`. Set to `0` to disable the age pass.
- `WEBHOOK_RUNS_MAX_PER_USER`: keep at most this many most-recent runs per user; older runs beyond the cap are purged. Default: `1000`. Set to `0` to disable the cap pass.
- `WEBHOOK_AUDIT_PURGE_INTERVAL_MS`: how often the janitor runs while the API is up. Default: `3600000` (1 hour). Minimum: `60000` (1 minute).
- `CHAT_RATE_LIMIT_PER_MINUTE`: per-user rate limit on `POST /api/chat/stream` (token-bucket refill rate). Default: `20`.
- `CHAT_RATE_LIMIT_BURST`: per-user burst size. Default: `5`. Rejected calls return `429` with `retry-after` in seconds and `x-ratelimit-remaining: 0`.
- The API adds basic defensive headers: `x-content-type-options`, `referrer-policy`, `x-frame-options`.
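
Two of the settings above are easier to reason about with numbers in hand, so here is a sketch of the webhook backoff schedule and the chat token bucket. It assumes the backoff doubles on each retry and that `WEBHOOK_RETRY_MAX_ATTEMPTS` counts total attempts (so three attempts mean two waits); `backoffDelays` and `TokenBucket` are hypothetical names, not the backend's own.

```typescript
// Delays (ms) between attempts: initial backoff, doubled each time, capped at maxMs.
function backoffDelays(maxAttempts: number, initialMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry += 1) {
    delays.push(Math.min(initialMs * Math.pow(2, retry), maxMs));
  }
  return delays;
}

// One bucket per user: starts full at `burst`, refills at `perMinute` tokens per minute.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly burst: number;
  private readonly msPerToken: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.burst = burst;
    this.tokens = burst;
    this.lastMs = nowMs;
    this.msPerToken = 60000 / perMinute;
  }

  // Consumes one token if available; otherwise reports the wait for `retry-after`.
  take(nowMs: number): { allowed: boolean; retryAfterSeconds: number } {
    const refill = (nowMs - this.lastMs) / this.msPerToken;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    const waitMs = (1 - this.tokens) * this.msPerToken;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}

console.log(backoffDelays(3, 500, 5000)); // [500, 1000] with the default settings
const chatBucket = new TokenBucket(20, 5, 0);
for (let call = 0; call < 5; call += 1) chatBucket.take(0); // burst of 5 is allowed
console.log(chatBucket.take(0)); // rejected: { allowed: false, retryAfterSeconds: 3 }
```

With the defaults, a user can send 5 messages back to back and then roughly one every 3 seconds, while a failing webhook waits 500 ms and then 1000 ms before its last attempt.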

## End-to-end smoke test

A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).

### With a real LLM (MiniMax)

```bash
# Terminal 1: start the API and the web UI
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2: smoke test against http://localhost:3000
pnpm smoke
```

### With the mock LLM (no key)

```bash
# Terminal 1: start the mock, then the API and the web UI pointing at it
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2
pnpm smoke

# or in a single step, the script starts the mock internally:
pnpm smoke:mock
```

Steps covered (in order):

1. `/healthz`, `/readyz`
2. `/api/me` (local auth)
3. `/api/models`
4. `/api/docs/search` + `/api/docs/:id`
5. `/api/webhooks`
6. `POST /api/sessions` + `GET /api/sessions`
7. `POST /api/chat/stream` and SSE event parsing (`docs`, `token`, `actions`, `done`)
8. `GET /api/sessions/:id` to confirm the assistant message was persisted
9. `GET /api/webhook-runs?sessionId=...` to confirm audit listing
10. `DELETE /api/sessions/:id` (cleanup)

Optional flags:

- `pnpm smoke --api-base http://localhost:4000` to point at a different API
- `pnpm smoke:mock` (alias of `pnpm smoke --mock-llm`) starts the mock inside the script