sic/README.md
rikrdo 62728b2200 Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)
- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
2026-06-29 16:20:53 +02:00

SIC — Super Incident Commander

Lightweight web harness to use a centralized pi.dev engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.

MVP scope

  • Expected ceiling: 5 concurrent users.
  • Frontend: React + Vite.
  • Backend: Node.js + Fastify.
  • Initial persistence: SQLite.
  • LLM: OpenAI-compatible endpoint via pi-adapter.
  • Default LLM provider: MiniMax OpenAI-compatible.
  • Configuration: YAML + environment variables.
  • Initial deploy: Docker Compose.

Reliability principle

Nothing critical lives only in memory. Sessions, messages, and webhook audit records are rebuilt from SQLite.

Every conversation read/write must respect:

WHERE session_id = ?
AND user_id = ?
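
One way to keep this predicate from being forgotten is to build every conversation query through a single helper. The sketch below is illustrative only: `scopedMessagesQuery` and the `chat_messages` table with its columns are assumed names, not taken from the SIC codebase.

```typescript
// Hypothetical helper: every conversation query is built in one place,
// so the ownership predicate cannot be skipped at a call site.
interface ScopedQuery {
  sql: string;
  params: [string, string];
}

// NOTE: "chat_messages" and its column names are assumptions for illustration.
function scopedMessagesQuery(sessionId: string, userId: string): ScopedQuery {
  if (!sessionId || !userId) {
    throw new Error("sessionId and userId are both required");
  }
  return {
    sql:
      "SELECT id, role, content, created_at FROM chat_messages " +
      "WHERE session_id = ? AND user_id = ? ORDER BY created_at ASC",
    params: [sessionId, userId],
  };
}
```

The returned `sql` and `params` can be handed to any SQLite driver that supports positional placeholders.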

Structure

apps/
  api/                  # Fastify API, SSE, sessions, webhooks, docs
  web/                  # React + Vite UI
packages/
  shared/               # Shared types
  pi-adapter/           # pi.dev / OpenAI-compatible adapter
config/                 # YAML for models, webhooks and docs
knowledge/              # Internal Markdown documentation
deploy/                 # Docker Compose and future manifests
docs/                   # Definitions, reliable history and agents
scripts/                # End-to-end smoke test + mock LLM

API surface

  • GET /healthz
  • GET /readyz
  • GET /api/version
  • GET /api/me
  • GET /api/sessions
  • POST /api/sessions
  • GET /api/sessions/:id
  • PATCH /api/sessions/:id
  • DELETE /api/sessions/:id
  • GET /api/docs/search?q=vpn
  • GET /api/docs/:id
  • GET /api/models
  • GET /api/webhooks
  • GET /api/webhook-runs?sessionId=...
  • POST /api/webhooks/:id/run
  • GET /api/skills
  • PATCH /api/sessions/:id/system-prompt — set per-session context
  • GET /metrics — Prometheus text
  • GET /api/metrics — same as JSON
  • POST /api/chat/stream

Chat stream contract

POST /api/chat/stream takes sessionId, message and optionally model.

Reliability rules:

  1. Validate that the session belongs to the current user.
  2. Persist the user message before calling the LLM.
  3. If the session has no title yet, derive a short one from the first message.
  4. Validate the requested model against config/models.yml.
  5. Search relevant Markdown docs and role-allowed webhooks.
  6. Call the OpenAI-compatible endpoint via pi-adapter. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first ok=true response wins.
  7. Emit SSE events: docs, token, actions, done.
  8. Persist the assistant response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
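
On the client side, the events in step 7 arrive as standard SSE frames. The following is a minimal parsing sketch, assuming frames are separated by a blank line and use LF line endings; `parseSse` is a hypothetical name and the JSON payloads shown in the usage note are assumptions, since the README only fixes the event names.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Splits buffered SSE text into complete events and returns the unparsed tail.
// Each frame is expected to carry an "event:" line and one or more "data:" lines.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece may be an incomplete frame
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // default SSE event type when no "event:" line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

A reader loop would append each network chunk to `rest`, call `parseSse` again, and dispatch on `event` (`docs`, `token`, `actions`, `done`).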

Provider fallback

Each model in config/models.yml can declare fallback: [other-id, ...]. The chat route walks the chain when a model returns ok=false (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes requested_model, fallback_attempts, fallback_chain, and fallback_failures whenever the chain was actually used, so you can see what happened in the chat history.

Override the chain globally with LLM_FALLBACK_CHAIN, a comma-separated list of model ids that are tried in order after the requested model. Leave it empty to use each model's YAML chain.

Default chain today (from config/models.yml):

  • fast → no fallback (it IS the cheap path)
  • balanced → mr-auto
  • reasoning → no fallback
  • mr-auto → no fallback
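
The chain resolution and walk described above can be sketched as follows. This is a simplified illustration, not the actual chat route: `resolveChain` and `walkChain` are hypothetical names, the attempt callback is synchronous for brevity (a real LLM call would be async), and dropping duplicate ids is an assumption.

```typescript
type FallbackMap = Record<string, string[]>;

// Returns the ordered list of model ids to try: the requested model first,
// then either the global override or that model's own YAML fallback list.
function resolveChain(requested: string, yamlFallbacks: FallbackMap, envOverride?: string): string[] {
  const override = (envOverride ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  const tail = override.length > 0 ? override : (yamlFallbacks[requested] ?? []);
  // Assumption: duplicates are dropped so no model is tried twice in one chain.
  return [...new Set([requested, ...tail])];
}

interface Attempt {
  ok: boolean;
  model: string;
  error?: string;
}

// Walks the chain with a caller-supplied attempt function; the first ok result wins.
// Thrown errors (transport failures) are recorded and treated like ok=false.
function walkChain(chain: string[], attempt: (model: string) => Attempt): { winner: string | null; failures: Attempt[] } {
  const failures: Attempt[] = [];
  for (const model of chain) {
    try {
      const result = attempt(model);
      if (result.ok) return { winner: model, failures };
      failures.push(result);
    } catch (err) {
      failures.push({ ok: false, model, error: String(err) });
    }
  }
  return { winner: null, failures };
}
```

The `failures` array plays the role of the failure trail that ends up in the persisted assistant metadata.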

MiniMax

The project is wired to MiniMax via the official OpenAI-compatible endpoint:

  • Base URL: https://api.minimax.io/v1
  • Chat path used by the adapter: /chat/completions
  • Auth: Authorization: Bearer <key>

Models configured in config/models.yml:

  • fast → MiniMax-M2.7-highspeed
  • balanced → MiniMax-M2.7
  • reasoning → MiniMax-M3

To run locally, set the key:

export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"

In Docker Compose you only need to export MINIMAX_API_KEY; the Compose file maps it to LLM_API_KEY.

UI MVP

The React app already consumes the API through the Vite proxy:

  • Loads or creates a local session.
  • Loads GET /api/models and lets the user pick the model per message.
  • Lists persisted sessions and lets the user switch between them.
  • Lets the user rename and delete sessions, always through the API with per-user isolation.
  • Sends messages to POST /api/chat/stream and consumes SSE events.
  • Shows recommended documentation and lets the user open the full document via GET /api/docs/:id.
  • Shows suggested actions in the right panel.
  • Loads GET /api/webhooks to show public labels/descriptions for actions.
  • Executes webhooks only after user confirmation and always through the backend.
  • Shows execution audit per session from GET /api/webhook-runs, without exposing URLs or payload templates.
  • Can attach a development Bearer token to test AUTH_MODE=keycloak; reads from localStorage or VITE_AUTH_TOKEN.

Skills

Skills are persona/behavior prompt fragments loaded from config/skills.yml and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.

Each skill has: id, name, description, enabled, prompt. Skills with enabled: true are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with enabled: false are kept in the file but inactive. The frontend can list them via GET /api/skills (no prompt text is exposed publicly — only id, name, description, enabled).
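
A minimal sketch of the two behaviors described above: the injection order, and the public listing without prompt text. The field names come from this README; `toPublicSkill`, `buildSystemPrompt`, and the blank-line separator are assumptions made for illustration.

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

type PublicSkill = Omit<Skill, "prompt">;

// What a listing endpoint can safely return: everything except the prompt text.
function toPublicSkill(skill: Skill): PublicSkill {
  const { id, name, description, enabled } = skill;
  return { id, name, description, enabled };
}

// Orders the system prompt parts: base identity, enabled skills, docs/actions context.
// Assumption: parts are joined with a blank line and empty parts are skipped.
function buildSystemPrompt(base: string, skills: Skill[], context: string): string {
  const active = skills.filter((s) => s.enabled).map((s) => s.prompt);
  return [base, ...active, context]
    .filter((part) => part.trim().length > 0)
    .join("\n\n");
}
```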

Edit config/skills.yml and restart the API to change the active skill set. The default file ships with sre-on-call and blameless-postmortem enabled; security-incident is shipped disabled as a reference.

The env var SKILLS_CONFIG_PATH overrides the default config path (../../config/skills.yml relative to cwd).

Per-session context

Every session has an optional system_prompt field. When set, it is prepended to every chat turn as a system message (after the base identity prompt and skill prompts, before the docs/actions context). Use it to pin incident id, on-call name, or runbook references that shouldn't drift across the conversation.

  • Frontend: each session row has a small circle button whose icon shows whether the prompt is empty or set. Click it to open a modal editor with Save and Clear.
  • API: PATCH /api/sessions/:id/system-prompt with { "system_prompt": "..." }. Send null or empty string to clear.
  • Limit: 8000 characters.
  • Persistence: stored in chat_sessions.system_prompt; same WHERE id = ? AND user_id = ? ownership rule as every other session operation.
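
The clearing and limit rules above could be validated along these lines. This is a sketch under assumptions: `normalizeSystemPrompt` is a hypothetical name, and rejecting non-string values is a guess at reasonable behavior rather than something the README specifies.

```typescript
const SYSTEM_PROMPT_MAX_CHARS = 8000; // limit stated in this README

type PromptResult =
  | { ok: true; value: string | null }
  | { ok: false; error: string };

// Hypothetical validator for the PATCH body field "system_prompt":
// null or an empty string clears the prompt, oversized input is rejected.
function normalizeSystemPrompt(input: unknown): PromptResult {
  if (input === null) return { ok: true, value: null };
  if (typeof input !== "string") {
    return { ok: false, error: "system_prompt must be a string or null" };
  }
  if (input.length === 0) return { ok: true, value: null };
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    return { ok: false, error: `system_prompt exceeds ${SYSTEM_PROMPT_MAX_CHARS} characters` };
  }
  return { ok: true, value: input };
}
```

A `value` of `null` maps to clearing chat_sessions.system_prompt; the ownership predicate still applies to the update itself.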

Observability

Two endpoints surface API metrics:

  • GET /metrics — Prometheus text exposition (counter / summary), scraper-friendly. Default Prometheus port / scrape target.

  • GET /api/metrics — same data as JSON for humans and the smoke test. Shape:

    {
      "started_at": "2026-06-29T12:00:00.000Z",
      "uptime_seconds": 1234,
      "totals": { "requests": 5678, "errors_5xx": 0 },
      "routes": [
        {
          "route": "/api/chat/stream",
          "method": "POST",
          "count": 42,
          "avg_ms": 1230,
          "p95_ms": 4500,
          "max_ms": 8000,
          "status_buckets": { "200_299": 42 }
        }
      ],
      "recent": [
        {
          "route": "/api/sessions/:id",
          "method": "DELETE",
          "status": 204,
          "durationMs": 4,
          "timestamp": 1782727300000
        }
      ]
    }
    

Routes are aggregated by route template (e.g. /api/sessions/:id), not by raw URL, so /api/sessions/abc and /api/sessions/def share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. In-memory only — counters reset on restart; that's the expected behavior for a 5-user MVP.
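
To illustrate how a fixed-size reservoir keeps p95 memory bounded, here is a sketch using classic reservoir sampling (Algorithm R) with a nearest-rank percentile. Whether SIC uses this exact variant is an assumption; `LatencyReservoir` and its methods are hypothetical names.

```typescript
// Keeps at most `capacity` samples; once full, each new value replaces a
// random slot with probability capacity / seen (Algorithm R).
class LatencyReservoir {
  private samples: number[] = [];
  private seen = 0;
  private capacity: number;
  private random: () => number;

  constructor(capacity = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  add(ms: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(ms);
      return;
    }
    const slot = Math.floor(this.random() * this.seen);
    if (slot < this.capacity) this.samples[slot] = ms;
  }

  // Nearest-rank percentile over the retained samples; returns 0 when empty.
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }

  get size(): number {
    return this.samples.length;
  }
}
```

One reservoir per route template and method would be enough to produce the `p95_ms` field, at a cost of at most 200 numbers per route.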

Auth

The backend supports two modes:

  • AUTH_MODE=local: dev mode, uses local-user with roles admin and webhook-runner.
  • AUTH_MODE=keycloak: validates Authorization: Bearer <token> with remote JWKS from OIDC_ISSUER and OIDC_AUDIENCE.

For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in localStorage and sent as Authorization: Bearer <token> on API and stream calls. Alternatively, Vite can receive VITE_AUTH_TOKEN to preconfigure it for the local environment.

Claims used from Keycloak:

  • sub as user.id.
  • preferred_username and email for display.
  • Roles from realm_access.roles and resource_access[OIDC_AUDIENCE].roles.
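
The claim mapping above can be sketched as a pure function over an already verified token payload. `userFromClaims`, `AppUser`, and the display-name fallback order are assumptions for illustration; signature and audience verification against the JWKS are out of scope here.

```typescript
// Decoded JWT payload fields used by the mapping; other claims are ignored.
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] }>;
}

interface AppUser {
  id: string;
  displayName: string;
  roles: string[];
}

// Merges realm roles with the roles of the client named by `audience`
// (the OIDC_AUDIENCE value), without duplicates.
function userFromClaims(claims: KeycloakClaims, audience: string): AppUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  return {
    id: claims.sub,
    // Assumption: prefer the username, then the email, then the subject id.
    displayName: claims.preferred_username ?? claims.email ?? claims.sub,
    roles: [...new Set([...realmRoles, ...clientRoles])],
  };
}
```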

Basic hardening

  • API_BODY_LIMIT_BYTES: global Fastify body limit. Default: 1048576.
  • CHAT_MESSAGE_MAX_CHARS: maximum length of a chat message and of the lastUserMessage value passed to webhooks. Default: 8000.
  • CORS_ALLOWED_ORIGINS: comma-separated list. If unset, open for dev.
  • LLM_TIMEOUT_MS: OpenAI-compatible call timeout. Default: 30000.
  • WEBHOOK_TIMEOUT_MS: backend-only webhook execution timeout. Default: 15000.
  • WEBHOOK_RETRY_MAX_ATTEMPTS: retries per webhook on transient errors (5xx, 429, timeout, network). Default: 3.
  • WEBHOOK_RETRY_INITIAL_BACKOFF_MS: initial backoff with exponential growth. Default: 500.
  • WEBHOOK_RETRY_MAX_BACKOFF_MS: backoff cap. Default: 5000.
  • WEBHOOK_RUNS_RETENTION_DAYS: age cutoff for webhook_runs rows. Runs older than this are purged on boot and on a timer. Default: 30. Set to 0 to disable the age pass.
  • WEBHOOK_RUNS_MAX_PER_USER: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: 1000. Set to 0 to disable the cap pass.
  • WEBHOOK_AUDIT_PURGE_INTERVAL_MS: how often the janitor runs while the API is up. Default: 3600000 (1 hour). Minimum: 60000 (1 minute).
  • CHAT_RATE_LIMIT_PER_MINUTE: per-user rate limit on POST /api/chat/stream (token-bucket refill rate). Default: 20.
  • CHAT_RATE_LIMIT_BURST: per-user burst size. Default: 5. Rejected calls return 429 with retry-after in seconds and x-ratelimit-remaining: 0.
  • The API adds basic defensive headers: x-content-type-options, referrer-policy, x-frame-options.
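
As an illustration of how CHAT_RATE_LIMIT_PER_MINUTE and CHAT_RATE_LIMIT_BURST interact, here is a minimal per-user token bucket. It is a sketch, not the SIC implementation: `ChatRateLimiter` is a hypothetical name and the clock is passed in explicitly so the behavior is easy to check.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number;
}

class ChatRateLimiter {
  private buckets = new Map<string, Bucket>();
  private perMinute: number;
  private burst: number;

  constructor(perMinute = 20, burst = 5) {
    this.perMinute = perMinute;
    this.burst = burst;
  }

  // `now` is a timestamp in milliseconds, injected by the caller.
  take(userId: string, now: number): { allowed: boolean; retryAfterSeconds: number } {
    const bucket = this.buckets.get(userId) ?? { tokens: this.burst, updatedAt: now };
    const elapsed = Math.max(0, now - bucket.updatedAt);
    // Refill in proportion to elapsed time, never above the burst size.
    bucket.tokens = Math.min(this.burst, bucket.tokens + (elapsed * this.perMinute) / 60000);
    bucket.updatedAt = now;
    this.buckets.set(userId, bucket);
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Time until one full token is available, rounded up to whole seconds.
    const waitMs = ((1 - bucket.tokens) * 60000) / this.perMinute;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}
```

With the defaults, a user can send 5 messages back to back, after which one token is refilled every 3 seconds; a rejected call would map `retryAfterSeconds` to the retry-after header of the 429 response.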

End-to-end smoke test

A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).

With a real LLM (MiniMax)

# Terminal 1: start the API and the web
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2: smoke test against http://localhost:3000
pnpm smoke

With the mock LLM (no key)

# Terminal 1: start the API and the web pointing at the mock
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2
pnpm smoke

# or in a single step, the script starts the mock internally:
pnpm smoke:mock

Steps covered (in order):

  1. /healthz, /readyz
  2. /api/me (local auth)
  3. /api/models
  4. /api/docs/search + /api/docs/:id
  5. /api/webhooks
  6. POST /api/sessions + GET /api/sessions
  7. POST /api/chat/stream and SSE event parsing (docs, token, actions, done)
  8. GET /api/sessions/:id to confirm the assistant message was persisted
  9. GET /api/webhook-runs?sessionId=... to confirm audit listing
  10. DELETE /api/sessions/:id (cleanup)

Optional flags:

  • pnpm smoke --api-base http://localhost:4000 to point at a different API
  • pnpm smoke:mock (alias of pnpm smoke --mock-llm) starts the mock inside the script