- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
SIC — Super Incident Commander
Lightweight web harness to use a centralized pi.dev engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.
MVP scope
- Expected ceiling: 5 concurrent users.
- Frontend: React + Vite.
- Backend: Node.js + Fastify.
- Initial persistence: SQLite.
- LLM: OpenAI-compatible endpoint via pi-adapter.
- Default LLM provider: MiniMax OpenAI-compatible.
- Configuration: YAML + environment variables.
- Initial deploy: Docker Compose.
Reliability principle
Nothing critical lives only in memory. Sessions, messages, and webhook audit are rebuilt from SQLite.
Every conversation read/write must respect:
WHERE session_id = ?
AND user_id = ?
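One way to make the ownership rule hard to forget is to build every conversation query through a helper that always takes both ids. The sketch below is only an illustration: the function name, the `messages` table, and the selected columns are assumptions, not taken from the repository.

```typescript
// Hypothetical helper: table and column names other than
// session_id / user_id are assumptions made for this sketch.
interface ScopedQuery {
  sql: string;
  params: [string, string];
}

function messagesForSession(sessionId: string, userId: string): ScopedQuery {
  // Refuse to build a query that is missing either half of the ownership key.
  if (!sessionId || !userId) {
    throw new Error("sessionId and userId are both required");
  }
  return {
    sql:
      "SELECT id, role, content, created_at FROM messages " +
      "WHERE session_id = ? AND user_id = ? ORDER BY created_at ASC",
    // Positional parameters, in the same order as the placeholders.
    params: [sessionId, userId],
  };
}
```

The returned `sql` and `params` can be handed to whatever SQLite driver the API uses; the point is that no call site can skip the `user_id` filter.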
Structure
apps/
  api/          # Fastify API, SSE, sessions, webhooks, docs
  web/          # React + Vite UI
packages/
  shared/       # Shared types
  pi-adapter/   # pi.dev / OpenAI-compatible adapter
config/         # YAML for models, webhooks and docs
knowledge/      # Internal Markdown documentation
deploy/         # Docker Compose and future manifests
docs/           # Definitions, reliable history and agents
scripts/        # End-to-end smoke test + mock LLM
API surface
- GET /healthz
- GET /readyz
- GET /api/version
- GET /api/me
- GET /api/sessions
- POST /api/sessions
- GET /api/sessions/:id
- PATCH /api/sessions/:id
- DELETE /api/sessions/:id
- GET /api/docs/search?q=vpn
- GET /api/docs/:id
- GET /api/models
- GET /api/webhooks
- GET /api/webhook-runs?sessionId=...
- POST /api/webhooks/:id/run
- GET /api/skills
- PATCH /api/sessions/:id/system-prompt (set per-session context)
- GET /metrics (Prometheus text)
- GET /api/metrics (same as JSON)
- POST /api/chat/stream
Chat stream contract
POST /api/chat/stream takes sessionId, message and optionally model.
Reliability rules:
- Validate that the session belongs to the current user.
- Persist the user message before calling the LLM.
- If the session has no title yet, derive a short one from the first message.
- Validate the requested model against config/models.yml.
- Search relevant Markdown docs and role-allowed webhooks.
- Call the OpenAI-compatible endpoint via pi-adapter. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first ok=true response wins.
- Emit SSE events: docs, token, actions, done.
- Persist the assistant response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
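On the client side, the SSE events listed above arrive as a text stream that may be split at arbitrary byte boundaries. The following is a minimal sketch of an incremental parser for the standard `event:` / `data:` framing; the class name and the JSON payload used in the usage note are assumptions, since the README does not specify the payload shape of each event.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parses one complete event block (the text between two blank lines).
function parseBlock(block: string): SseEvent | null {
  let event = "message"; // SSE default when no "event:" field is present
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line.charAt(0) === ":") continue; // comment or keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // one optional space
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

// Incremental parser: feed() accepts arbitrary chunks and returns the events
// completed so far; an unfinished event stays in the buffer.
class SseParser {
  private buffer = "";

  feed(chunk: string): SseEvent[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const parsed = parseBlock(this.buffer.slice(0, boundary));
      this.buffer = this.buffer.slice(boundary + 2);
      if (parsed) events.push(parsed);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

A caller would feed each decoded chunk from the response body into `feed()` and switch on `event` (docs, token, actions, done), for example appending the text of each token event to the message being rendered.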
Provider fallback
Each model in config/models.yml can declare fallback: [other-id, ...]. The chat route walks the chain when a model returns ok=false (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes requested_model, fallback_attempts, fallback_chain, and fallback_failures whenever the chain was actually used, so you can see what happened in the chat history.
Override the chain globally with LLM_FALLBACK_CHAIN, a comma-separated list of model ids; the first entry is tried right after the requested model. Leave it empty to use each model's YAML chain.
Default chain today (from config/models.yml):
- fast → no fallback (it is the cheap path)
- balanced → mr-auto
- reasoning → no fallback
- mr-auto → no fallback
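The chain walk described in this section can be sketched as two small functions: one that resolves the ordered list of model ids, and one that tries them in turn. This is an illustration under assumptions, not the chat route itself: the type names, `model_used`, the deduplication of ids, and the exact meaning of `fallback_attempts` (here, the number of models tried after the requested one) are all hypothetical.

```typescript
interface ModelConfig {
  id: string;
  fallback?: string[];
}

interface AttemptResult {
  ok: boolean;
  content?: string;
  error?: string; // e.g. "no_content", "json_parse", "schema"
}

interface ChainOutcome {
  requested_model: string;
  model_used: string | null; // hypothetical field, not named in the README
  fallback_attempts: number;
  fallback_chain: string[];
  fallback_failures: { model: string; error: string }[];
  content: string | null;
}

// A non-empty override (the LLM_FALLBACK_CHAIN value) replaces the YAML chain.
function resolveChain(requested: string, models: ModelConfig[], override: string): string[] {
  const fromOverride = override.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  const declared = models.find((m) => m.id === requested)?.fallback ?? [];
  const tail = fromOverride.length > 0 ? fromOverride : declared;
  const chain = [requested];
  for (const id of tail) {
    if (chain.indexOf(id) === -1) chain.push(id); // assumption: skip duplicates
  }
  return chain;
}

// Tries each model in order; a thrown error and ok=false both count as failures.
async function walkChain(
  chain: string[],
  call: (model: string) => Promise<AttemptResult>,
): Promise<ChainOutcome> {
  const failures: { model: string; error: string }[] = [];
  for (let i = 0; i < chain.length; i++) {
    const model = chain[i];
    try {
      const result = await call(model);
      if (result.ok) {
        return {
          requested_model: chain[0], model_used: model, fallback_attempts: i,
          fallback_chain: chain, fallback_failures: failures, content: result.content ?? "",
        };
      }
      failures.push({ model, error: result.error ?? "unknown" });
    } catch (err) {
      failures.push({ model, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return {
    requested_model: chain[0], model_used: null, fallback_attempts: Math.max(0, chain.length - 1),
    fallback_chain: chain, fallback_failures: failures, content: null,
  };
}
```

With the default chain above, `resolveChain("balanced", models, "")` yields balanced followed by mr-auto, and a `null` `content` in the outcome corresponds to the case where a controlled error message is persisted.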
MiniMax
The project is wired to MiniMax via the official OpenAI-compatible endpoint:
- Base URL: https://api.minimax.io/v1
- Chat path used by the adapter: /chat/completions
- Auth: Authorization: Bearer <key>
Models configured in config/models.yml:
- fast → MiniMax-M2.7-highspeed
- balanced → MiniMax-M2.7
- reasoning → MiniMax-M3
To run locally, set the key:
export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"
In Docker Compose you only need to export MINIMAX_API_KEY; the Compose file maps it to LLM_API_KEY.
UI MVP
The React app already consumes the API through the Vite proxy:
- Loads or creates a local session.
- Loads GET /api/models and lets the user pick the model per message.
- Lists persisted sessions and lets the user switch between them.
- Lets the user rename and delete sessions, always through the API with per-user isolation.
- Sends messages to POST /api/chat/stream and consumes SSE events.
- Shows recommended documentation and lets the user open the full document via GET /api/docs/:id.
- Shows suggested actions in the right panel.
- Loads GET /api/webhooks to show public labels/descriptions for actions.
- Executes webhooks only after user confirmation and always through the backend.
- Shows execution audit per session from GET /api/webhook-runs, without exposing URLs or payload templates.
- Can attach a development Bearer token to test AUTH_MODE=keycloak; reads from localStorage or VITE_AUTH_TOKEN.
Skills
Skills are persona/behavior prompt fragments loaded from config/skills.yml and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.
Each skill has: id, name, description, enabled, prompt. Skills with enabled: true are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with enabled: false are kept in the file but inactive. The frontend can list them via GET /api/skills (no prompt text is exposed publicly — only id, name, description, enabled).
Edit config/skills.yml and restart the API to change the active skill set. The default file ships with sre-on-call and blameless-postmortem enabled; security-incident is shipped disabled as a reference.
The env var SKILLS_CONFIG_PATH overrides the default config path (../../config/skills.yml relative to cwd).
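The split between what the model sees and what GET /api/skills exposes can be sketched as two projections over the same records. The field names come from the list above; the type and function names are assumptions made for this example.

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

// Public view: everything except the prompt text.
type PublicSkill = Omit<Skill, "prompt">;

function toPublicSkill(skill: Skill): PublicSkill {
  const { id, name, description, enabled } = skill;
  return { id, name, description, enabled };
}

// Prompts that would be injected into the system prompt, in file order.
function activeSkillPrompts(skills: Skill[]): string[] {
  return skills.filter((s) => s.enabled).map((s) => s.prompt);
}
```

Building the public object field by field, rather than deleting `prompt` from a copy, means a field added to the YAML later is not leaked by accident.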
Per-session context
Every session has an optional system_prompt field. When set, it is prepended to every chat turn as a system message (after the base identity prompt and skill prompts, before the docs/actions context). Use it to pin incident id, on-call name, or runbook references that shouldn't drift across the conversation.
- Frontend: each session row has a small circle button (○ empty, ● set). Click it to open a modal editor with Save and Clear.
- API: PATCH /api/sessions/:id/system-prompt with { "system_prompt": "..." }. Send null or empty string to clear.
- Limit: 8000 characters.
- Persistence: stored in chat_sessions.system_prompt; same WHERE id = ? AND user_id = ? ownership rule as every other session operation.
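The ordering described in this section (base identity, then skills, then the session prompt, then the docs/actions context) and the clear/limit rules can be sketched as follows. Function names and the choice of one system message per part are assumptions; the API may well concatenate everything into a single message.

```typescript
const SYSTEM_PROMPT_MAX_CHARS = 8000;

// Returns the value to store: null clears the field.
function normalizeSessionPrompt(input: string | null): string | null {
  if (input === null || input === "") return null; // null or empty string clears
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    throw new Error("system_prompt exceeds " + SYSTEM_PROMPT_MAX_CHARS + " characters");
  }
  return input;
}

interface SystemMessage {
  role: "system";
  content: string;
}

interface PromptParts {
  base: string;                 // base identity prompt
  skillPrompts: string[];       // prompts of enabled skills
  sessionPrompt: string | null; // chat_sessions.system_prompt, if set
  context: string;              // docs/actions context
}

function buildSystemMessages(parts: PromptParts): SystemMessage[] {
  const ordered: string[] = [parts.base, ...parts.skillPrompts];
  if (parts.sessionPrompt !== null) ordered.push(parts.sessionPrompt);
  ordered.push(parts.context);
  return ordered.map((content): SystemMessage => ({ role: "system", content }));
}
```

For instance, a session pinned to a hypothetical incident id would pass that text as `sessionPrompt`, and it would land between the skill prompts and the docs/actions context on every turn.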
Observability
Two endpoints surface API metrics:
- GET /metrics: Prometheus text exposition (counter / summary), scraper-friendly. This is the endpoint to use as the Prometheus scrape target.
- GET /api/metrics: same data as JSON for humans and the smoke test. Shape:
  {
    "started_at": "2026-06-29T12:00:00.000Z",
    "uptime_seconds": 1234,
    "totals": { "requests": 5678, "errors_5xx": 0 },
    "routes": [
      { "route": "/api/chat/stream", "method": "POST", "count": 42, "avg_ms": 1230, "p95_ms": 4500, "max_ms": 8000, "status_buckets": { "200_299": 42 } }
    ],
    "recent": [
      { "route": "/api/sessions/:id", "method": "DELETE", "status": 204, "durationMs": 4, "timestamp": 1782727300000 }
    ]
  }
Routes are aggregated by route template (e.g. /api/sessions/:id), not by raw URL, so /api/sessions/abc and /api/sessions/def share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. In-memory only — counters reset on restart; that's the expected behavior for a 5-user MVP.
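A bounded p95 of this kind can be sketched with classic reservoir sampling (Algorithm R) keyed by method and route template. This is an assumption about how such a reservoir might work, not the project's code: the class and function names are hypothetical, and the nearest-rank percentile is one of several reasonable definitions.

```typescript
const RESERVOIR_SIZE = 200;

class RouteStats {
  count = 0;
  totalMs = 0;
  maxMs = 0;
  private samples: number[] = [];

  record(durationMs: number, random: () => number = Math.random): void {
    this.count += 1;
    this.totalMs += durationMs;
    if (durationMs > this.maxMs) this.maxMs = durationMs;
    if (this.samples.length < RESERVOIR_SIZE) {
      this.samples.push(durationMs);
      return;
    }
    // Algorithm R: the new sample replaces a random slot with
    // probability RESERVOIR_SIZE / count, so memory never grows.
    const slot = Math.floor(random() * this.count);
    if (slot < RESERVOIR_SIZE) this.samples[slot] = durationMs;
  }

  get sampleSize(): number {
    return this.samples.length;
  }

  // Nearest-rank p95 over the reservoir; exact until the reservoir is full.
  p95(): number {
    if (this.samples.length === 0) return 0;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.ceil((95 * sorted.length) / 100);
    return sorted[rank - 1];
  }
}

const statsByRoute = new Map<string, RouteStats>();

// routeTemplate is the registered pattern (e.g. "/api/sessions/:id"),
// never the raw URL, so all ids share one bucket.
function observe(method: string, routeTemplate: string, durationMs: number): RouteStats {
  const key = method + " " + routeTemplate;
  let stats = statsByRoute.get(key);
  if (!stats) {
    stats = new RouteStats();
    statsByRoute.set(key, stats);
  }
  stats.record(durationMs);
  return stats;
}
```

Once more than 200 requests have hit a route, the reported p95 becomes an estimate over a uniform sample rather than an exact value, which is the trade-off for constant memory.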
Auth
The backend supports two modes:
- AUTH_MODE=local: dev mode, uses local-user with roles admin and webhook-runner.
- AUTH_MODE=keycloak: validates Authorization: Bearer <token> with remote JWKS from OIDC_ISSUER and OIDC_AUDIENCE.
For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in localStorage and sent as Authorization: Bearer <token> on API and stream calls. Alternatively, Vite can receive VITE_AUTH_TOKEN to preconfigure it for the local environment.
Claims used from Keycloak:
- sub as user.id.
- preferred_username and email for display.
- Roles from realm_access.roles and resource_access[OIDC_AUDIENCE].roles.
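The claim mapping above can be sketched as a pure function over an already-verified token payload. Signature, issuer, and audience checks are assumed to have happened before this point; the type names, the `displayName` fallback order, and the merging of realm and client roles into one deduplicated list are assumptions for illustration.

```typescript
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] } | undefined>;
}

interface AppUser {
  id: string;
  displayName: string;
  roles: string[];
}

// audience is the OIDC_AUDIENCE value, used as the resource_access key.
function userFromClaims(claims: KeycloakClaims, audience: string): AppUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  const roles: string[] = [];
  for (const role of realmRoles.concat(clientRoles)) {
    if (roles.indexOf(role) === -1) roles.push(role); // keep first occurrence
  }
  return {
    id: claims.sub,
    displayName: claims.preferred_username ?? claims.email ?? claims.sub,
    roles,
  };
}
```

A token that carries no roles for the configured audience simply yields the realm roles, so role-gated features such as webhook execution stay unavailable rather than failing.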
Basic hardening
- API_BODY_LIMIT_BYTES: global Fastify body limit. Default: 1048576.
- CHAT_MESSAGE_MAX_CHARS: chat message and lastUserMessage limit on webhooks. Default: 8000.
- CORS_ALLOWED_ORIGINS: comma-separated list. If unset, open for dev.
- LLM_TIMEOUT_MS: OpenAI-compatible call timeout. Default: 30000.
- WEBHOOK_TIMEOUT_MS: backend-only webhook execution timeout. Default: 15000.
- WEBHOOK_RETRY_MAX_ATTEMPTS: retries per webhook on transient errors (5xx, 429, timeout, network). Default: 3.
- WEBHOOK_RETRY_INITIAL_BACKOFF_MS: initial backoff with exponential growth. Default: 500.
- WEBHOOK_RETRY_MAX_BACKOFF_MS: backoff cap. Default: 5000.
- WEBHOOK_RUNS_RETENTION_DAYS: age cutoff for webhook_runs rows. Runs older than this are purged on boot and on a timer. Default: 30. Set to 0 to disable the age pass.
- WEBHOOK_RUNS_MAX_PER_USER: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: 1000. Set to 0 to disable the cap pass.
- WEBHOOK_AUDIT_PURGE_INTERVAL_MS: how often the janitor runs while the API is up. Default: 3600000 (1 hour). Minimum: 60000 (1 minute).
- CHAT_RATE_LIMIT_PER_MINUTE: per-user rate limit on POST /api/chat/stream (token-bucket refill rate). Default: 20.
- CHAT_RATE_LIMIT_BURST: per-user burst size. Default: 5. Rejected calls return 429 with retry-after in seconds and x-ratelimit-remaining: 0.
- The API adds basic defensive headers: x-content-type-options, referrer-policy, x-frame-options.
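To make two of these settings concrete, here is a rough sketch of a capped exponential backoff schedule and a per-user token bucket using the defaults listed above. Several details are assumptions: that WEBHOOK_RETRY_MAX_ATTEMPTS counts total attempts, that the backoff doubles each time, and the class and function names themselves.

```typescript
// Delays (ms) between attempts; assumes maxAttempts counts total attempts
// and that "exponential growth" means doubling.
function backoffDelays(maxAttempts: number, initialMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(Math.min(capMs, initialMs * Math.pow(2, retry)));
  }
  return delays;
}

interface RateDecision {
  allowed: boolean;
  retryAfterSeconds: number; // value for the retry-after header when rejected
}

// One bucket per user: holds up to `burst` tokens, refilled at `perMinute`.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly perMinute: number;
  private readonly burst: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.perMinute = perMinute;
    this.burst = burst;
    this.tokens = burst;
    this.lastMs = nowMs;
  }

  take(nowMs: number): RateDecision {
    // Refill proportionally to elapsed time, never above the burst size.
    const refill = ((nowMs - this.lastMs) * this.perMinute) / 60000;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Time until one full token is available again.
    const waitMs = ((1 - this.tokens) * 60000) / this.perMinute;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}
```

Under these assumptions the default webhook settings wait 500 ms and then 1000 ms between three attempts, and a user who spends the burst of 5 chat calls at once has to wait about 3 seconds for the next token at 20 per minute.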
End-to-end smoke test
A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).
With a real LLM (MiniMax)
# Terminal 1: start the API and the web
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev
# Terminal 2: smoke test against http://localhost:3000
pnpm smoke
With the mock LLM (no key)
# Terminal 1: start the API and the web pointing at the mock
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev
# Terminal 2
pnpm smoke
# or in a single step, the script starts the mock internally:
pnpm smoke:mock
Steps covered (in order):
- /healthz, /readyz
- /api/me (local auth)
- /api/models
- /api/docs/search + /api/docs/:id
- /api/webhooks
- POST /api/sessions + GET /api/sessions
- POST /api/chat/stream and SSE event parsing (docs, token, actions, done)
- GET /api/sessions/:id to confirm the assistant message was persisted
- GET /api/webhook-runs?sessionId=... to confirm audit listing
- DELETE /api/sessions/:id (cleanup)
Optional flags:
- pnpm smoke --api-base http://localhost:4000 to point at a different API
- pnpm smoke:mock (alias of pnpm smoke --mock-llm) starts the mock inside the script