rikrdo/sic

rikrdo 62728b2200 Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)

- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English

2026-06-29 16:20:53 +02:00


SIC — Super Incident Commander

Lightweight web harness to use a centralized pi.dev engine from the browser, with independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks executed only from the backend after explicit user confirmation.

MVP scope

Expected ceiling: 5 concurrent users.
Frontend: React + Vite.
Backend: Node.js + Fastify.
Initial persistence: SQLite.
LLM: OpenAI-compatible endpoint via pi-adapter.
Default LLM provider: MiniMax OpenAI-compatible.
Configuration: YAML + environment variables.
Initial deploy: Docker Compose.

Reliability principle

Nothing critical lives only in memory. Sessions, messages, and webhook audit records are rebuilt from SQLite.

Every conversation read/write must respect:

WHERE session_id = ?
AND user_id = ?
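
A minimal sketch of how that rule could be enforced in one place, so no query is built without both ids. The helper name `scopedMessagesQuery`, the `messages` table, and its columns are assumptions for illustration; only the WHERE clause comes from the rule above.

```typescript
// Hypothetical helper: returns SQL plus bound parameters, always scoped
// to both the session and its owner.
interface ScopedQuery {
  sql: string;
  params: [string, string];
}

function scopedMessagesQuery(sessionId: string, userId: string): ScopedQuery {
  // Refuse to build a query that would skip the ownership check.
  if (!sessionId || !userId) {
    throw new Error("sessionId and userId are both required");
  }
  return {
    // Table and column names are assumed, not taken from the real schema.
    sql:
      "SELECT id, role, content FROM messages " +
      "WHERE session_id = ? AND user_id = ? ORDER BY id",
    params: [sessionId, userId],
  };
}
```

Passing the ids as bound parameters, rather than interpolating them, keeps the ownership check and SQL injection protection in the same code path.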

Structure

apps/
  api/                  # Fastify API, SSE, sessions, webhooks, docs
  web/                  # React + Vite UI
packages/
  shared/               # Shared types
  pi-adapter/           # pi.dev / OpenAI-compatible adapter
config/                 # YAML for models, webhooks and docs
knowledge/              # Internal Markdown documentation
deploy/                 # Docker Compose and future manifests
docs/                   # Definitions, reliable history and agents
scripts/                # End-to-end smoke test + mock LLM

API surface

GET /healthz
GET /readyz
GET /api/version
GET /api/me
GET /api/sessions
POST /api/sessions
GET /api/sessions/:id
PATCH /api/sessions/:id
DELETE /api/sessions/:id
GET /api/docs/search?q=vpn
GET /api/docs/:id
GET /api/models
GET /api/webhooks
GET /api/webhook-runs?sessionId=...
POST /api/webhooks/:id/run
GET /api/skills
PATCH /api/sessions/:id/system-prompt — set per-session context
GET /metrics — Prometheus text
GET /api/metrics — same as JSON
POST /api/chat/stream

Chat stream contract

POST /api/chat/stream takes sessionId, message and optionally model.

Reliability rules:

Validate that the session belongs to the current user.
Persist the user message before calling the LLM.
If the session has no title yet, derive a short one from the first message.
Validate the requested model against config/models.yml.
Search relevant Markdown docs and role-allowed webhooks.
Call the OpenAI-compatible endpoint via pi-adapter. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first ok=true response wins.
Emit SSE events: docs, token, actions, done.
Persist the assistant response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
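
On the client side, the SSE events listed above can be consumed with a small incremental parser. This is a sketch that follows the standard `event:` / `data:` framing; the function name `parseSseChunk` is hypothetical, and the payload carried in each `data` field is not specified here, so it is left as a raw string.

```typescript
interface SseEvent {
  event: string; // e.g. "docs", "token", "actions", "done"
  data: string;  // raw payload; its shape is not defined in this README
}

// Parses every complete event in `buffer` and returns the unparsed tail,
// which the caller prepends to the next network chunk.
function parseSseChunk(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last block has no terminating blank line yet, so keep it for later.
  const rest = blocks.pop() ?? "";
  for (const block of blocks) {
    let event = "message"; // default event name in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment / keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Events without data are skipped, as the SSE format prescribes.
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

A reader loop would append each decoded chunk to `rest`, call `parseSseChunk`, and stop once a `done` event arrives.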

Provider fallback

Each model in config/models.yml can declare fallback: [other-id, ...]. The chat route walks the chain when a model returns ok=false (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes requested_model, fallback_attempts, fallback_chain, and fallback_failures whenever the chain was actually used, so you can see what happened in the chat history.

Override the chain globally with LLM_FALLBACK_CHAIN, a comma-separated list of model ids; the first entry is tried right after the requested model. Leave it empty to use each model's YAML chain.

Default chain today (from config/models.yml):

fast → no fallback (it IS the cheap path)
balanced → mr-auto
reasoning → no fallback
mr-auto → no fallback
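
The chain walk described in this section can be sketched as follows. This is not the actual chat route: `walkFallbackChain`, the `Attempt` type, and the injected `call` function are hypothetical, and counting `fallback_attempts` as "models tried beyond the requested one" is an assumption. Only the metadata field names come from the text above.

```typescript
// Result of one model call: ok=false models a structured failure
// (no_content / json_parse / schema); transport errors are thrown.
type Attempt<T> = { ok: true; value: T } | { ok: false; reason: string };

interface ChainResult<T> {
  value?: T;      // set only when some model answered
  model?: string; // the model that produced `value`
  requested_model: string;
  fallback_chain: string[];
  fallback_attempts: number;
  fallback_failures: { model: string; reason: string }[];
}

async function walkFallbackChain<T>(
  requested: string,
  fallbacks: string[],
  call: (model: string) => Promise<Attempt<T>>,
): Promise<ChainResult<T>> {
  const chain = [requested, ...fallbacks];
  const failures: { model: string; reason: string }[] = [];
  for (let i = 0; i < chain.length; i++) {
    const model = chain[i];
    try {
      const res = await call(model);
      if (res.ok) {
        // First ok=true response wins.
        return {
          value: res.value,
          model,
          requested_model: requested,
          fallback_chain: chain,
          fallback_attempts: i,
          fallback_failures: failures,
        };
      }
      failures.push({ model, reason: res.reason });
    } catch (err) {
      // 5xx / 429 / network / timeout surface here as thrown errors.
      failures.push({ model, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  // Every model failed: the caller persists a controlled error message.
  return {
    requested_model: requested,
    fallback_chain: chain,
    fallback_attempts: chain.length - 1,
    fallback_failures: failures,
  };
}
```

With the default chain, `walkFallbackChain("balanced", ["mr-auto"], call)` tries `balanced` first and falls through to `mr-auto` only when the first call fails.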

MiniMax

The project is wired to MiniMax via the official OpenAI-compatible endpoint:

Base URL: https://api.minimax.io/v1
Chat path used by the adapter: /chat/completions
Auth: Authorization: Bearer <key>

Models configured in config/models.yml:

fast → MiniMax-M2.7-highspeed
balanced → MiniMax-M2.7
reasoning → MiniMax-M3

To run locally, set the key:

export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"

In Docker Compose you only need to export MINIMAX_API_KEY; the Compose file maps it to LLM_API_KEY.

UI MVP

The React app already consumes the API through the Vite proxy:

Loads or creates a local session.
Loads GET /api/models and lets the user pick the model per message.
Lists persisted sessions and lets the user switch between them.
Lets the user rename and delete sessions, always through the API with per-user isolation.
Sends messages to POST /api/chat/stream and consumes SSE events.
Shows recommended documentation and lets the user open the full document via GET /api/docs/:id.
Shows suggested actions in the right panel.
Loads GET /api/webhooks to show public labels/descriptions for actions.
Executes webhooks only after user confirmation and always through the backend.
Shows execution audit per session from GET /api/webhook-runs, without exposing URLs or payload templates.
Can attach a development Bearer token to test AUTH_MODE=keycloak; reads from localStorage or VITE_AUTH_TOKEN.

Skills

Skills are persona/behavior prompt fragments loaded from config/skills.yml and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.

Each skill has: id, name, description, enabled, prompt. Skills with enabled: true are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with enabled: false are kept in the file but inactive. The frontend can list them via GET /api/skills (no prompt text is exposed publicly — only id, name, description, enabled).

Edit config/skills.yml and restart the API to change the active skill set. The default file ships with sre-on-call and blameless-postmortem enabled; security-incident is shipped disabled as a reference.

The env var SKILLS_CONFIG_PATH overrides the default config path (../../config/skills.yml relative to cwd).
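
To illustrate the split between what the LLM sees and what GET /api/skills returns, here is a sketch based on the five fields listed above. The function names `toPublicSkill` and `activeSkillPrompts` are hypothetical.

```typescript
// Shape of one entry in config/skills.yml, per the field list above.
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

// Public view: everything except the prompt text.
type PublicSkill = Omit<Skill, "prompt">;

function toPublicSkill(skill: Skill): PublicSkill {
  const { id, name, description, enabled } = skill;
  return { id, name, description, enabled };
}

// Only enabled skills contribute to the chat system prompt.
function activeSkillPrompts(skills: Skill[]): string[] {
  return skills.filter((s) => s.enabled).map((s) => s.prompt);
}
```

Copying the public fields explicitly, instead of deleting `prompt` from a clone, means a field added to the YAML later is not exposed by accident.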

Per-session context

Every session has an optional system_prompt field. When set, it is sent with every chat turn as a system message, placed after the base identity prompt and skill prompts and before the docs/actions context. Use it to pin an incident id, on-call name, or runbook references that shouldn't drift across the conversation.

Frontend: each session row has a small circle button (○ empty, ● set). Click it to open a modal editor with Save and Clear.
API: PATCH /api/sessions/:id/system-prompt with { "system_prompt": "..." }. Send null or empty string to clear.
Limit: 8000 characters.
Persistence: stored in chat_sessions.system_prompt; same WHERE id = ? AND user_id = ? ownership rule as every other session operation.
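
The ordering and the clear/limit rules above can be sketched like this. Both function names are hypothetical, emitting one system message per part is an assumption (the real route may concatenate them), and rejecting over-limit input rather than truncating it is also an assumption.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const SYSTEM_PROMPT_MAX_CHARS = 8000;

// null or empty string clears the prompt; longer input is rejected (assumed).
function normalizeSessionPrompt(input: string | null): string | null {
  if (input === null || input.trim() === "") return null;
  if (input.length > SYSTEM_PROMPT_MAX_CHARS) {
    throw new Error("system_prompt exceeds " + SYSTEM_PROMPT_MAX_CHARS + " characters");
  }
  return input;
}

// Order: base identity, skill prompts, per-session prompt, docs/actions context.
function buildSystemMessages(
  basePrompt: string,
  skillPrompts: string[],
  sessionPrompt: string | null,
  docsActionsContext: string,
): ChatMessage[] {
  const parts = [basePrompt, ...skillPrompts];
  if (sessionPrompt !== null) parts.push(sessionPrompt);
  if (docsActionsContext.trim() !== "") parts.push(docsActionsContext);
  return parts.map((content): ChatMessage => ({ role: "system", content }));
}
```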

Observability

Two endpoints surface API metrics:

GET /metrics serves the Prometheus text exposition format (counters and summaries) for scrapers. /metrics is also Prometheus's default metrics path, so a scrape target only needs the API host and port.

GET /api/metrics — same data as JSON for humans and the smoke test. Shape:

{
  "started_at": "2026-06-29T12:00:00.000Z",
  "uptime_seconds": 1234,
  "totals": { "requests": 5678, "errors_5xx": 0 },
  "routes": [
    {
      "route": "/api/chat/stream",
      "method": "POST",
      "count": 42,
      "avg_ms": 1230,
      "p95_ms": 4500,
      "max_ms": 8000,
      "status_buckets": { "200_299": 42 }
    }
  ],
  "recent": [
    {
      "route": "/api/sessions/:id",
      "method": "DELETE",
      "status": 204,
      "durationMs": 4,
      "timestamp": 1782727300000
    }
  ]
}

Routes are aggregated by route template (e.g. /api/sessions/:id), not by raw URL, so /api/sessions/abc and /api/sessions/def share a bucket. p95 uses a fixed-size streaming reservoir (200 samples) so memory stays bounded under traffic. In-memory only — counters reset on restart; that's the expected behavior for a 5-user MVP.
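
One way to keep p95 memory bounded is classic reservoir sampling (Algorithm R). The sketch below assumes that technique and a nearest-rank percentile; the actual API may use a different fixed-size scheme, and the class name `LatencyReservoir` is hypothetical.

```typescript
class LatencyReservoir {
  private readonly capacity: number;
  private readonly random: () => number;
  private readonly samples: number[] = [];
  private seen = 0;

  // `random` is injectable so the behavior can be tested deterministically.
  constructor(capacity: number = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  add(durationMs: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(durationMs);
      return;
    }
    // Algorithm R: the new sample replaces a random slot with
    // probability capacity / seen, so every sample is equally likely to stay.
    const j = Math.floor(this.random() * this.seen);
    if (j < this.capacity) this.samples[j] = durationMs;
  }

  size(): number {
    return this.samples.length;
  }

  // Nearest-rank percentile over the retained samples; null when empty.
  percentile(p: number): number | null {
    const n = this.samples.length;
    if (n === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p * n) / 100);
    return sorted[Math.max(0, rank - 1)];
  }
}
```

Keeping one reservoir per route template matches the aggregation rule above: `/api/sessions/abc` and `/api/sessions/def` would feed the same instance.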

Auth

The backend supports two modes:

AUTH_MODE=local: dev mode, uses local-user with roles admin and webhook-runner.
AUTH_MODE=keycloak: validates Authorization: Bearer <token> against the remote JWKS published by OIDC_ISSUER and checks the token audience against OIDC_AUDIENCE.

For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in localStorage and sent as Authorization: Bearer <token> on API and stream calls. Alternatively, Vite can receive VITE_AUTH_TOKEN to preconfigure it for the local environment.

Claims used from Keycloak:

sub as user.id.
preferred_username and email for display.
Roles from realm_access.roles and resource_access[OIDC_AUDIENCE].roles.
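
The claim mapping above could look like the sketch below, applied to an already verified token payload. The `AuthUser` shape, the function name `userFromClaims`, the merge-and-deduplicate step, and the display-name fallback order are assumptions; only the claim names come from the list above.

```typescript
// Subset of the Keycloak token payload used by the mapping.
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] } | undefined>;
}

// Hypothetical user shape.
interface AuthUser {
  id: string;
  displayName: string;
  email?: string;
  roles: string[];
}

function userFromClaims(claims: KeycloakClaims, audience: string): AuthUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  const all = [...realmRoles, ...clientRoles];
  return {
    id: claims.sub,
    // Assumed fallback order for display purposes.
    displayName: claims.preferred_username ?? claims.email ?? claims.sub,
    email: claims.email,
    // Keep the first occurrence of each role.
    roles: all.filter((role, index) => all.indexOf(role) === index),
  };
}
```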

Basic hardening

API_BODY_LIMIT_BYTES: global Fastify body limit. Default: 1048576.
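CHAT_MESSAGE_MAX_CHARS: maximum length of a chat message and of the lastUserMessage field passed to webhooks. Default: 8000.
CORS_ALLOWED_ORIGINS: comma-separated list of allowed origins. If unset, all origins are allowed, which is intended for development only.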
LLM_TIMEOUT_MS: OpenAI-compatible call timeout. Default: 30000.
WEBHOOK_TIMEOUT_MS: backend-only webhook execution timeout. Default: 15000.
WEBHOOK_RETRY_MAX_ATTEMPTS: retries per webhook on transient errors (5xx, 429, timeout, network). Default: 3.
WEBHOOK_RETRY_INITIAL_BACKOFF_MS: initial backoff with exponential growth. Default: 500.
WEBHOOK_RETRY_MAX_BACKOFF_MS: backoff cap. Default: 5000.
WEBHOOK_RUNS_RETENTION_DAYS: age cutoff for webhook_runs rows. Runs older than this are purged on boot and on a timer. Default: 30. Set to 0 to disable the age pass.
WEBHOOK_RUNS_MAX_PER_USER: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: 1000. Set to 0 to disable the cap pass.
WEBHOOK_AUDIT_PURGE_INTERVAL_MS: how often the janitor runs while the API is up. Default: 3600000 (1 hour). Minimum: 60000 (1 minute).
CHAT_RATE_LIMIT_PER_MINUTE: per-user rate limit on POST /api/chat/stream (token-bucket refill rate). Default: 20.
CHAT_RATE_LIMIT_BURST: per-user burst size. Default: 5. Rejected calls return 429 with retry-after in seconds and x-ratelimit-remaining: 0.
The API adds basic defensive headers: x-content-type-options, referrer-policy, x-frame-options.
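
Two of the settings above follow well-known patterns: capped exponential backoff for webhook retries and a token bucket for the chat rate limit. The sketch below shows how the defaults interact. The doubling factor, the absence of jitter, and the names `backoffDelayMs` and `TokenBucket` are assumptions, not the real implementation.

```typescript
// Delay before retry number `retry` (1 = first retry): 500, 1000, 2000, ... capped.
function backoffDelayMs(retry: number, initialMs = 500, maxMs = 5000): number {
  return Math.min(maxMs, initialMs * Math.pow(2, retry - 1));
}

class TokenBucket {
  private readonly perMinute: number; // CHAT_RATE_LIMIT_PER_MINUTE
  private readonly burst: number;     // CHAT_RATE_LIMIT_BURST
  private tokens: number;
  private lastMs: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.perMinute = perMinute;
    this.burst = burst;
    this.tokens = burst; // a new user starts with a full bucket
    this.lastMs = nowMs;
  }

  // Returns whether the call is allowed and, if not, the retry-after in seconds.
  take(nowMs: number): { allowed: boolean; retryAfterSeconds: number } {
    const elapsedMs = Math.max(0, nowMs - this.lastMs);
    // Refill continuously at perMinute tokens per 60 s, never above burst.
    this.tokens = Math.min(this.burst, this.tokens + (elapsedMs * this.perMinute) / 60000);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    const msUntilToken = ((1 - this.tokens) * 60000) / this.perMinute;
    return { allowed: false, retryAfterSeconds: Math.ceil(msUntilToken / 1000) };
  }
}
```

With the defaults (20 per minute, burst 5), a user can send five messages back to back, after which one token becomes available every 3 seconds.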

End-to-end smoke test

A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).

With a real LLM (MiniMax)

# Terminal 1: start the API and the web
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2: smoke test against http://localhost:3000
pnpm smoke

With the mock LLM (no key)

# Terminal 1: start the API and the web pointing at the mock
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2
pnpm smoke

# or in a single step, the script starts the mock internally:
pnpm smoke:mock

Steps covered (in order):

/healthz, /readyz
/api/me (local auth)
/api/models
/api/docs/search + /api/docs/:id
/api/webhooks
POST /api/sessions + GET /api/sessions
POST /api/chat/stream and SSE event parsing (docs, token, actions, done)
GET /api/sessions/:id to confirm the assistant message was persisted
GET /api/webhook-runs?sessionId=... to confirm audit listing
DELETE /api/sessions/:id (cleanup)

Optional flags:

pnpm smoke --api-base http://localhost:4000 to point at a different API
pnpm smoke:mock (alias of pnpm smoke --mock-llm) starts the mock inside the script
