Initial commit: SIC harness (backend, web, pi-adapter, configs, docs)

- pnpm monorepo: apps/api (Fastify + SQLite + SSE), apps/web (React+Vite), packages/shared, packages/pi-adapter
- Local auth (admin/webhook-runner roles) + Keycloak JWT ready
- Multi-session chat with reliable history (user persisted before LLM, assistant persisted after stream)
- Markdown knowledge base with /api/docs/search + /api/docs/:id
- YAML webhook catalog with backend-only execution, retry/backoff, audit (webhook_runs), and per-user rate limit
- Skills config (sre-on-call, blameless-postmortem, security-incident) injected into LLM system prompt
- LLM provider failover chain (config/models.yml fallback + LLM_FALLBACK_CHAIN override)
- Context-aware webhooks panel + backend id-mention safety net
- Per-message stats (time/duration/tokens/model), Markdown+GFM render, code & table copy/download buttons
- Vitest suite, end-to-end smoke test (scripts/smoke.mjs), per-session system prompt override
- /metrics Prometheus endpoint + /api/metrics JSON, request-id correlation
- dotenv with explicit repo-root path; envString/envNumber helpers (handles empty-string env)
- Runbooks + SOPs under knowledge/ in English; README, docs, and INDEX.md in English
273
README.md
Normal file
@@ -0,0 +1,273 @@
# SIC — Super Incident Commander

SIC is a lightweight web harness for using a centralized `pi.dev` engine from the browser. It provides independent sessions, reliable history in SQLite, internal Markdown documentation, and webhooks that run only from the backend after explicit user confirmation.

## MVP scope

- Expected ceiling: 5 concurrent users.
- Frontend: React + Vite.
- Backend: Node.js + Fastify.
- Initial persistence: SQLite.
- LLM: OpenAI-compatible endpoint via `pi-adapter`.
- Default LLM provider: MiniMax OpenAI-compatible.
- Configuration: YAML + environment variables.
- Initial deploy: Docker Compose.

## Reliability principle

Nothing critical lives only in memory. Sessions, messages, and webhook audit records are rebuilt from SQLite.

Every conversation read/write must be scoped by both session and user:

```sql
WHERE session_id = ?
  AND user_id = ?
```
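
As a rough illustration of that rule, the sketch below builds a parameterized read that always carries both predicates. The helper name, the `messages` table, and the `role`/`content`/`id` columns are assumptions made for the example; only `session_id` and `user_id` come from this README.

```typescript
// Hypothetical helper: everything except session_id / user_id is an assumed name.
interface ScopedQuery {
  sql: string;
  params: [string, string];
}

// Builds a read that cannot be issued without both ownership predicates.
function messagesForSession(sessionId: string, userId: string): ScopedQuery {
  if (sessionId.length === 0 || userId.length === 0) {
    throw new Error("sessionId and userId are both required");
  }
  return {
    sql:
      "SELECT role, content FROM messages " +
      "WHERE session_id = ? AND user_id = ? ORDER BY id",
    params: [sessionId, userId],
  };
}
```

Routing every read and write through a helper like this keeps the ownership check in one place, so a route handler cannot forget the `user_id` predicate.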

## Structure

```text
apps/
  api/          # Fastify API, SSE, sessions, webhooks, docs
  web/          # React + Vite UI
packages/
  shared/       # Shared types
  pi-adapter/   # pi.dev / OpenAI-compatible adapter
config/         # YAML for models, webhooks and docs
knowledge/      # Internal Markdown documentation
deploy/         # Docker Compose and future manifests
docs/           # Definitions, reliable history and agents
scripts/        # End-to-end smoke test + mock LLM
```

## API surface

- `GET /healthz`
- `GET /readyz`
- `GET /api/version`
- `GET /api/me`
- `GET /api/sessions`
- `POST /api/sessions`
- `GET /api/sessions/:id`
- `PATCH /api/sessions/:id`
- `DELETE /api/sessions/:id`
- `GET /api/docs/search?q=vpn`
- `GET /api/docs/:id`
- `GET /api/models`
- `GET /api/webhooks`
- `GET /api/webhook-runs?sessionId=...`
- `POST /api/webhooks/:id/run`
- `GET /api/skills`
- `PATCH /api/sessions/:id/system-prompt`: set per-session context
- `GET /metrics`: Prometheus text
- `GET /api/metrics`: same data as JSON
- `POST /api/chat/stream`

## Chat stream contract

`POST /api/chat/stream` takes `sessionId`, `message` and optionally `model`.

Reliability rules:

1. Validate that the session belongs to the current user.
2. Persist the `user` message before calling the LLM.
3. If the session has no title yet, derive a short one from the first message.
4. Validate the requested model against `config/models.yml`.
5. Search relevant Markdown docs and role-allowed webhooks.
6. Call the OpenAI-compatible endpoint via `pi-adapter`. If the model has a fallback chain, the chat route walks it on structured or transport errors; the first `ok=true` response wins.
7. Emit SSE events: `docs`, `token`, `actions`, `done`.
8. Persist the `assistant` response; if every model in the chain fails, persist a controlled message with error metadata and the full failure trail.
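
On the client side, step 7 means splitting the response body into SSE frames. The sketch below is a minimal parser for the standard `event:` / `data:` framing; the function name and the JSON payload in the usage example are assumptions, since this README names the events but not their payloads.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Splits buffered text into complete SSE frames and returns the trailing partial frame, if any.
function parseSseFrames(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last piece is either empty or an incomplete frame that needs more bytes.
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment / keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}

// Usage: feed each decoded chunk plus the previous `rest` back into the parser.
const parsed = parseSseFrames('event: token\ndata: {"text":"Hel"}\n\nevent: done\ndata: {}\n\nevent: tok');
```

Here `parsed.events` holds the complete `token` and `done` frames, and `parsed.rest` keeps the unfinished `event: tok` fragment until the next chunk arrives.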

### Provider fallback

Each model in `config/models.yml` can declare `fallback: [other-id, ...]`. The chat route walks the chain when a model returns `ok=false` (no_content / json_parse / schema) or throws (5xx / 429 / network / timeout). When the assistant metadata is persisted, it includes `requested_model`, `fallback_attempts`, `fallback_chain`, and `fallback_failures` whenever the chain was actually used, so you can see what happened in the chat history.

Override the chain globally with `LLM_FALLBACK_CHAIN` (comma-separated ids; the first entry is tried right after the requested model). Leave it empty to use each model's YAML chain.

Default chain today (from `config/models.yml`):

- `fast` → no fallback (it is already the cheap path)
- `balanced` → `mr-auto`
- `reasoning` → no fallback
- `mr-auto` → no fallback
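
The chain walk described above can be sketched as follows. This is not the project's actual route code: `resolveChain`, `walkChain`, the result shapes, and the de-duplication of repeated ids are assumptions, and the real LLM call is replaced by an injected `call` function.

```typescript
interface ModelConfig {
  id: string;
  fallback?: string[];
}

interface AttemptResult<T> {
  ok: boolean;
  value?: T;
  reason?: string; // e.g. no_content / json_parse / schema
}

interface ChainOutcome<T> {
  ok: boolean;
  value?: T;
  model?: string;
  fallback_chain: string[];
  fallback_failures: { model: string; reason: string }[];
}

// Requested model first, then the global override if non-empty, else the model's YAML chain.
function resolveChain(requested: string, models: ModelConfig[], override?: string): string[] {
  const fromEnv = (override ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  const fromYaml = models.find((m) => m.id === requested)?.fallback ?? [];
  const tail = fromEnv.length > 0 ? fromEnv : fromYaml;
  // Assumption: drop repeated ids so no model is tried twice.
  return [requested, ...tail].filter((id, i, all) => all.indexOf(id) === i);
}

// Tries each model in order; the first ok=true response wins, failures are recorded.
async function walkChain<T>(
  chain: string[],
  call: (model: string) => Promise<AttemptResult<T>>,
): Promise<ChainOutcome<T>> {
  const failures: { model: string; reason: string }[] = [];
  for (const model of chain) {
    try {
      const res = await call(model);
      if (res.ok) {
        return { ok: true, value: res.value, model, fallback_chain: chain, fallback_failures: failures };
      }
      failures.push({ model, reason: res.reason ?? "unknown" });
    } catch (err) {
      // Transport errors (5xx / 429 / network / timeout) surface as exceptions.
      failures.push({ model, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { ok: false, fallback_chain: chain, fallback_failures: failures };
}
```

With the default chain listed above, `resolveChain("balanced", models)` would yield `balanced` followed by `mr-auto`, while a non-empty override replaces the YAML tail for every model.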

## MiniMax

The project is wired to MiniMax via the official OpenAI-compatible endpoint:

- Base URL: `https://api.minimax.io/v1`
- Chat path used by the adapter: `/chat/completions`
- Auth: `Authorization: Bearer <key>`

Models configured in `config/models.yml`:

- `fast` → `MiniMax-M2.7-highspeed`
- `balanced` → `MiniMax-M2.7`
- `reasoning` → `MiniMax-M3`

To run locally, set the key:

```bash
export MINIMAX_API_KEY="your-key"
export LLM_BASE_URL="https://api.minimax.io/v1"
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL="fast"
```

In Docker Compose you only need to export `MINIMAX_API_KEY`; the Compose file maps it to `LLM_API_KEY`.

## UI MVP

The React app already consumes the API through the Vite proxy:

- Loads or creates a local session.
- Loads `GET /api/models` and lets the user pick the model per message.
- Lists persisted sessions and lets the user switch between them.
- Lets the user rename and delete sessions, always through the API with per-user isolation.
- Sends messages to `POST /api/chat/stream` and consumes SSE events.
- Shows recommended documentation and lets the user open the full document via `GET /api/docs/:id`.
- Shows suggested actions in the right panel.
- Loads `GET /api/webhooks` to show public labels/descriptions for actions.
- Executes webhooks only after user confirmation and always through the backend.
- Shows execution audit per session from `GET /api/webhook-runs`, without exposing URLs or payload templates.
- Can attach a development Bearer token to test `AUTH_MODE=keycloak`; reads from `localStorage` or `VITE_AUTH_TOKEN`.

## Skills

Skills are persona/behavior prompt fragments loaded from `config/skills.yml` and injected into the LLM's system prompt at chat time. They are NOT capabilities: the model still only recommends actions and the backend still owns execution.

Each skill has: `id`, `name`, `description`, `enabled`, `prompt`. Skills with `enabled: true` are injected into the chat system prompt (after the base identity prompt, before the docs/actions context). Skills with `enabled: false` are kept in the file but inactive. The frontend can list them via `GET /api/skills`, which exposes only `id`, `name`, `description`, and `enabled`; the prompt text is never returned.

Edit `config/skills.yml` and restart the API to change the active skill set. The default file ships with `sre-on-call` and `blameless-postmortem` enabled; `security-incident` is shipped disabled as a reference.

The env var `SKILLS_CONFIG_PATH` overrides the default config path (`../../config/skills.yml` relative to `cwd`).

## Per-session context

Every session has an optional `system_prompt` field. When set, it is prepended to every chat turn as a system message (after the base identity prompt and skill prompts, before the docs/actions context). Use it to pin incident id, on-call name, or runbook references that shouldn't drift across the conversation.

- **Frontend**: each session row has a small circle button (`○` empty, `●` set). Click it to open a modal editor with Save and Clear.
- **API**: `PATCH /api/sessions/:id/system-prompt` with `{ "system_prompt": "..." }`. Send `null` or empty string to clear.
- **Limit**: 8000 characters.
- **Persistence**: stored in `chat_sessions.system_prompt`; same `WHERE id = ? AND user_id = ?` ownership rule as every other session operation.
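
The ordering rules for skills and the per-session prompt can be sketched as one assembly function. The function name and the choice to emit one system message per fragment are assumptions; the ordering (base identity, enabled skills, session prompt, docs/actions context) and the 8000-character limit come from this README.

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  prompt: string;
}

interface SystemMessage {
  role: "system";
  content: string;
}

const SESSION_PROMPT_MAX_CHARS = 8000;

// Order: base identity -> enabled skills -> per-session prompt -> docs/actions context.
function buildSystemMessages(
  basePrompt: string,
  skills: Skill[],
  sessionPrompt: string | null,
  docsAndActionsContext: string,
): SystemMessage[] {
  const parts: string[] = [basePrompt];
  for (const skill of skills) {
    if (skill.enabled) parts.push(skill.prompt); // disabled skills stay inactive
  }
  const pinned = (sessionPrompt ?? "").trim(); // null or empty string means "cleared"
  if (pinned.length > SESSION_PROMPT_MAX_CHARS) {
    throw new Error("system_prompt exceeds " + SESSION_PROMPT_MAX_CHARS + " characters");
  }
  if (pinned.length > 0) parts.push(pinned);
  if (docsAndActionsContext.trim().length > 0) parts.push(docsAndActionsContext);
  return parts.map((content): SystemMessage => ({ role: "system", content }));
}
```

Keeping the assembly in a single pure function makes the ordering easy to unit-test without calling the LLM.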

## Observability

Two endpoints surface API metrics:

- `GET /metrics`: Prometheus text exposition (counter / summary), scraper-friendly. The path matches Prometheus's default scrape path.
- `GET /api/metrics`: same data as JSON for humans and the smoke test. Shape:

```json
{
  "started_at": "2026-06-29T12:00:00.000Z",
  "uptime_seconds": 1234,
  "totals": { "requests": 5678, "errors_5xx": 0 },
  "routes": [
    {
      "route": "/api/chat/stream",
      "method": "POST",
      "count": 42,
      "avg_ms": 1230,
      "p95_ms": 4500,
      "max_ms": 8000,
      "status_buckets": { "200_299": 42 }
    }
  ],
  "recent": [
    {
      "route": "/api/sessions/:id",
      "method": "DELETE",
      "status": 204,
      "durationMs": 4,
      "timestamp": 1782727300000
    }
  ]
}
```

Routes are aggregated by route **template** (e.g. `/api/sessions/:id`), not by raw URL, so `/api/sessions/abc` and `/api/sessions/def` share a bucket. p95 uses a fixed-size streaming reservoir (200 samples), so memory stays bounded under traffic. Metrics are in-memory only: counters reset on restart, which is the expected behavior for a 5-user MVP.
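
To make the bounded-memory p95 concrete, here is a minimal sketch of a fixed-size reservoir. It assumes classic reservoir sampling (Algorithm R) and a nearest-rank percentile; the README does not say which variant the API uses, and the class name is hypothetical.

```typescript
class LatencyReservoir {
  private readonly capacity: number;
  private readonly random: () => number;
  private samples: number[] = [];
  private seen = 0;

  constructor(capacity: number = 200, random: () => number = Math.random) {
    this.capacity = capacity;
    this.random = random;
  }

  // Records one duration; memory never grows past `capacity` samples.
  add(ms: number): void {
    this.seen += 1;
    if (this.samples.length < this.capacity) {
      this.samples.push(ms);
      return;
    }
    // Algorithm R: the new sample replaces a random slot with probability capacity / seen.
    const slot = Math.floor(this.random() * this.seen);
    if (slot < this.capacity) this.samples[slot] = ms;
  }

  // Nearest-rank percentile over the retained samples (0 when empty).
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(0, rank - 1)] ?? 0;
  }

  get size(): number {
    return this.samples.length;
  }
}
```

One reservoir per `method + route template` key would be enough to fill `p95_ms` in the JSON shape above; once more than 200 samples have been seen the value is an estimate rather than an exact percentile.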

## Auth

The backend supports two modes:

- `AUTH_MODE=local`: dev mode, uses `local-user` with roles `admin` and `webhook-runner`.
- `AUTH_MODE=keycloak`: validates `Authorization: Bearer <token>` against the remote JWKS of `OIDC_ISSUER` and checks the audience against `OIDC_AUDIENCE`.

For manual Keycloak testing, the UI lets you paste a JWT in the "Dev token" box. That token is stored in `localStorage` and sent as `Authorization: Bearer <token>` on API and stream calls. Alternatively, Vite can receive `VITE_AUTH_TOKEN` to preconfigure it for the local environment.

Claims used from Keycloak:

- `sub` as `user.id`.
- `preferred_username` and `email` for display.
- Roles from `realm_access.roles` and `resource_access[OIDC_AUDIENCE].roles`.
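
The claim mapping above can be sketched as a small pure function. It assumes the token signature, issuer, and audience have already been verified elsewhere; the type names, the display-name fallback order, and the merging of realm and client roles into one de-duplicated list are assumptions.

```typescript
interface KeycloakClaims {
  sub: string;
  preferred_username?: string;
  email?: string;
  realm_access?: { roles?: string[] };
  resource_access?: Record<string, { roles?: string[] }>;
}

interface AuthUser {
  id: string;
  displayName: string;
  roles: string[];
}

// Maps already-verified JWT claims to the user shape used by the API (assumed shape).
function userFromClaims(claims: KeycloakClaims, audience: string): AuthUser {
  const realmRoles = claims.realm_access?.roles ?? [];
  const clientRoles = claims.resource_access?.[audience]?.roles ?? [];
  return {
    id: claims.sub,
    displayName: claims.preferred_username ?? claims.email ?? claims.sub,
    roles: Array.from(new Set([...realmRoles, ...clientRoles])),
  };
}

// Usage with a hypothetical audience "sic":
const exampleUser = userFromClaims(
  {
    sub: "u-123",
    email: "oncall@example.com",
    realm_access: { roles: ["admin"] },
    resource_access: { sic: { roles: ["webhook-runner", "admin"] } },
  },
  "sic",
);
```

A token with no role claims simply yields an empty `roles` array, so role checks such as `webhook-runner` fail closed.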

## Basic hardening

- `API_BODY_LIMIT_BYTES`: global Fastify body limit. Default: `1048576`.
- `CHAT_MESSAGE_MAX_CHARS`: maximum length of a chat message and of `lastUserMessage` on webhook runs. Default: `8000`.
- `CORS_ALLOWED_ORIGINS`: comma-separated list. If unset, all origins are allowed (intended for development only).
- `LLM_TIMEOUT_MS`: OpenAI-compatible call timeout. Default: `30000`.
- `WEBHOOK_TIMEOUT_MS`: backend-only webhook execution timeout. Default: `15000`.
- `WEBHOOK_RETRY_MAX_ATTEMPTS`: retries per webhook on transient errors (5xx, 429, timeout, network). Default: `3`.
- `WEBHOOK_RETRY_INITIAL_BACKOFF_MS`: initial backoff with exponential growth. Default: `500`.
- `WEBHOOK_RETRY_MAX_BACKOFF_MS`: backoff cap. Default: `5000`.
- `WEBHOOK_RUNS_RETENTION_DAYS`: age cutoff for `webhook_runs` rows. Runs older than this are purged on boot and on a timer. Default: `30`. Set to `0` to disable the age pass.
- `WEBHOOK_RUNS_MAX_PER_USER`: keep at most this many most-recent runs per user. The oldest overflow is purged. Default: `1000`. Set to `0` to disable the cap pass.
- `WEBHOOK_AUDIT_PURGE_INTERVAL_MS`: how often the janitor runs while the API is up. Default: `3600000` (1 hour). Minimum: `60000` (1 minute).
- `CHAT_RATE_LIMIT_PER_MINUTE`: per-user rate limit on `POST /api/chat/stream` (token-bucket refill rate). Default: `20`.
- `CHAT_RATE_LIMIT_BURST`: per-user burst size. Default: `5`. Rejected calls return `429` with `retry-after` in seconds and `x-ratelimit-remaining: 0`.
- The API adds basic defensive headers: `x-content-type-options`, `referrer-policy`, `x-frame-options`.
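
Two of the settings above are easier to reason about with numbers. The sketch below shows an exponential backoff schedule and a per-user token bucket using the documented defaults. The doubling factor, the absence of jitter, and all names are assumptions, not the project's actual implementation.

```typescript
// Delay before retry number `retry` (1-based): initial, then doubling, capped at maxMs.
function backoffDelayMs(retry: number, initialMs: number = 500, maxMs: number = 5000): number {
  return Math.min(maxMs, initialMs * Math.pow(2, retry - 1));
}

class TokenBucket {
  private readonly perMinute: number;
  private readonly burst: number;
  private tokens: number;
  private lastMs: number;

  constructor(perMinute: number, burst: number, nowMs: number) {
    this.perMinute = perMinute;
    this.burst = burst;
    this.tokens = burst; // start full so a fresh user can burst immediately
    this.lastMs = nowMs;
  }

  // Returns 0 when the call is allowed, otherwise the seconds to send as `retry-after`.
  take(nowMs: number): number {
    const refill = ((nowMs - this.lastMs) * this.perMinute) / 60000;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    // Time until one full token is available, rounded up to whole seconds.
    return Math.ceil(((1 - this.tokens) * 60000) / this.perMinute / 1000);
  }
}
```

With the defaults, the backoff schedule is 500, 1000, 2000, 4000 and then 5000 ms, and a user who spends the burst of 5 in one instant has to wait 3 seconds for the next token at 20 per minute.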

## End-to-end smoke test

A smoke script exercises the full API (health, auth, models, docs, webhooks, sessions, SSE stream, message persistence and audit).

### With a real LLM (MiniMax)

```bash
# Terminal 1: start the API and the web app
export LLM_BASE_URL=https://api.minimax.io/v1
export LLM_API_KEY="$MINIMAX_API_KEY"
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2: smoke test against http://localhost:3000
pnpm smoke
```

### With the mock LLM (no key)

```bash
# Terminal 1: start the API and the web app pointing at the mock
pnpm mock:llm &
export LLM_BASE_URL=http://127.0.0.1:4010/v1
export LLM_API_KEY=dummy
export DEFAULT_MODEL=fast
pnpm dev

# Terminal 2
pnpm smoke

# or in a single step, the script starts the mock internally:
pnpm smoke:mock
```

Steps covered (in order):

1. `/healthz`, `/readyz`
2. `/api/me` (local auth)
3. `/api/models`
4. `/api/docs/search` + `/api/docs/:id`
5. `/api/webhooks`
6. `POST /api/sessions` + `GET /api/sessions`
7. `POST /api/chat/stream` and SSE event parsing (`docs`, `token`, `actions`, `done`)
8. `GET /api/sessions/:id` to confirm the assistant message was persisted
9. `GET /api/webhook-runs?sessionId=...` to confirm audit listing
10. `DELETE /api/sessions/:id` (cleanup)

Optional flags:

- `pnpm smoke --api-base http://localhost:4000` to point at a different API
- `pnpm smoke:mock` (alias of `pnpm smoke --mock-llm`) starts the mock inside the script