Incidents
Detection, ack, snooze, timeline, remediation, and resolution
Runbooks
Full CRUD for remediation plans
Analytics
MTTD, MTTR, trends, top resources, capacity, compliance
SLOs
Targets, error budget, burn rate, and history
Federation
Multi-cluster status, cross-tier correlations
Health
Liveness and readiness probes
Base URL
The default port is 8090 but can be changed via Helm (--set apiPort=...) or env var CHATCLI_API_PORT. In production, expose behind an Ingress with TLS.
Request flow
Authentication
All requests must include the X-API-Key header with a valid key.
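A minimal sketch of attaching the key from a Python client, using only the standard library. The host localhost, the incidents path segment, and the key value are assumptions for illustration; only the default port 8090, the /api/v1/ prefix, and the X-API-Key header name come from this page.

```python
import urllib.request

DEFAULT_PORT = 8090  # default API port stated in this document


def build_request(path, api_key, host="localhost", port=DEFAULT_PORT):
    """Return an unsent GET request carrying the X-API-Key header.

    `host` and `path` are placeholders; substitute your own deployment values.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    request = urllib.request.Request(url, method="GET")
    request.add_header("X-API-Key", api_key)
    return request


# Hypothetical endpoint and key, for illustration only.
req = build_request("incidents", "example-key")
print(req.full_url)  # http://localhost:8090/api/v1/incidents
```

Passing the result to urllib.request.urlopen would send it. Behind a TLS Ingress the scheme would be https rather than http.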
Roles
viewer
Read-only. GET on all endpoints. Ideal for dashboards and observability tools.
operator
Daily ops. GET + POST actions (acknowledge, approve, reject). NOC, SRE, and on-call.
admin
Full access. GET, POST, PUT, DELETE. CI/CD, privileged automation, management tooling.
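The role descriptions above can be turned into a client-side pre-check, sketched below. This is not the server's implementation: the API remains authoritative and answers with 403 when a role is insufficient. The names ROLE_METHODS and is_allowed are hypothetical, and the sketch does not model the fact that operator POST access is limited to actions such as acknowledge, approve, and reject.

```python
# HTTP methods each role may use, as listed in the role descriptions.
ROLE_METHODS = {
    "viewer": {"GET"},
    "operator": {"GET", "POST"},
    "admin": {"GET", "POST", "PUT", "DELETE"},
}


def is_allowed(role, method):
    """Return True if `role` may issue `method`; unknown roles get nothing."""
    return method.upper() in ROLE_METHODS.get(role, set())


print(is_allowed("viewer", "GET"))       # True
print(is_allowed("operator", "DELETE"))  # False
```

A check like this lets a dashboard or script fail fast before spending a request that would be rejected.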
Rate limiting
| Role | Limit | Window |
|---|---|---|
| viewer | 100 req | per minute |
| operator | 500 req | per minute |
| admin | 1000 req | per minute |
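One way for a client to stay under these limits is to space requests evenly across the one-minute window. The sketch below computes that spacing per role. The page does not say whether the window is fixed or sliding, so even spacing is used here as a conservative assumption; RATE_LIMITS and min_interval are hypothetical names.

```python
# Requests allowed per one-minute window, taken from the rate limiting table.
RATE_LIMITS = {"viewer": 100, "operator": 500, "admin": 1000}
WINDOW_SECONDS = 60


def min_interval(role):
    """Seconds to wait between evenly spaced requests to stay within the limit."""
    return WINDOW_SECONDS / RATE_LIMITS[role]


for role in RATE_LIMITS:
    print(role, min_interval(role))
```

For a viewer key this works out to one request every 0.6 seconds.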
Response format
All responses follow a Kubernetes-like pattern:
- List
- Single resource
- Error
Error codes
| Code | Meaning | When it happens |
|---|---|---|
| 400 | Bad Request | Missing or malformed parameters |
| 401 | Unauthorized | X-API-Key missing or invalid |
| 403 | Forbidden | Insufficient role for the operation |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Resource already exists or invalid state for the operation |
| 429 | Too Many Requests | Rate limit exceeded; see Retry-After |
| 500 | Internal Server Error | Operator failure; inspect logs |
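For the 429 case, a client can honor Retry-After before retrying. The sketch below handles only the whole-seconds form of the header; HTTP also allows a date form, which is not parsed here. The fallback to capped exponential backoff is an assumption, not documented behavior of this API. The function name retry_delay is hypothetical, and headers is assumed to be a plain dict keyed exactly as Retry-After.

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Pick a wait time in seconds after a 429 response.

    Uses Retry-After when it is a whole number of seconds; otherwise falls
    back to capped exponential backoff (an assumption, not documented behavior).
    """
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)
    return min(cap, base * (2 ** attempt))


print(retry_delay({"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay({}, attempt=3))                    # 8.0
```

The other codes in the table generally should not be retried blindly: 400, 401, 403, and 404 indicate a problem with the request itself.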
Pagination
Endpoints that return lists support pagination via query parameters:
- Page number (starts at 1)
- Items per page (maximum: 100)
List responses include metadata.totalCount so you can compute the total number of pages.
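The page count follows from metadata.totalCount and the items-per-page value by ceiling division. A minimal sketch, in which total_pages and its argument names are hypothetical and only the maximum of 100 items per page comes from this section:

```python
MAX_PER_PAGE = 100  # maximum items per page stated in this section


def total_pages(total_count, per_page):
    """Number of pages needed to cover `total_count` items."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError("per_page must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding.
    return (total_count + per_page - 1) // per_page


print(total_pages(250, 100))  # 3
print(total_pages(0, 50))     # 0
```

A client would read the count from the first response, then request pages 1 through the computed total.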
Versioning
The API uses path-based versioning (/api/v1/). Future versions will be added as /api/v2/ while maintaining backward compatibility with v1.
Breaking changes only happen across major versions. Within a version, only backward-compatible additions (new optional fields, new endpoints) are released.
Next steps
AIOps Platform overview
How the platform detects, analyzes, and remediates incidents
Kubernetes Operator
Operator deployment, CRDs, and configuration
Incident lifecycle
Full flow: detection → analysis → remediation → resolution
AIOps in production
Cookbook: full setup with TLS, RBAC, notifications, and SLOs