Architecture
- Single-Target (legacy)
- Multi-Target (current)
Each ResourceWatcher has its own collectors (including an optional PrometheusCollector), and all watchers share a single Kubernetes clientset, minimizing API connections.
Usage Modes
- Single Deployment (legacy)
- Multiple Deployments (YAML)
- Server with Watcher
Multi-Target Configuration File
Target Fields
| Field | Description | Required |
|---|---|---|
| deployment | Deployment name | Yes |
| namespace | Namespace (default: default) | No |
| metricsPort | Prometheus endpoint port (0 = disabled) | No |
| metricsPath | HTTP path for metrics (default: /metrics) | No |
| metricsFilter | Glob filters for metric names (empty = all) | No |
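A minimal example putting these fields together. The file name, target names, and the top-level `targets` key are illustrative assumptions; check the project's reference configuration for the exact schema:

```yaml
# watch-targets.yaml — illustrative multi-target configuration
targets:
  - deployment: checkout-api
    namespace: shop
    metricsPort: 9090           # scrape Prometheus metrics on this port
    metricsPath: /metrics       # default path
    metricsFilter:
      - "http_requests_*"       # glob; empty list = collect all metrics
  - deployment: worker
    # namespace defaults to "default"; metricsPort 0 disables scraping
    metricsPort: 0
```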
Complete Flags
chatcli watch
| Flag | Description | Default | Env Var |
|---|---|---|---|
| --config | Multi-target YAML file | | |
| --deployment | Single deployment (legacy) | | CHATCLI_WATCH_DEPLOYMENT |
| --namespace | Deployment namespace | default | CHATCLI_WATCH_NAMESPACE |
| --interval | Interval between collections | 30s | CHATCLI_WATCH_INTERVAL |
| --window | Data time window | 2h | CHATCLI_WATCH_WINDOW |
| --max-log-lines | Log lines per pod | 100 | CHATCLI_WATCH_MAX_LOG_LINES |
| --kubeconfig | Kubeconfig path | Auto-detected | CHATCLI_KUBECONFIG |
| --provider | LLM provider | .env | LLM_PROVIDER |
| --model | LLM model | .env | |
| -p <prompt> | One-shot: send the prompt and exit | | |
| --max-tokens | Token limit for the response | | |
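Illustrative invocations combining the flags above (file and deployment names are placeholders):

```shell
# Multi-target mode: read targets from a YAML file
chatcli watch --config watch-targets.yaml --interval 30s --window 2h

# Legacy single-target mode
chatcli watch --deployment my-app --namespace prod
```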
chatcli server (watcher flags)
| Flag | Description | Default | Env Var |
|---|---|---|---|
| --watch-config | Multi-target YAML file | | CHATCLI_WATCH_CONFIG |
| --watch-deployment | Single deployment (legacy) | | CHATCLI_WATCH_DEPLOYMENT |
| --watch-namespace | Namespace | default | CHATCLI_WATCH_NAMESPACE |
| --watch-interval | Collection interval | 30s | CHATCLI_WATCH_INTERVAL |
| --watch-window | Observation window | 2h | CHATCLI_WATCH_WINDOW |
| --watch-max-log-lines | Max log lines | 100 | CHATCLI_WATCH_MAX_LOG_LINES |
| --watch-kubeconfig | Kubeconfig path | Auto-detected | CHATCLI_KUBECONFIG |
What Is Collected
Collectors per Target
| Collector | Data Collected |
|---|---|
| Deployment | Replicas (ready/available/updated), strategy, conditions |
| Pod Status | Phase, readiness, restarts, termination info, container status |
| Events | K8s events (Warning/Normal), message, reason, timestamp |
| Logs | Last N lines per container per pod |
| Metrics | CPU and memory per pod (via metrics-server) |
| HPA | Min/max replicas, current metrics, desired replicas |
| Prometheus | Application metrics from the pod /metrics endpoint |
Prometheus Collector (New)
The PrometheusCollector scrapes Prometheus metrics directly from pods:
- Discovers the deployment's pods and selects one Ready pod
- Makes an HTTP GET to http://<podIP>:<port>/<path> (timeout: 5s)
- Parses the Prometheus text exposition format (stdlib only, no external dependencies)
- Filters metrics by the configured glob patterns
- Ignores NaN, Inf, and comment lines
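The parsing and filtering steps can be sketched as follows. This is a minimal sketch of the idea, not the project's actual implementation, and it does not handle every corner of the exposition format (e.g. a `}` inside a label value):

```python
import fnmatch
import math

def parse_prometheus_text(body, patterns=()):
    """Parse Prometheus text exposition format (sketch).

    Keeps only metrics whose name matches one of the glob
    patterns (empty patterns = keep everything); skips comment
    lines, blank lines, and NaN/Inf samples."""
    metrics = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" / "# TYPE" and other comments
        if "{" in line:
            # "name{label=...} value [timestamp]"
            name = line[: line.index("{")]
            rest = line[line.index("}") + 1 :].split()
        else:
            # "name value [timestamp]"
            name, *rest = line.split()
        if not rest:
            continue  # malformed line: no sample value
        try:
            value = float(rest[0])
        except ValueError:
            continue
        if math.isnan(value) or math.isinf(value):
            continue  # ignore NaN/Inf, as the collector does
        if patterns and not any(fnmatch.fnmatch(name, p) for p in patterns):
            continue  # dropped by the glob filter
        metrics[name] = value
    return metrics
```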
Context Budget Management (MultiSummarizer)
With multiple targets, the MultiSummarizer ensures the combined context does not exceed the LLM's context window.
Algorithm
Scores each target (0 = healthy, 1 = warning, 2 = critical):
- Critical: CrashLoopBackOff, OOMKilled, critical alerts
- Warning: fewer replicas than desired, error logs, warning alerts
- Healthy: everything OK
Allocates context per target:
- Score >= 1: full context (~1-3 KB per target)
- Score == 0: compact one-liner (~80 chars per target)
Example with 20 Targets (2 with issues)
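A sketch of the allocation idea applied to the 20-target case. The function and target names are illustrative (not the project's API), and the ~2 KB full-context figure is taken from the approximate budgets quoted above:

```python
def allocate_context(scores, full_bytes=2048, oneliner_bytes=80):
    """Full context for targets with score >= 1 (warning/critical),
    a compact one-liner for healthy targets (score == 0)."""
    return {
        target: full_bytes if score >= 1 else oneliner_bytes
        for target, score in scores.items()
    }

# 20 targets: 18 healthy, 2 with issues
scores = {f"app-{i}": 0 for i in range(18)}
scores["checkout"] = 2   # critical: e.g. CrashLoopBackOff
scores["worker"] = 1     # warning: e.g. replicas < desired

plan = allocate_context(scores)
total = sum(plan.values())  # 2*2048 + 18*80 = 5536 bytes (~5.4 KB)
```

With ~2 KB per full context, the 20 targets fit in roughly 5.5 KB of context instead of the ~40 KB that 20 full summaries would need.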
Anomaly Detection
| Anomaly | Condition | Severity |
|---|---|---|
| CrashLoopBackOff | Pod with more than 5 restarts | Critical |
| OOMKilled | Container terminated due to lack of memory | Critical |
| PodNotReady | Pod is not in the Ready state | Warning |
| DeploymentFailing | Deployment with Available=False | Critical |
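The table above maps to a simple classification of observed state. A minimal sketch (type and function names are illustrative, not the project's):

```python
from dataclasses import dataclass

@dataclass
class PodState:
    ready: bool
    restarts: int
    last_termination_reason: str = ""

def detect_anomalies(pod, deployment_available=True):
    """Return (anomaly, severity) pairs per the anomaly table."""
    alerts = []
    if pod.restarts > 5:
        alerts.append(("CrashLoopBackOff", "critical"))
    if pod.last_termination_reason == "OOMKilled":
        alerts.append(("OOMKilled", "critical"))
    if not pod.ready:
        alerts.append(("PodNotReady", "warning"))
    if not deployment_available:  # Deployment condition Available=False
        alerts.append(("DeploymentFailing", "critical"))
    return alerts
```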
Observability Store
Collected data is stored in a per-target ring buffer with a configurable time window:
- Snapshots: complete periodic state (pods, deployment, HPA, events, metrics, app metrics)
- Logs: Recent logs from each pod with classification (info/warning/error)
- Alerts: Detected anomalies with severity and timestamps
Automatic Rotation
Data older than the time window (--window) is automatically discarded, keeping memory usage constant regardless of the number of targets.
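The rotation can be sketched as a deque pruned on every append (a minimal sketch; the class and method names are illustrative):

```python
from collections import deque

class ObservabilityStore:
    """Per-target ring buffer with time-window pruning (sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.snapshots = deque()  # (timestamp, snapshot), oldest first

    def add(self, snapshot, now):
        self.snapshots.append((now, snapshot))
        self._prune(now)

    def _prune(self, now):
        # Drop anything older than the window: memory stays bounded
        # no matter how long the watcher runs.
        cutoff = now - self.window
        while self.snapshots and self.snapshots[0][0] < cutoff:
            self.snapshots.popleft()
```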
/watch Command
Inside interactive ChatCLI (local or remote), use /watch to see the status:
- Single-Target
- Multi-Target
One-Shot with K8s Context
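An illustrative one-shot invocation (deployment name and prompt are placeholders), combining the -p flag with the collected K8s context:

```shell
chatcli watch --deployment my-app --namespace prod \
  -p "Why is my-app restarting?"
```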
Example Questions
Requirements
- Kubernetes Cluster: Access via kubeconfig or in-cluster config
- RBAC Permissions: Read access to pods, events, logs, deployments, HPA, ingresses
- metrics-server (optional): For CPU/memory collection
- Prometheus endpoints (optional): Apps that expose /metrics in Prometheus text format
RBAC
- Single-namespace (Role + RoleBinding)
- Multi-namespace (ClusterRole + ClusterRoleBinding)
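A single-namespace Role covering the resources listed under RBAC Permissions might look like this (the Role name and namespace are illustrative; verify against the project's shipped manifests):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chatcli-watcher
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
```

For multi-namespace watching, the same rules go into a ClusterRole bound with a ClusterRoleBinding.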
AIOps Integration
K8s Watcher alerts automatically feed into the Operator's AIOps pipeline. When the Operator detects alerts via the GetAlerts RPC, it creates Anomaly CRs that are correlated into Issues, analyzed by AI, and automatically remediated.