Scenario
Production Application
Application “myapp” running in production on Kubernetes
Quick Diagnosis
Team needs to diagnose problems quickly
AI-Powered Analysis
Use AI to analyze logs, events, and metrics
Automatic Context
Automatic K8s context in all queries
Option 1: Local Monitoring
Use this option when you have direct access to the cluster via kubectl.
Option 2: Server with Watcher (Team)
Use this option to give the entire team access to monitoring through a centralized server.
Workflow: Production Incident
Fine-Tuning Parameters
Collection Interval
| Scenario | Recommended Interval |
|---|---|
| Stable production | 30s (default) |
| Active investigation | 10s |
| Development | 60s |
| CI/CD monitoring | 15s |
Observation Window
| Scenario | Recommended Window |
|---|---|
| Quick debugging | 30m |
| Normal analysis | 2h (default) |
| Post-mortem | 6h |
| Historical analysis | 24h |
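The interval and window interact: a shorter interval or a longer window means more collection cycles fall inside the window. The sketch below is only back-of-the-envelope arithmetic, assuming one collection per interval across the whole window; `parse_duration` and `cycles_in_window` are hypothetical helpers for illustration and are not part of ChatCLI.

```python
import re

# Seconds per unit for the duration suffixes used in the tables above.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a value such as '30s', '30m', or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def cycles_in_window(interval: str, window: str) -> int:
    """Estimate how many collection cycles fit in one observation window.

    Assumption: exactly one collection happens per interval.
    """
    return parse_duration(window) // parse_duration(interval)


if __name__ == "__main__":
    # Pairings taken from the recommendation tables; the labels are informal.
    for label, interval, window in [
        ("defaults", "30s", "2h"),
        ("active investigation, quick debugging", "10s", "30m"),
        ("development, historical analysis", "60s", "24h"),
    ]:
        print(f"{label}: {cycles_in_window(interval, window)} cycles")
```

With the defaults (30s interval, 2h window) this gives 240 cycles, which is a useful sanity check before shortening the interval or lengthening the window.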
Log Lines
| Scenario | Recommended Lines |
|---|---|
| Verbose apps | 50 |
| Normal | 100 (default) |
| Deep debugging | 500 |
One-Shot for Scripts and Alerts
Integrate ChatCLI with your alerting system:
Advanced Tips
Combine with Persistent Contexts
Save project documentation as context and attach it when using the watcher:
Multiple Deployments
Use multi-target mode to monitor everything in a single instance:
The AI receives detailed context from targets with issues and compact summaries from healthy ones, respecting the maxContextChars budget.
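To make the budgeting idea concrete, here is a minimal sketch of how detailed and compact context could be combined under a character limit. It does not show ChatCLI's actual implementation: `TargetSnapshot`, `build_context`, and the truncation strategy are all assumptions made for illustration. Only the 32000 default comes from this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetSnapshot:
    """Hypothetical per-target data; not a ChatCLI type."""
    name: str
    healthy: bool
    summary: str  # compact one-line status
    detail: str   # full context: logs, events, metrics


def build_context(targets: List[TargetSnapshot], max_chars: int = 32000) -> str:
    """Assemble context that never exceeds max_chars.

    Unhealthy targets contribute their detailed context and are placed
    first so they claim the budget before healthy targets, which only
    contribute a compact summary.
    """
    # False sorts before True, so unhealthy targets come first.
    ordered = sorted(targets, key=lambda t: t.healthy)
    parts = []
    remaining = max_chars
    for target in ordered:
        text = target.summary if target.healthy else target.detail
        block = f"[{target.name}] {text}\n"
        if len(block) > remaining:
            block = block[:remaining]  # naive truncation, assumed for the sketch
        parts.append(block)
        remaining -= len(block)
        if remaining <= 0:
            break
    return "".join(parts)


if __name__ == "__main__":
    snapshots = [
        TargetSnapshot("api", True, "3/3 pods ready", "(full api logs)"),
        TargetSnapshot("worker", False, "CrashLoopBackOff", "(full worker logs and events)"),
    ]
    print(build_context(snapshots))
```

Placing unhealthy targets first is one reasonable design choice: if the budget runs out, the truncated material is the least interesting part of the context.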
Prometheus Metrics
When metricsPort is configured, the watcher automatically scrapes the /metrics endpoint of the pods and includes the metrics in the analysis. Use metricsFilter with glob patterns to select only relevant metrics:
Option 3: Autonomous AIOps (Operator)
Use this option for automatic problem remediation without human intervention.
Autonomous Flow in Action
When a pod starts crashing:
Everything happens automatically without human intervention. Auto-generated runbooks are reused for future occurrences of the same type. In agential mode, the AI acts as an autonomous agent with K8s “skills.” Once it resolves the issue, it generates a PostMortem CR with a complete timeline and a reusable Runbook for future occurrences.
(Optional) Add Runbooks
For specific scenarios where you want to control exactly what to do:
Remediation priority: Manual Runbook > Auto-generated Runbook > Agential remediation > Escalation. When there is no manual Runbook, the AI automatically generates a reusable Runbook CR. If neither a Runbook nor AI actions are available, the operator enters agential mode: the AI acts as an autonomous agent in an observe-decide-act loop, and upon resolution, it generates a PostMortem CR and a reusable Runbook.
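The priority order above can be read as a simple fall-through decision. The sketch below illustrates that ordering only. `Strategy`, `choose_strategy`, and its parameters are hypothetical names and do not correspond to the operator's actual code or CRD fields.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    MANUAL_RUNBOOK = "manual runbook"
    AUTO_RUNBOOK = "auto-generated runbook"
    AGENTIAL = "agential remediation"
    ESCALATION = "escalation"


def choose_strategy(
    manual_runbook: Optional[str],
    auto_runbook: Optional[str],
    agent_available: bool,
) -> Strategy:
    """Pick the highest-priority remediation path that is available.

    Order: Manual Runbook > Auto-generated Runbook > Agential > Escalation.
    """
    if manual_runbook is not None:
        return Strategy.MANUAL_RUNBOOK
    if auto_runbook is not None:
        return Strategy.AUTO_RUNBOOK
    if agent_available:
        return Strategy.AGENTIAL
    # Nothing automated applies, so a human has to take over.
    return Strategy.ESCALATION


if __name__ == "__main__":
    # A manual runbook wins even when an auto-generated one exists.
    print(choose_strategy("restart-myapp", "auto-oom-fix", True).value)
    # With no runbook at all, the agent loop is tried before escalating.
    print(choose_strategy(None, None, True).value)
```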
Deployment Checklist
- Monitoring (Watch + Server)
- Autonomous AIOps (Operator)
- Verify cluster access (kubectl get pods)
- Verify RBAC permissions for pods, logs, events
- Choose mode: local (chatcli watch) or server (chatcli server)
- Define targets: single (--deployment) or multi (--config targets.yaml)
- (Optional) Configure metricsPort for Prometheus scraping
- Configure appropriate interval and window for the scenario
- Adjust maxContextChars if needed (default: 32000)
- Test with a simple question: “Is the deployment healthy?”
- (Optional) Integrate with alerts for automatic analysis
- (Optional) Distribute access to the team via token