- Schedule actions by absolute time, relative delay, cron or interval.
- Wait for conditions (HTTP, K8s, Docker, TCP, file, shell, LLM) and fire an action only when satisfied.
- Chain jobs in a DAG with DependsOn/Triggers.
- Run in daemon mode that survives closing the CLI, ideal for long deploys, terraform apply, database migrations.
- Give agents a @scheduler tool so they plan their own follow-ups ("wait for the deploy and notify me").
All three ChatCLI modes (interactive CLI, gRPC server, K8s operator) can use the scheduler. The daemon is optional: in casual use the scheduler runs in-process and the WAL replays jobs the next time you open the CLI.
Flow overview
Each job can fire immediately, wait for a condition, chain other jobs, and propagate lifecycle hooks. The diagram below shows a typical deploy + verify + notify pipeline:
Why you need this
Before the scheduler, ChatCLI was always synchronous. You asked, waited, got a response. Now a scheduled job runs terraform apply, waits for the deployment to become Available, then runs the final check. You come back hours later and ask /jobs history to see what happened.
Two execution modes
- In-process (default)
- Daemon (autonomous)
No setup. Open chatcli as usual and use /schedule / /wait / /jobs. The scheduler runs inside the process. The prompt status line shows [jobs: 1⏳] while jobs are active.
/schedule: create a job
--when values
The DSL accepts multiple formats:
| Format | Example | Behavior |
|---|---|---|
| Relative | +5m, in 30s, after 2h | Once, after the duration |
| Absolute | at 2026-04-25T14:00, at now | Once, at exact time |
| 5-field cron | cron:0 2 * * *, 0 2 * * * | Recurring, Vixie-cron convention |
| Cron shorthand | @hourly, @daily, @weekly, @monthly, @yearly | Common presets |
| Interval | every 30s, every 5m, every 1h | Recurring with fixed step |
| Condition-gated | when-ready | No time; fires when --wait is satisfied |
| Manual | manual, triggered | Only fires via another job's --triggers |
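To make the table concrete, here is a minimal Python sketch that sorts a --when string into the seven rows above. It is illustrative only and not ChatCLI's parser: the function name `classify_when`, the returned labels, and the set of duration units are all assumptions, and the real DSL may accept more forms than the documented examples.

```python
import re

# Hypothetical classifier mirroring only the documented --when examples.
_DURATION = r"\d+(?:ms|s|m|h|d)"  # the unit set is an assumption
_SHORTHANDS = {"@hourly", "@daily", "@weekly", "@monthly", "@yearly"}


def classify_when(value: str) -> str:
    """Return which row of the --when table a value falls into."""
    v = value.strip()
    if v in ("manual", "triggered"):
        return "manual"
    if v == "when-ready":
        return "condition-gated"
    if v in _SHORTHANDS:
        return "cron-shorthand"
    if re.fullmatch(rf"every {_DURATION}", v):
        return "interval"
    if re.fullmatch(rf"(?:\+|in |after ){_DURATION}", v):
        return "relative"
    if v.startswith("at "):
        return "absolute"
    # A cron expression may carry the "cron:" prefix or be bare.
    body = v[len("cron:"):] if v.startswith("cron:") else v
    if len(body.split()) == 5:
        return "cron"
    raise ValueError(f"unrecognized --when value: {value!r}")
```

The order of the checks matters: the bare 5-field cron form is tested last because it has no distinguishing prefix.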
--do values
Seven action types:
| Type | Syntax | Description |
|---|---|---|
| Slash command | /run tests / /coder refactor X | Runs a slash command as if the user typed it |
| Shell | shell: docker compose up -d | Shell command under CoderMode safety |
| Agent task | agent: deploy and verify | Boots a ReAct agent with the task |
| LLM prompt | llm: summarize the weekly report | One-shot headless LLM call |
| Webhook | POST https://hooks.slack.com/... \| hello | HTTP request |
| Hook | hook:PostToolUse | Fires a chatcli hook by event name |
| Noop | noop | Useful for triggers-only pipelines |
Full flag set
Examples
/wait: block until condition
Sugar for "wait for X to happen and optionally do Y".
Condition DSL
| Syntax | Evaluator | Example |
|---|---|---|
| http://host/path==200 | http_status | Wait for HTTP 200 |
| http://host~=/ok/ | http_status (regex) | Wait for regex match in body |
| tcp://host:port | tcp_reachable | Wait for TCP port open |
| k8s:<kind>/<ns>/<name>:<cond> | k8s_resource_ready | k8s:pod/prod/api:Ready |
| k8s:<kind>/<name> | k8s_resource_ready | Default namespace, condition Ready |
| docker:<name>:running | docker_running | Container running |
| docker:<name>:healthy | docker_running | Container healthcheck OK |
| file:/path | file_exists | File exists |
| file:/path>=100 | file_exists (min_size) | File ≥ 100 bytes |
| shell: <cmd> | shell_exit | Shell returns 0 |
| <cmd>~=/pattern/ | regex_match | cmd output matches regex |
| llm: <question> | llm_check | LLM answers YES |
| and(<expr>, <expr>, ...) | all_of | All satisfied |
| or(<expr>, <expr>, ...) | any_of | Any satisfied |
| not <expr> | negate | Negate child |
Examples
Timeouts
- --on-timeout fail (default): mark as timed_out and stop.
- --on-timeout fire_anyway: run the action even without satisfaction.
- --on-timeout fallback: run the alternate action defined in WaitSpec.Fallback (via JSON spec), then fail.
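The three policies can be read as a small decision function. The Python sketch below is hypothetical, not the scheduler's code: `finish_wait` and its parameters are invented names, and the "completed" and "failed" return values are assumptions borrowed from the event names; only timed_out is stated above.

```python
from typing import Callable, Optional


def finish_wait(satisfied: bool, on_timeout: str,
                action: Callable[[], None],
                fallback: Optional[Callable[[], None]] = None) -> str:
    """Decide what happens once a wait ends (condition met or deadline hit)."""
    if satisfied:
        action()
        return "completed"
    # From here on the deadline elapsed without the condition being satisfied.
    if on_timeout == "fail":
        return "timed_out"
    if on_timeout == "fire_anyway":
        action()  # run the primary action despite the unmet condition
        return "completed"
    if on_timeout == "fallback":
        if fallback is not None:
            fallback()  # alternate action, then the job still fails
        return "failed"
    raise ValueError(f"unknown on_timeout policy: {on_timeout!r}")
```

With fallback, the primary action never runs; only the alternate one does.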
/jobs: manage
- Subcommands (list, show, cancel, …)
- Live job IDs for show/cancel/pause/resume/logs
- Values for --status (pending, running, waiting, …) and --owner (me, user, agent, worker, system, hook)
Daemon mode
Lifecycle
- --detach re-execs with setsid (Unix) / CREATE_NEW_PROCESS_GROUP (Windows), freeing the terminal. Log goes to <socket_dir>/daemon.log.
- The interactive CLI auto-detects a daemon on the configured socket and becomes a thin client: /schedule, /wait, /jobs round-trip over IPC.
- Stale sockets (dead process) are cleaned automatically before start.
IPC protocol
UNIX socket with 4-byte length-prefix + JSON payload frames. Kinds:
- ping, bye: health/close
- enqueue, cancel, pause, resume, query, list, snapshot, stats: operations
- subscribe: server-sent events for UI
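A length-prefixed JSON frame can be sketched in a few lines of Python. This only illustrates the framing idea and is not the daemon's wire format: the big-endian byte order, the "kind" field name, and the helper names `encode_frame` / `decode_frames` are assumptions.

```python
import json
import struct


def encode_frame(message: dict) -> bytes:
    """Serialize a message as a 4-byte length prefix followed by JSON."""
    payload = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload  # big-endian is assumed


def decode_frames(buffer: bytes):
    """Return (messages, remainder); remainder holds any incomplete frame."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # payload not fully received yet
        start = offset + 4
        messages.append(json.loads(buffer[start:start + length].decode("utf-8")))
        offset = start + length
    return messages, buffer[offset:]
```

A reader on a stream socket would keep appending received bytes to the remainder and call `decode_frames` again, which is why the incomplete tail is returned rather than discarded.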
systemd / launchd
chatcli daemon install prints a template ready to paste into /etc/systemd/system/chatcli-scheduler.service or ~/Library/LaunchAgents/.
@scheduler: tool for agents
Inside the ReAct loop, the agent can call @scheduler with 5 subcommands. This lets agents plan their own pauses autonomously.
| cmd | Args shape | Returns |
|---|---|---|
| schedule | {name, when, do, wait?, timeout?, depends_on?, triggers?, ...} | {job_id, status, summary} |
| wait | {until, every?, timeout?, async?, then?} | Sync: {outcome, job} · Async: {job_id, status} |
| query | {id} | Full job (status, history, transitions) |
| list | {filter?: {owner, statuses, tag, name_substr, include_terminal}} | {jobs: [...]} |
| cancel | {id, reason?} | {ok, job_id} |
Evaluators and actions: plug-in registry
Built-in evaluators
Each implements ConditionEvaluator in cli/scheduler/condition/:
shell_exit
Runs a command, compares exit code with expected (default 0).
http_status
GET/POST to URL, exact or regex match against body.
file_exists
File presence, min size, stable mtime.
k8s_resource_ready
kubectl get + jsonpath; Pod, Deployment, StatefulSet, Service, etc.
docker_running
docker inspect; running + healthcheck.
tcp_reachable
TCP dial with timeout.
regex_match
Shell cmd + regex against stdout/stderr/combined.
llm_check
Headless LLM answers YES/NO.
custom
User script; args via env CHATCLI_SCHEDULER_SPEC.
all_of / any_of
Composite with short-circuit and per-child negation.
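The composite behavior (short-circuit plus per-child negation) can be sketched as below. This is a hypothetical Python model, not the ConditionEvaluator interface: children are represented as `(check, negate)` pairs, which is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Check = Callable[[], bool]
Child = Tuple[Check, bool]  # (evaluator, negate flag); this shape is assumed


def _effective(child: Child) -> bool:
    """Run one child and apply its negation flag."""
    check, negate = child
    return check() != negate  # XOR: flips the result when negate is True


def all_of(children: List[Child]) -> bool:
    # all() stops at the first False, so later children are never evaluated.
    return all(_effective(c) for c in children)


def any_of(children: List[Child]) -> bool:
    # any() stops at the first True.
    return any(_effective(c) for c in children)
```

Short-circuiting matters here because children may be expensive probes (HTTP calls, kubectl, an LLM check); ordering cheap checks first avoids running the costly ones when the outcome is already decided.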
Built-in actions
In cli/scheduler/action/:
- slash_cmd: invokes /foo args via the command handler.
- shell: shell command under CoderMode safety (allowlist/denylist from /config security).
- agent_task: boots ReAct loop with the task.
- worker_dispatch: single-agent worker invocation.
- llm_prompt: headless LLM call, option to append to history.
- webhook: HTTP POST/GET/PUT with JSON body, headers, expected status.
- hook: fires chatcli hook by event.
- noop: useful for triggers-only pipelines.
- agent_resume: resumes an agent parked via @park. Loads the snapshot, re-enters the ReAct loop with restored history. See Agent Park & Resume.
- park_poll: polling driver for @park for_url/for_cmd. Runs every interval; when success_when matches or the deadline elapses, fires an agent_resume. Crash-safe via WAL-replay self-rescheduling.
Durability
WAL (Write-Ahead Log)
- One file per job: ~/.chatcli/scheduler/wal/<jobid>.wal
- Framing: magic[4] | length[4] | crc32[4] | payload | crc32[4]; double CRC detects torn writes.
- Atomic write via tmp+rename+dir fsync.
- Corrupt files are renamed to <jobid>.wal.corrupt for inspection.
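The framing above can be sketched in Python to show how a leading and trailing checksum catch a torn write. Several details are assumptions because the docs do not state them: the magic value `CWAL`, the big-endian integers, and both CRC fields covering the payload only.

```python
import struct
import zlib

MAGIC = b"CWAL"  # hypothetical magic; the real value is not documented here


def encode_record(payload: bytes) -> bytes:
    """Lay out magic[4] | length[4] | crc32[4] | payload | crc32[4]."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    header = MAGIC + struct.pack(">II", len(payload), crc)
    return header + payload + struct.pack(">I", crc)


def decode_record(data: bytes) -> bytes:
    """Return the payload, or raise ValueError if the record is damaged."""
    if len(data) < 16 or data[:4] != MAGIC:
        raise ValueError("bad magic or short record")
    length, head_crc = struct.unpack_from(">II", data, 4)
    end = 12 + length
    if len(data) < end + 4:
        raise ValueError("torn write: record truncated")
    payload = data[12:end]
    (tail_crc,) = struct.unpack_from(">I", data, end)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if head_crc != tail_crc or actual != head_crc:
        raise ValueError("torn write: checksum mismatch")
    return payload
```

The trailing CRC is written last, so a write that stops partway leaves it missing or stale, and the mismatch with the header CRC exposes the tear.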
Snapshot
- Written every SNAPSHOT_INTERVAL (default 5m) to snapshot.json.
- Atomic replace via tmp-rename.
- Boot prefers: snapshot → overlay any newer .wal.
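The tmp+rename pattern used for both WAL files and the snapshot looks roughly like this in Python. It is a generic POSIX sketch rather than ChatCLI's implementation; the helper name `atomic_write` and the `.tmp-` prefix are made up, and the directory fsync step does not apply on Windows.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # file contents reach disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the directory fsync leaves a window where the data is durable but the new name is not, which is why the WAL list above names it explicitly.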
Replay on boot
- Running/Waiting jobs at crash time come back as Pending with Attempts preserved.
- Missed fires honor MissPolicy:
  - fire_once (default): coalesce all missed ticks into a single fire.
  - fire_all: fire per missed tick (opt-in, can saturate).
  - skip: ignore the missed window, forward to next.
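As a worked example of the three MissPolicy values, the Python sketch below counts how many catch-up fires an interval job would get after downtime. The helper names and the integer-seconds time model are assumptions for illustration only.

```python
from typing import List


def missed_ticks(next_fire_at: int, interval: int, now: int) -> List[int]:
    """List the tick times (in seconds) that elapsed while nothing was running."""
    ticks = []
    t = next_fire_at
    while t <= now:
        ticks.append(t)
        t += interval
    return ticks


def catch_up_fires(ticks: List[int], policy: str = "fire_once") -> int:
    """Return how many immediate fires the replay performs for the missed ticks."""
    if policy == "fire_once":
        return 1 if ticks else 0  # coalesce every missed tick into one fire
    if policy == "fire_all":
        return len(ticks)  # one fire per missed tick; can saturate
    if policy == "skip":
        return 0  # ignore the window and wait for the next tick
    raise ValueError(f"unknown MissPolicy: {policy!r}")
```

For a job due at t=100 with a 60-second interval and a restart at t=290, four ticks were missed: fire_once runs once, fire_all runs four times, skip runs none.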
Garbage collection
- Terminal jobs stay for TTL on disk (default 24h) for /jobs history.
- GC loop (WAL_GC_INTERVAL, default 1h) unlinks expired .wal.
Security
Action allowlist
CHATCLI_SCHEDULER_ACTION_ALLOWLIST controls which action types may be scheduled. Default:
- shell: enqueue preflight + fire-time re-check against CoderMode (see next section).
- webhook: http.Client with timeout and max response size.
- agent_task: re-enters the ReAct loop, which keeps its own interactive policy.
- slash_cmd: flows through the CLI's CommandHandler (subject to the normal session rules).
CoderMode preflight for shell
Every shell command embedded in a job (in Action, in Wait.Condition, or in all_of/any_of composite children) is passed to CoderMode's PolicyManager, the same one /coder and /agent use interactively. Three outcomes:
| Classification | Scheduler behavior |
|---|---|
| Allow (allowlist match, or known read-only like kubectl get) | Job admitted |
| Deny (denylist match) | Rejected at /schedule with ErrShellPolicyDeny. Denylist beats --i-know; it is not an override. |
| Ask (outside allowlist, unknown command) | Rejected with ErrShellPolicyAsk unless the job has DangerousConfirmed=true (via --i-know) |
At fire time, RunShell on the bridge reloads the on-disk policy and re-classifies: if the operator added a Deny rule between schedule and execution, the job fails instead of running.
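The three outcomes and the fire-time re-check can be modeled with a short Python sketch. It is hypothetical: the exception classes stand in for ErrShellPolicyDeny and ErrShellPolicyAsk, plain prefix lists stand in for the PolicyManager, and the built-in "known read-only" detection is not modeled.

```python
from typing import List


class ShellPolicyDeny(Exception):
    """Stand-in for ErrShellPolicyDeny."""


class ShellPolicyAsk(Exception):
    """Stand-in for ErrShellPolicyAsk."""


def classify(command: str, allow: List[str], deny: List[str]) -> str:
    # Deny is checked first so that it always beats allow.
    if any(command.startswith(prefix) for prefix in deny):
        return "deny"
    if any(command.startswith(prefix) for prefix in allow):
        return "allow"
    return "ask"


def preflight(command: str, allow: List[str], deny: List[str],
              dangerous_confirmed: bool = False) -> None:
    """Raise if the command may not be scheduled (or fired); return None if admitted."""
    verdict = classify(command, allow, deny)
    if verdict == "deny":
        raise ShellPolicyDeny(command)  # dangerous_confirmed never overrides deny
    if verdict == "ask" and not dangerous_confirmed:
        raise ShellPolicyAsk(command)
```

Calling `preflight` again at fire time with freshly loaded lists reproduces the re-check: a command admitted at schedule time fails if a matching deny prefix was added in between.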
How to edit the CoderMode policy
/config security is now hierarchical. The bare form still dumps the read-only panorama; new subcommands mutate the PolicyManager live and persist to ~/.chatcli/coder_policy.json:
/schedule refuses a command:
deny and forget always prompt [y/N]. allow prompts only when the pattern is "broad" (e.g. @coder exec alone, or a very short suffix). Add --yes / -y to skip the prompt in scripts.
Scope of changes: allow / deny / forget update the JSON immediately; the interactive CLI (workerPolicyAdapter) reloads on every Ask prompt, and the scheduler reloads on every RunShell. If you edited the JSON externally, run /config security reload to force every cache to re-read.
The older paths remain valid:
- Through the /coder interactive prompt: choosing "Allow always" or "Deny forever" on a safety prompt also persists the rule via PolicyManager.AddRule. Same infrastructure as /config security allow/deny.
- Edit ~/.chatcli/coder_policy.json directly: useful for bulk onboarding (ship a ready-made file to the team) or for per-project rules in <root>/coder_policy.json (merged with the global).

Patterns use prefix matching on <toolName> <args> as the PolicyManager normalizes. Deny always beats allow.
--i-know and i_know (agents)
When you want to schedule a command outside the allowlist without adding it permanently, pass --i-know on /schedule.
The flag sets Job.DangerousConfirmed=true, and the job passes preflight even with an Ask classification. Denylist still blocks: --i-know does not override an explicit deny.
Agents get the equivalent, i_know, via tool call.
To keep agents from using i_know, set CHATCLI_SCHEDULER_ALLOW_AGENTS=false or keep dangerous commands on the denylist (agents can never bypass denylist).
Full bypass (trusted automation)
For internal automation in a trusted environment you can disable the policy check per-job entirely:
- Operator enables it: CHATCLI_SCHEDULER_SHELL_ALLOW_BYPASS=true
- Job carries bypass_safety: true in the action payload.
For everyday use, allowlist the command through /coder (choose "Allow always") or use --i-know explicitly on /schedule. Bypass is for CI/CD in ephemeral containers where the sandbox is the isolation.
Rate limiting
Global + per-owner token bucket with nanodelay tolerance. Enqueues rejected as rate_limited carry a Retry-After hint.
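A two-level token bucket with a Retry-After hint can be sketched as follows. This is a generic Python illustration, not the scheduler's limiter: the class and function names, the explicit `now` parameter, and the rates are assumptions, and the nanodelay tolerance is not modeled.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = now

    def wait_needed(self, now: float) -> float:
        """Refill, then return 0.0 if a token is available, else seconds to wait."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.rate

    def consume(self) -> None:
        self.tokens -= 1.0


def admit(global_bucket: TokenBucket, owner_bucket: TokenBucket, now: float) -> float:
    """Return 0.0 when admitted, otherwise a Retry-After hint in seconds."""
    retry_after = max(global_bucket.wait_needed(now), owner_bucket.wait_needed(now))
    if retry_after == 0.0:
        # Take from both buckets only when both can pay, so a per-owner
        # rejection does not drain the global budget.
        global_bucket.consume()
        owner_bucket.consume()
    return retry_after
```

Checking both buckets before consuming from either keeps one noisy owner from eating tokens that other owners could have used.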
Circuit breakers
One breaker per evaluator type and one per action type, classic closed → open → half_open.
For example, when the k8s_resource_ready breaker opens, all dependent jobs fail fast with ErrBreakerOpen instead of saturating the worker pool.
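The closed → open → half_open cycle can be sketched as a small state machine. The Python below is a textbook breaker under assumed parameters (`failure_threshold`, `cooldown`) and is not the scheduler's implementation; in particular, it lets every call through while half-open instead of limiting probes to one.

```python
class Breaker:
    """Minimal circuit breaker keyed by one evaluator or action type."""

    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        """Return False while open; callers would fail fast in that case."""
        if self.state == "open":
            if now - self.opened_at >= self.cooldown:
                self.state = "half_open"  # cooldown over: let a probe through
                return True
            return False
        return True

    def record(self, success: bool, now: float) -> None:
        """Feed back the outcome of a call that allow() let through."""
        if success:
            self.state = "closed"
            self.failures = 0
            return
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

A failed probe in half_open reopens the breaker immediately, while a successful one closes it and resets the failure count.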
Audit log
Every mutation (create, transition, cancel, fire) writes a JSON line to ~/.chatcli/scheduler/audit.log. Rotation via lumberjack (default 10 MiB, 7 backups, 30 days).
Authorization
- OwnerUser and OwnerSystem may cancel any job.
- OwnerAgent may only cancel jobs it owns or those of child workers.
- Cross-owner cancel returns ErrNotAuthorized and fires the PreJobCancel hook for auditing.
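The cancel rules reduce to a small predicate. The Python sketch below is hypothetical: the string owner kinds, the `parent_id` field used to model "child workers", and the deny-by-default branch for other owner kinds are assumptions, since the rules above cover only user, system, and agent.

```python
from typing import Optional


def can_cancel(actor_kind: str, actor_id: str,
               job_owner_id: str, job_owner_parent_id: Optional[str] = None) -> bool:
    """Return True if the actor may cancel a job with the given owner."""
    if actor_kind in ("user", "system"):
        return True  # OwnerUser and OwnerSystem may cancel any job
    if actor_kind == "agent":
        # Own jobs, or jobs owned by a worker that this agent spawned.
        return job_owner_id == actor_id or job_owner_parent_id == actor_id
    return False  # other owner kinds: denied here by assumption
```

A caller would turn a False result into ErrNotAuthorized and fire the PreJobCancel hook, as described above.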
Observability
Prometheus metrics
| Metric | Type | Labels |
|---|---|---|
| chatcli_scheduler_jobs_created_total | Counter | owner_kind, action_type |
| chatcli_scheduler_jobs_fired_total | Counter | outcome, action_type |
| chatcli_scheduler_wait_checks_total | Counter | condition_type, satisfied |
| chatcli_scheduler_wait_duration_seconds | Histogram | condition_type |
| chatcli_scheduler_action_duration_seconds | Histogram | action_type, outcome |
| chatcli_scheduler_queue_depth | Gauge | - |
| chatcli_scheduler_active_jobs | Gauge | - |
| chatcli_scheduler_breaker_state | Gauge | kind, key (0=closed, 1=open, 2=half_open) |
| chatcli_scheduler_retries_total | Counter | attempt (bucketed 1/2/3/4+) |
| chatcli_scheduler_enqueue_errors_total | Counter | reason (rate_limited, full, invalid, …) |
| chatcli_scheduler_wal_segments | Gauge | - |
| chatcli_scheduler_audit_writes_total | Counter | - |
| chatcli_scheduler_daemon_connections | Gauge | - |
Events
The scheduler publishes on cli/bus and fires chatcli hooks:
- job.created, job.scheduled, job.fired
- job.wait_started, job.wait_tick, job.wait_satisfied
- job.running, job.completed, job.failed, job.timed_out, job.cancelled, job.skipped
- job.retry_queued, job.paused, job.resumed, job.dependency_resolved
- breaker.opened, breaker.half_open, breaker.closed
- daemon.started, daemon.stopped

Hooks see each event as Scheduler.<event> in HookEvent.Type, so you can wire a Slack webhook to Scheduler.job.failed via ~/.chatcli/hooks.json.
Status line
When jobs are active, the prompt prefix gains a counter such as [jobs: 2▶ 1⏳], one entry per state:
- ▶ running
- waiting (polling)
- ⏳ pending
- blocked (waiting on deps)
- failed
Full configuration
See Environment Variables → Scheduler for the ~25 env vars.
/config scheduler
Internal architecture (brief)
- Schedule pump (1 goroutine) drains the priority queue by NextFireAt.
- Worker pool (N goroutines = WORKER_COUNT) runs handleJob (wait → action → finalize).
- Snapshot loop (1 goroutine) periodic freeze.
- GC loop (1 goroutine) reaps expired terminal records.
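The schedule pump's core data structure, a priority queue ordered by NextFireAt, can be sketched with Python's heapq. This shows the idea only: `SchedulePump`, `push`, and `drain_due` are invented names, and handing jobs to the worker pool is reduced to returning a list.

```python
import heapq
from typing import List, Tuple


class SchedulePump:
    """Min-heap of (next_fire_at, sequence, job_id); earliest job on top."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0  # tie-breaker keeps insertion order for equal fire times

    def push(self, next_fire_at: float, job_id: str) -> None:
        heapq.heappush(self._heap, (next_fire_at, self._seq, job_id))
        self._seq += 1

    def drain_due(self, now: float) -> List[str]:
        """Pop every job whose fire time has arrived, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job_id = heapq.heappop(self._heap)
            due.append(job_id)
        return due
```

A real pump would sleep until the head of the heap is due instead of polling, then pass the drained IDs to the workers.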
Next steps
Cookbook: automations
Practical recipes: deploy with wait, cron backup, DAG pipeline.
Reference: Commands
Full table of flags and subcommands.
Reference: Env vars
All 25+ scheduler variables.
Hooks System
Wire Slack/PagerDuty webhooks to scheduler events.