Web tools are native ChatCLI tools, automatically available in agent and coder modes. No MCP server configuration is required to use them.
@webfetch
Fetches a URL, strips the HTML, and returns the clean text content. Ideal for reading documentation, articles, READMEs, and any web page.
How It Works
HTML parsing
The received HTML is parsed using golang.org/x/net/html, extracting only the text content.
Cleanup
Script, style, and navigation tags and other non-text elements are removed. The resulting text is cleaned and formatted.
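To make the two steps above concrete, here is a rough, standard-library-only approximation. It is not the golang.org/x/net/html parser the tool uses: it swaps in regular expressions so the sketch runs without external modules, and the name `extractText` is hypothetical.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// Whole elements whose content is never page text. Go's RE2 engine has
	// no backreferences, so each element gets its own alternative.
	dropBlocks = regexp.MustCompile(`(?is)<script\b.*?</script\s*>|<style\b.*?</style\s*>|<nav\b.*?</nav\s*>`)
	// Any remaining tag.
	anyTag = regexp.MustCompile(`(?s)<[^>]*>`)
)

// extractText is a hypothetical helper: drop script/style/nav blocks,
// strip the remaining tags, decode entities, and collapse whitespace.
func extractText(page string) string {
	s := dropBlocks.ReplaceAllString(page, " ")
	s = anyTag.ReplaceAllString(s, " ")
	s = html.UnescapeString(s)
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	page := `<html><head><style>p{color:red}</style></head>
<body><nav><a href="/">Home</a></nav><p>Hello &amp; <b>welcome</b></p></body></html>`
	fmt.Println(extractText(page))
}
```

A real tokenizer handles malformed markup, comments, and attribute values containing `>` far more reliably than these patterns, which is a good reason to prefer a proper HTML parser in production code.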
Usage
The LLM invokes @webfetch automatically when it needs to access content from a URL. You can also request it explicitly.
Argument Formats
- JSON
- Positional
Example
Filters for large payloads
Endpoints like Prometheus /metrics, configuration dumps, or long listings can easily exceed tens of thousands of characters. @webfetch accepts a set of parameters that perform line-level filtering before truncation, so the useful part is not discarded:
| Parameter | Type | Description |
|---|---|---|
| filter | string (Go regex) | Keep only lines matching the regex. Applied before exclude and from_line/to_line. |
| exclude | string (Go regex) | Drop lines matching the regex. Applied after filter. |
| from_line | integer | Start of the window in the filtered view (1-based, inclusive). |
| to_line | integer | End of the window in the filtered view (1-based, inclusive). |
| save_to_file | boolean | Persist the full pre-filter body to the session scratch dir and return preview + absolute path. Triggered automatically when the body exceeds CHATCLI_WEBFETCH_AUTOSAVE_BYTES (default 10000) AND no filter/range is set. |
| save_path | string | Override the generated filename for save_to_file (any directory prefix is discarded; writes are always confined to the scratch dir). |
| max_length | integer | Maximum inline content length (default: 20,000). Content above this is truncated inline, or auto-saved to the scratch dir. |
| render | boolean | Force (true) or suppress (false) headless rendering of JS pages (see Rendering JavaScript pages). Without the parameter, auto mode decides via heuristics. |
After a save, run read_file against the absolute path returned, choosing the exact line range that matters.
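The order in which the line-level parameters apply (filter, then exclude, then the from_line/to_line window over the filtered view) can be sketched as follows. This is a minimal illustration under assumptions, not ChatCLI's implementation: the `lineFilter` struct and `applyLineFilter` function are hypothetical names, and treating 0 as "unset" for the window bounds is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// lineFilter mirrors the documented parameters; the struct is hypothetical.
type lineFilter struct {
	Filter   string // keep only lines matching this regex ("" keeps all)
	Exclude  string // then drop lines matching this regex ("" drops none)
	FromLine int    // 1-based inclusive start in the filtered view (0 = first)
	ToLine   int    // 1-based inclusive end in the filtered view (0 = last)
}

// applyLineFilter applies filter, then exclude, then the line window.
func applyLineFilter(body string, f lineFilter) (string, error) {
	var keep, drop *regexp.Regexp
	var err error
	if f.Filter != "" {
		if keep, err = regexp.Compile(f.Filter); err != nil {
			return "", fmt.Errorf("invalid filter: %w", err)
		}
	}
	if f.Exclude != "" {
		if drop, err = regexp.Compile(f.Exclude); err != nil {
			return "", fmt.Errorf("invalid exclude: %w", err)
		}
	}
	var view []string
	for _, line := range strings.Split(body, "\n") {
		if keep != nil && !keep.MatchString(line) {
			continue
		}
		if drop != nil && drop.MatchString(line) {
			continue
		}
		view = append(view, line)
	}
	from, to := f.FromLine, f.ToLine
	if from < 1 {
		from = 1
	}
	if to < 1 || to > len(view) {
		to = len(view)
	}
	if from > to {
		return "", nil // empty window
	}
	return strings.Join(view[from-1:to], "\n"), nil
}

func main() {
	metrics := "# HELP http_requests_total Total requests\n" +
		"http_requests_total{code=\"200\"} 1027\n" +
		"http_requests_total{code=\"500\"} 3\n" +
		"go_goroutines 42\n"
	out, err := applyLineFilter(metrics, lineFilter{
		Filter:  `^http_requests_total`,
		Exclude: `code="200"`,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the window is applied last, from_line and to_line count lines in the already filtered view, not in the original body.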
Smart auto-save
When the LLM calls @webfetch without filter, exclude, range or an explicit save_to_file, and the returned body exceeds the auto-save threshold, ChatCLI automatically promotes the call to save_to_file=true. This shields the context from giant pages without requiring the model to know the body size in advance.
Default: bodies above 10,000 bytes (configurable via CHATCLI_WEBFETCH_AUTOSAVE_BYTES) trigger the auto-save. The inline result is a compact preview (~5,000 chars), and the response opens with an explicit marker.
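The promotion rule described above reduces to a small predicate. The sketch below is an illustration under assumptions: `fetchArgs`, `shouldSave`, and `autoSaveThreshold` are hypothetical names, and falling back to the default when the variable is missing, non-numeric, or non-positive is an assumption about behavior the text does not specify.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultAutoSaveBytes = 10000 // documented default

// fetchArgs is a hypothetical view of the relevant call arguments.
type fetchArgs struct {
	Filter, Exclude  string
	FromLine, ToLine int
	SaveToFile       *bool // nil means the caller did not specify it
}

// autoSaveThreshold reads CHATCLI_WEBFETCH_AUTOSAVE_BYTES, falling back
// to the default on a missing or invalid value (assumed behavior).
func autoSaveThreshold() int {
	if v, err := strconv.Atoi(os.Getenv("CHATCLI_WEBFETCH_AUTOSAVE_BYTES")); err == nil && v > 0 {
		return v
	}
	return defaultAutoSaveBytes
}

// shouldSave reports whether the body should be persisted to a file.
func shouldSave(a fetchArgs, bodyLen, threshold int) bool {
	if a.SaveToFile != nil {
		return *a.SaveToFile // an explicit choice always wins
	}
	narrowed := a.Filter != "" || a.Exclude != "" || a.FromLine > 0 || a.ToLine > 0
	return !narrowed && bodyLen > threshold
}

func main() {
	threshold := autoSaveThreshold()
	fmt.Println(shouldSave(fetchArgs{}, 48000, threshold))
	fmt.Println(shouldSave(fetchArgs{Filter: "^http_"}, 48000, threshold))
}
```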
Rendering JavaScript pages (SPA)
Pages that build their content client-side (SPAs, JS-rendered tables) return an empty “shell” on a static fetch: <div id="root"> plus bundles, no actual content. @webfetch solves this with an escalation chain:
JS-shell detection
Heuristics: thin extracted text + structural signals (empty #root/#app mount points, <noscript> warnings, framework markers such as React, Next, Angular, Vue, Nuxt, Svelte, Gatsby, Remix, Flutter).
Headless render via CDP
A real Chromium renders the page (waits for load + DOM stability) and the settled DOM flows through the same extraction/filter/auto-save pipeline.
CHATCLI_WEBFETCH_RENDER_BROWSER → opt-in download of a pinned Chromium (~150 MB, once) with CHATCLI_WEBFETCH_RENDER_AUTOPROVISION=true. No API keys, no external services.
Production posture: one shared browser per process (lazy launch, health-checked reuse, shutdown after 2 idle minutes), a circuit breaker (2 launch failures → 5-minute pause), an incognito context per render (cookies never leak between sites) and SSRF enforced inside the browser — every sub-request the page fires is validated through CDP interception, mirroring the regular HTTP path guard. Rendered DOM capped at 10 MB.
| Variable | Description | Default |
|---|---|---|
| CHATCLI_WEBFETCH_RENDER | auto (heuristics decide), always, never | auto |
| CHATCLI_WEBFETCH_RENDER_TIMEOUT | Render timeout in seconds | 25 |
| CHATCLI_WEBFETCH_RENDER_BROWSER | Absolute path to a specific Chromium-based binary | (auto-detect) |
| CHATCLI_WEBFETCH_RENDER_AUTOPROVISION | Allows the one-time download of a pinned Chromium when no browser exists | false |
@websearch
Performs a web search and returns results with title, URL and snippet. Every backend is keyless by design, with no third-party API key to register: DuckDuckGo (HTML scraping) is the zero-config default, self-hosted SearxNG is preferred in corporate environments when you point to an internal instance, and Brave Search and Mojeek serve as further fallbacks.
Available backends
| Backend | Requires | When it shines | Pain points |
|---|---|---|---|
| DuckDuckGo | Nothing | Default, works out of the box, zero config | DDG occasionally serves anti-bot interstitials (CAPTCHA) — may return empty results |
| Self-hosted SearxNG | SEARXNG_URL pointing to your instance | Locked-down corporate networks — you control the backend, no egress to public scraping, aggregates several engines (Bing/Google/Qwant) through the instance | Requires running an internal container + enabling JSON in settings.yml |
| Brave Search | Nothing | Independent index (not a meta-search) — real diversity when DDG blocks | HTML scraping; layout may shift (parser anchored on stable semantic attributes) |
| Mojeek | Nothing | Independent index, UK-based crawler | Some networks receive a 403 for automated traffic — the chain simply moves on |
Fallback chain
For each query, ChatCLI builds an ordered chain of backends. If the first one fails or returns empty, the next one is tried automatically. Default order (CHATCLI_WEBSEARCH_PROVIDER unset or auto):
searxng → duckduckgo → brave → mojeek, where searxng joins the chain only when SEARXNG_URL is set. The others remain as fallbacks if the SearxNG instance fails. The same applies to any provider: CHATCLI_WEBSEARCH_PROVIDER=brave moves Brave to the front and keeps the rest behind it.
Environment variables
| Variable | Description | Default |
|---|---|---|
| CHATCLI_WEBSEARCH_PROVIDER | Force a specific backend to the top of the chain: searxng, duckduckgo, brave, mojeek, or auto. | auto |
| SEARXNG_URL | Root URL of the SearxNG instance (e.g. https://searx.internal.corp). When set, SearxNG joins the chain. | (unset) |
/websearch command
Interactive manager for the preferred backend. Autocomplete available for subcommands and provider names.
| Subcommand | Effect |
|---|---|
| /websearch or /websearch status | Show current provider + active chain |
| /websearch list | List known providers and which are configured |
| /websearch provider <searxng\|duckduckgo\|brave\|mojeek\|auto> | Set preferred provider for the session (sets CHATCLI_WEBSEARCH_PROVIDER in the process) |
| /websearch reset | Remove the override and return to auto mode |
/websearch provider applies only to the current session. To persist the choice, export CHATCLI_WEBSEARCH_PROVIDER in your shell or add it to .env.
Configuring self-hosted SearxNG
The SearxNG instance must have its JSON API enabled; it isn’t on by default. Enable it in the SearxNG settings.yml by adding json to the search.formats list.
If you point SEARXNG_URL at an instance without JSON enabled, ChatCLI returns an actionable error instead of a cryptic decode failure.
How it works internally
Chain selection
SelectSearchChain() reads CHATCLI_WEBSEARCH_PROVIDER and SEARXNG_URL, and returns an ordered list of backends to try.
Sequential attempt
For each backend in the chain: call the search function. If results come back, stop and format. If it fails or returns zero, log the reason and advance to the next.
Argument Formats
- JSON
- Positional
Example
The (via DuckDuckGo) or (via SearxNG) header makes it clear which backend responded, which is useful for diagnosing why a query returned nothing (e.g. DDG served a CAPTCHA → falls back to SearxNG).
Why keyless? ChatCLI is used in corporate environments where managing third-party API keys (Brave Search, Tavily, SerpAPI) creates operational friction — registration, rotation, approvals. Self-hosted SearxNG solves network lockdown without recurring cost; DuckDuckGo covers casual use without any config.
Comparison
| Aspect | @webfetch | @websearch |
|---|---|---|
| Purpose | Read content from a specific URL | Search the web for a query |
| Input | URL | Search query |
| Output | Clean text from the page | List of results (title + URL + snippet) |
| When to use | You know the exact URL | You need to find information |
| Engine | HTTP GET + HTML parser | DuckDuckGo (HTML scraping) + SearxNG (JSON API) |
Availability
Web tools are available in the following modes:
| Mode | @webfetch | @websearch |
|---|---|---|
| Chat | No | No |
| Agent (/agent) | Yes | Yes |
| Coder (/coder) | Yes | Yes |
| One-shot (-p) | Yes (with --agent) | Yes (with --agent) |
Next Steps
MCP Integration
Integrate additional web tools via MCP servers.
Agentic Plugins
See all tools available to the agent.