WebFetch & WebSearch

ChatCLI includes two native web tools — @webfetch and @websearch — that allow the agent to search for information on the internet and fetch web pages without depending on external MCP servers.

Web tools are native ChatCLI tools, automatically available in agent and coder modes. No MCP server configuration is required to use them.

@webfetch

Fetches a URL, strips the HTML, and returns the clean text content. Ideal for reading documentation, articles, READMEs, and any web page.

How It Works

HTTP request

ChatCLI makes a GET request to the provided URL with standard browser headers.

HTML parsing

The received HTML is parsed using golang.org/x/net/html, extracting only the text content.

Cleanup

Script, style, navigation tags and non-text elements are removed. The resulting text is cleaned and formatted.

Return

The text content is returned to the agent as the tool call result.
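The cleanup step is not specified in detail. The sketch below assumes that "cleaned and formatted" means three things: trimming each line, collapsing runs of whitespace, and keeping at most one blank line between paragraphs. It runs on text that has already been extracted from the HTML. cleanText is a hypothetical name, not ChatCLI's API.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanText is a hypothetical post-extraction pass (an assumption, not
// ChatCLI's code): it trims every line, collapses runs of spaces/tabs inside
// a line and keeps at most one consecutive blank line.
func cleanText(extracted string) string {
	var out []string
	blank := false
	for _, line := range strings.Split(extracted, "\n") {
		// strings.Fields splits on any whitespace, so re-joining with a
		// single space both collapses internal runs and trims the ends.
		line = strings.Join(strings.Fields(line), " ")
		if line == "" {
			if blank || len(out) == 0 {
				continue // skip leading and repeated blank lines
			}
			blank = true
			out = append(out, "")
			continue
		}
		blank = false
		out = append(out, line)
	}
	// Drop a trailing blank line, if one was kept.
	if n := len(out); n > 0 && out[n-1] == "" {
		out = out[:n-1]
	}
	return strings.Join(out, "\n")
}

func main() {
	raw := "\n\n  Error   handling\tand Go \n\n\n\n  Go code uses error values.  \n\n"
	fmt.Println(cleanText(raw))
}
```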

Usage

The LLM invokes @webfetch automatically when it needs to access content from a URL. You can also request it explicitly:

Read the documentation at https://pkg.go.dev/net/http and explain the Client type to me

Argument Formats

JSON:

{
  "tool": "webfetch",
  "args": {
    "url": "https://pkg.go.dev/net/http"
  }
}

Positional:

webfetch https://pkg.go.dev/net/http
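The JSON form is a small envelope: a tool name plus an args object. The sketch below decodes it in Go and validates it, assuming only the fields shown in the sample. webfetchCall and webfetchURL are illustrative names, not ChatCLI's internal types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// webfetchCall mirrors the envelope shown above (hypothetical type).
type webfetchCall struct {
	Tool string `json:"tool"`
	Args struct {
		URL string `json:"url"`
	} `json:"args"`
}

// webfetchURL decodes a tool call and returns its required "url" argument.
func webfetchURL(raw []byte) (string, error) {
	var call webfetchCall
	if err := json.Unmarshal(raw, &call); err != nil {
		return "", fmt.Errorf("invalid tool call: %w", err)
	}
	if call.Tool != "webfetch" {
		return "", fmt.Errorf("unexpected tool %q", call.Tool)
	}
	if call.Args.URL == "" {
		return "", fmt.Errorf("webfetch: missing required argument %q", "url")
	}
	return call.Args.URL, nil
}

func main() {
	raw := []byte(`{"tool": "webfetch", "args": {"url": "https://pkg.go.dev/net/http"}}`)
	u, err := webfetchURL(raw)
	fmt.Println(u, err)
}
```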

Example

User: Fetch the content from https://go.dev/blog/error-handling-and-go

Agent: I'll fetch the page content.

[tool_call: webfetch {"url": "https://go.dev/blog/error-handling-and-go"}]

Result: The article "Error handling and Go" explains Go's error
handling patterns, including...

@webfetch follows up to 10 redirects, applies a 30-second timeout, and returns clear errors for unreachable URLs or SSL/TLS failures.

Filters for large payloads

Endpoints like Prometheus /metrics, configuration dumps or long listings can easily exceed tens of thousands of characters. @webfetch accepts a set of parameters that perform line-level filtering before truncation, so the useful part is not discarded:

Parameter	Type	Description
`filter`	string (Go regex)	Keep only lines matching the regex. Applied before `exclude` and `from_line`/`to_line`.
`exclude`	string (Go regex)	Drop lines matching the regex. Applied after `filter`.
`from_line`	integer	Start of the window in the filtered view (1-based, inclusive).
`to_line`	integer	End of the window in the filtered view (1-based, inclusive).
`save_to_file`	boolean	Persist the full pre-filter body to the session scratch dir and return preview + absolute path. Triggered automatically when the body exceeds `CHATCLI_WEBFETCH_AUTOSAVE_BYTES` (default `10000`) AND no filter/range is set.
`save_path`	string	Override the generated filename for `save_to_file` (any directory prefix is discarded — writes are always confined to the scratch dir).
`max_length`	integer	Maximum inline content length (default: 20,000). Content above this is truncated inline, unless auto-save applies, in which case the full body is written to the scratch dir.
`render`	boolean	Force (`true`) or suppress (`false`) headless rendering of JS pages — see Rendering JavaScript pages. Without the parameter, `auto` mode decides via heuristics.

Example — filter Prometheus by metric prefix:

{
  "url": "http://payments.prod.svc:9090/metrics",
  "filter": "^chatcli_",
  "exclude": "^# HELP|^# TYPE"
}

Example — page within the filtered payload:

{
  "url": "http://svc/very-long-changelog",
  "filter": "^- ",
  "from_line": 50,
  "to_line": 80
}

Example — save everything to the scratch dir and read slices on demand:

{
  "url": "http://svc/metrics",
  "save_to_file": true
}

Response:

[full response saved to /tmp/chatcli-agent-Xy7K3a/scratch/webfetch_1712...txt — 142,318 bytes.
Use read_file with start/end to examine specific ranges.]

[first ~50K chars of content]

The agent can then issue a read_file against the absolute path returned, choosing the exact line range that matters.

save_to_file always confines writes to CHATCLI_AGENT_TMPDIR. If save_path is an absolute path or contains .., only its base name (filepath.Base) is kept, and the resulting path is validated to ensure it stays within the scratch dir.
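The sketch below shows this confinement rule in Go. Like the `save_path` row in the parameter table, it keeps only the base name of every caller-supplied path and then re-checks the joined result. scratchPath and the example scratch directory are hypothetical, and the path examples assume a Unix-style filesystem.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// scratchPath resolves a caller-supplied save_path inside scratchDir:
// directory components are discarded via filepath.Base, and the joined
// path is re-checked so it cannot escape the scratch dir (defense in depth).
func scratchPath(scratchDir, savePath string) (string, error) {
	name := filepath.Base(savePath)
	if name == "." || name == ".." || name == string(filepath.Separator) {
		return "", fmt.Errorf("invalid save_path %q", savePath)
	}
	root, err := filepath.Abs(scratchDir)
	if err != nil {
		return "", err
	}
	full := filepath.Join(root, name)
	rel, err := filepath.Rel(root, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("save_path %q escapes the scratch dir", savePath)
	}
	return full, nil
}

func main() {
	for _, p := range []string{"metrics.txt", "../../etc/passwd", "/etc/cron.d/job", ".."} {
		full, err := scratchPath("/tmp/chatcli-agent-example/scratch", p)
		fmt.Println(p, "->", full, err)
	}
}
```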

Smart auto-save

When the LLM calls @webfetch without filter, exclude, a line range or an explicit save_to_file, and the returned body exceeds the auto-save threshold, ChatCLI automatically promotes the call to save_to_file=true. This keeps giant pages out of the context without requiring the model to know the body size in advance.

By default, bodies above 10,000 bytes (configurable via CHATCLI_WEBFETCH_AUTOSAVE_BYTES) trigger auto-save. The inline result is a compact preview (~5,000 chars), and the response opens with an explicit marker:

[auto-saved: response was 142318 bytes — too large to inline.
 Full body is at /tmp/chatcli-agent-.../scratch/webfetch_1712....txt.
 Preview below; use read_file with start/end or rerun with
 filter/from_line/to_line for specific ranges.]

[first ~5000 chars of extracted text]
...(auto-truncated — full body saved to disk)

To loosen or effectively disable auto-save (e.g. for offline batches where the agent needs the whole body inline), raise the threshold:

export CHATCLI_WEBFETCH_AUTOSAVE_BYTES=1000000   # 1 MB — effectively disabled for most pages

On a per-call basis, passing any filter (even .*), explicit from_line/to_line, or save_to_file=false disables the automatic promotion.

See Token Efficiency for the full rationale behind this default.

Rendering JavaScript pages (SPA)

Pages that build their content client-side (SPAs, JS-rendered tables) return an empty “shell” on a static fetch — <div id="root"> plus bundles, no actual content. @webfetch solves this with an escalation chain:

Static fetch

Always the first step. Server-rendered pages stop here — zero extra cost.

JS-shell detection

Heuristics: thin extracted text + structural signals (empty #root/#app mount points, <noscript> warnings, framework markers — React, Next, Angular, Vue, Nuxt, Svelte, Gatsby, Remix, Flutter).

Headless render via CDP

A real Chromium renders the page (waits for load + DOM stability) and the settled DOM flows through the same extraction/filter/auto-save pipeline.

Browserless fallbacks

Without a browser, the embedded __NEXT_DATA__ state (Next.js) is recovered from the static HTML; as a last resort, a note tells the model that the fetched content may be incomplete.

Browser discovery (in order): Chrome → Chromium → Edge → Brave → explicit path in CHATCLI_WEBFETCH_RENDER_BROWSER → opt-in download of a pinned Chromium (~150 MB, once) with CHATCLI_WEBFETCH_RENDER_AUTOPROVISION=true. No API keys, no external services.

Production posture: one shared browser per process (lazy launch, health-checked reuse, shutdown after 2 idle minutes) and a circuit breaker (2 launch failures → 5-minute pause). Each render gets its own incognito context, so cookies never leak between sites. SSRF protection is enforced inside the browser: every sub-request the page fires is validated through CDP interception, mirroring the guard on the regular HTTP path. The rendered DOM is capped at 10 MB.

Variable	Description	Default
`CHATCLI_WEBFETCH_RENDER`	`auto` (heuristics decide), `always`, `never`	`auto`
`CHATCLI_WEBFETCH_RENDER_TIMEOUT`	Render timeout in seconds	`25`
`CHATCLI_WEBFETCH_RENDER_BROWSER`	Absolute path to a specific Chromium-based binary	(auto-detect)
`CHATCLI_WEBFETCH_RENDER_AUTOPROVISION`	Allows the one-time download of a pinned Chromium when no browser exists	`false`

> @webfetch https://app-spa.example.com/dashboard
Page appears JS-rendered; escalating to headless browser...

@websearch

Performs a web search and returns results with title, URL and snippet. Every supported backend is keyless by design, so there is no third-party API key to register. DuckDuckGo (HTML scraping) is the zero-config default, and self-hosted SearxNG is preferred in corporate environments when you point it at an internal instance. Brave Search and Mojeek act as zero-config fallbacks.

Available backends

Backend	Requires	When it shines	Pain points
DuckDuckGo	Nothing	Default, works out of the box, zero config	DDG occasionally serves anti-bot interstitials (CAPTCHA) — may return empty results
Self-hosted SearxNG	`SEARXNG_URL` pointing to your instance	Locked-down corporate networks — you control the backend, no egress to public scraping, aggregates several engines (Bing/Google/Qwant) through the instance	Requires running an internal container + enabling JSON in `settings.yml`
Brave Search	Nothing	Independent index (not a meta-search) — real diversity when DDG blocks	HTML scraping; layout may shift (parser anchored on stable semantic attributes)
Mojeek	Nothing	Independent index, UK-based crawler	Some networks receive a 403 for automated traffic — the chain simply moves on

Fallback chain

For each query, ChatCLI builds an ordered chain of backends. If the first one fails or returns empty, the next one is tried automatically. Default order (CHATCLI_WEBSEARCH_PROVIDER unset or auto):

DuckDuckGo          ← default, always available
SearxNG             ← only added to the chain when SEARXNG_URL is set
Brave Search        ← independent index, zero config
Mojeek              ← independent index, zero config

Explicit override to prefer SearxNG:

export CHATCLI_WEBSEARCH_PROVIDER=searxng
export SEARXNG_URL=https://searx.internal.corp

Result: the chain becomes searxng → duckduckgo → brave → mojeek. The others remain as fallbacks if the SearxNG instance fails. The same applies to any provider: CHATCLI_WEBSEARCH_PROVIDER=brave moves Brave to the front and keeps the rest behind it.

Environment variables

Variable	Description	Default
`CHATCLI_WEBSEARCH_PROVIDER`	Force a specific backend to the top of the chain: `searxng`, `duckduckgo`, `brave`, `mojeek`, or `auto`.	`auto`
`SEARXNG_URL`	Root URL of the SearxNG instance (e.g. `https://searx.internal.corp`). When set, SearxNG joins the chain.	(unset)

`/websearch` command

Interactive manager for the preferred backend. Autocomplete available for subcommands and provider names.

Subcommand	Effect
`/websearch` or `/websearch status`	Show current provider + active chain
`/websearch list`	List known providers and which are configured
`/websearch provider <searxng|duckduckgo|brave|mojeek|auto>`	Set preferred provider for the session (sets `CHATCLI_WEBSEARCH_PROVIDER` in the process)
`/websearch reset`	Remove the override and return to auto mode

/websearch provider applies only to the current session. To persist, export the env in your shell or add it to .env.

Configuring self-hosted SearxNG

The SearxNG instance must have its JSON API enabled — it isn’t on by default. In the SearxNG settings.yml:

search:
  formats:
    - html
    - json

If you point SEARXNG_URL at an instance without JSON enabled, ChatCLI returns an actionable error instead of a cryptic decode failure:

SearxNG did not return JSON (Content-Type="text/html"). Enable JSON in settings.yml: search.formats: [html, json]

The official searxng/searxng image on Docker Hub starts in roughly 30 seconds. In a corporate environment, a single container with internal ingress is enough, and it removes the dependency on DuckDuckGo being reachable through the corporate proxy.

How it works internally

Chain selection

SelectSearchChain() reads CHATCLI_WEBSEARCH_PROVIDER and SEARXNG_URL and returns an ordered list of backends to try.

Sequential attempt

For each backend in the chain, call its search function. If results come back, stop and format them. If it fails or returns zero results, log the reason and advance to the next backend.

Formatting

Results are formatted as text with (via <provider>) in the header, followed by a numbered list of entries, each with title, URL and snippet.

Argument Formats

JSON:

{
  "tool": "websearch",
  "args": {
    "query": "golang rate limiting best practices 2026"
  }
}

Positional:

websearch golang rate limiting best practices 2026
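Unlike @webfetch, the positional form of @websearch carries a multi-word argument. The sketch below shows one way the positional line could map onto the JSON args. It assumes webfetch takes exactly one URL and websearch treats everything after the tool name as the query. parsePositional is hypothetical and not ChatCLI's parser. Because it splits with strings.Fields, repeated spaces inside the query collapse to one.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePositional maps "<tool> <argument...>" onto the JSON-style args.
func parsePositional(line string) (string, map[string]string, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", nil, fmt.Errorf("usage: <tool> <argument>")
	}
	tool, rest := fields[0], fields[1:]
	switch tool {
	case "webfetch":
		if len(rest) != 1 {
			return "", nil, fmt.Errorf("webfetch expects exactly one URL")
		}
		return tool, map[string]string{"url": rest[0]}, nil
	case "websearch":
		// The whole remainder of the line is the query.
		return tool, map[string]string{"query": strings.Join(rest, " ")}, nil
	default:
		return "", nil, fmt.Errorf("unknown tool %q", tool)
	}
}

func main() {
	tool, args, err := parsePositional("websearch golang rate limiting best practices 2026")
	fmt.Println(tool, args["query"], err)
}
```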

Example

User: Search how to set up OpenTelemetry with Go

Agent: I'll search for up-to-date information on that.

[tool_call: websearch {"query": "opentelemetry go setup tutorial"}]

Search results for: "opentelemetry go setup tutorial" (via DuckDuckGo)

1. Getting Started with OpenTelemetry in Go
   URL: https://opentelemetry.io/docs/languages/go/getting-started/
   Official guide to instrumenting Go applications with OpenTelemetry...

2. OpenTelemetry Go SDK - Complete Guide
   URL: https://example.com/otel-go-guide
   Step-by-step tutorial covering traces, metrics and logs...

The (via DuckDuckGo) or (via SearxNG) header shows which backend responded, which helps diagnose fallbacks (e.g. DDG served a CAPTCHA, so the results came from the next backend in the chain).

Why keyless? ChatCLI is used in corporate environments where managing third-party API keys (for the Brave Search API, Tavily or SerpAPI, for example) creates operational friction: registration, rotation, approvals. Self-hosted SearxNG solves network lockdown without recurring cost, and DuckDuckGo covers casual use without any config.

Comparison

Aspect	@webfetch	@websearch
Purpose	Read content from a specific URL	Search the web for a query
Input	URL	Search query
Output	Clean text from the page	List of results (title + URL + snippet)
When to use	You know the exact URL	You need to find information
Engine	HTTP GET + HTML parser (headless Chromium for JS-rendered pages)	DuckDuckGo, Brave Search and Mojeek (HTML scraping) + SearxNG (JSON API)

Availability

Web tools are available in the following modes:

Mode	@webfetch	@websearch
Chat	No	No
Agent (`/agent`)	Yes	Yes
Coder (`/coder`)	Yes	Yes
One-shot (`-p`)	Yes (with `--agent`)	Yes (with `--agent`)

In interactive chat mode, web tools are not available. You need to be in agent or coder mode for the LLM to invoke them as tool calls.

Next Steps

MCP Integration

Agentic Plugins
