feat: add web_fetch tool, rename to pi-web

- Rename package to @evan/pi-web (repo rename handled separately).
- Rename existing 'search' tool to 'web_search' for consistency.
- Add 'web_fetch' tool: navigates with the shared headless Firefox
  session, extracts via Mozilla Readability, falls back to <body> when
  no article is detected, converts with Turndown. 50KB output cap, 15s
  navigation timeout. The tool description steers the LLM toward curl
  for raw/non-text content.
- Reuses the shared driver, so search + fetch share one warm browser.
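The commit message lists the steps of the fetch flow: navigate, run Readability, fall back to `<body>`, convert with Turndown, cap at 50KB, and time out after 15s. These steps can be sketched as below.

This is a minimal sketch under stated assumptions, not the extension's code. `FetchDeps`, `webFetch`, `withTimeout`, `truncateUtf8`, and the marker text are hypothetical. The browser session, Readability, and Turndown are stand-ins passed in as plain functions. The sketch also assumes the 50KB cap counts UTF-8 bytes.

```typescript
// Hypothetical sketch of the web_fetch flow. The browser, Readability, and
// Turndown are injected as plain functions (assumed interfaces, not the real
// APIs) so the control flow runs without them.

interface FetchDeps {
  navigate: (url: string) => Promise<void>; // drive the shared browser session
  extractArticle: () => Promise<string | null>; // article HTML, or null if none found
  bodyHtml: () => Promise<string>; // full <body> HTML for the fallback
  toMarkdown: (html: string) => string; // HTML -> markdown conversion
}

const NAV_TIMEOUT_MS = 15_000; // 15s navigation timeout
const MAX_BYTES = 50 * 1024; // 50KB output cap (assumed to mean UTF-8 bytes)

// Reject if `p` has not settled within `ms`; the timer is always cleared.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Keep at most `maxBytes` UTF-8 bytes, then append a truncation marker.
// `stream: true` buffers a trailing partial multi-byte sequence instead of
// emitting U+FFFD, so the cut never splits a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  const head = new TextDecoder().decode(bytes.subarray(0, maxBytes), { stream: true });
  return `${head}\n\n[truncated: content exceeded ${maxBytes} bytes]`;
}

async function webFetch(url: string, deps: FetchDeps): Promise<string> {
  await withTimeout(deps.navigate(url), NAV_TIMEOUT_MS, "navigation");
  const article = await deps.extractArticle();
  // Fall back to the whole <body> when no (non-empty) article is detected.
  const html = article !== null && article.trim() !== "" ? article : await deps.bodyHtml();
  return truncateUtf8(deps.toMarkdown(html), MAX_BYTES);
}
```

Injecting the dependencies keeps the fallback and truncation rules testable without launching Firefox. A real implementation would presumably back `extractArticle` with Readability's `parse()`, which returns `null` when it finds no article. It would back `toMarkdown` with a Turndown instance's `turndown()` method.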
2026-05-25 11:51:46 -04:00
parent ebd7218b95
commit 67ce141b1b
5 changed files with 754 additions and 24 deletions


@@ -1,16 +1,21 @@
-# evan/pi-search
+# evan/pi-web
-Web search extension for [pi coding agent](https://github.com/mariozechner/pi-coding-agent). Registers a single `search` tool. Choose your provider via config.
+Web tools for [pi coding agent](https://github.com/mariozechner/pi-coding-agent). Registers two tools backed by a shared headless Firefox session that's kept warm for the lifetime of the pi process.
-## Providers
+| Tool         | Purpose                                                                 |
+| ------------ | ----------------------------------------------------------------------- |
+| `web_search` | Search the web via Kagi (session token) or SearXNG (JSON API).          |
+| `web_fetch`  | Fetch a URL and return readable markdown (Readability + Turndown).      |
+For raw HTTP responses, non-text content, or simple API calls, the LLM is steered toward `bash` + `curl` rather than `web_fetch`.
+## Search Providers
| Provider | How it works | Requires |
| --------- | --------------------------------------------------------------------------------------------- | -------------------------------------------- |
| `kagi` | Drives a headless Firefox session against `kagi.com/search?token=…&q=…` and scrapes results. | Kagi session token, `firefox`, `geckodriver` |
| `searxng` | Calls a SearXNG instance's `/search?format=json` endpoint. | A SearXNG base URL with JSON format enabled |
-The Kagi driver is shared across calls for the lifetime of the pi process, so you only pay browser startup once per session.
## Config
Drop a JSON file at `~/.pi/pi-search/config.json`:
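Below is a rough sketch of the `web_search` request and result shapes. It follows the provider table above. Two details come from the README text in this diff: the URL shapes, and the `## [title](url)\n> description` item format.

The remaining details are assumptions. The function and type names are hypothetical. The SearXNG response fields (`results[].title`, `url`, `content`) are an assumption about that API. The blank line between items is also a guess.

```typescript
// Hypothetical sketch of web_search request/response handling. Names are
// illustrative and not taken from the pi-web source.

interface SearchResult {
  title: string;
  url: string;
  description: string;
}

// Kagi: session token and query both travel as query parameters.
function kagiSearchUrl(token: string, query: string): string {
  const u = new URL("https://kagi.com/search");
  u.searchParams.set("token", token);
  u.searchParams.set("q", query);
  return u.toString();
}

// SearXNG: resolve "search" relative to the base URL so a path prefix
// (e.g. https://host/searx) is preserved.
function searxngSearchUrl(baseUrl: string, query: string): string {
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  const u = new URL("search", base);
  u.searchParams.set("q", query);
  u.searchParams.set("format", "json");
  return u.toString();
}

// Assumed subset of the SearXNG JSON response.
interface SearxngResponse {
  results: Array<{ title: string; url: string; content?: string }>;
}

function parseSearxng(body: SearxngResponse): SearchResult[] {
  return body.results.map((r) => ({
    title: r.title,
    url: r.url,
    description: r.content ?? "",
  }));
}

// Render items as `## [title](url)\n> description`, blank line between items.
function formatResults(results: SearchResult[]): string {
  return results
    .map((r) => `## [${r.title}](${r.url})\n> ${r.description}`)
    .join("\n\n");
}
```

The Kagi path would still need the browser to load the URL and scrape the result list. This sketch stops at building the URL.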
@@ -29,26 +34,44 @@ Drop a JSON file at `~/.pi/pi-search/config.json`:
### Env Var Overrides
-| Variable                  | Overrides            |
-| ------------------------- | -------------------- |
-| `PI_SEARCH_PROVIDER`      | `provider`           |
-| `KAGI_TOKEN`              | `kagi.token`         |
-| `PI_SEARCH_SEARXNG_URL`   | `searxng.baseUrl`    |
+| Variable                | Overrides         |
+| ----------------------- | ----------------- |
+| `PI_SEARCH_PROVIDER`    | `provider`        |
+| `KAGI_TOKEN`            | `kagi.token`      |
+| `PI_SEARCH_SEARXNG_URL` | `searxng.baseUrl` |
### Getting A Kagi Session Token
Open `kagi.com`, sign in, then go to **Settings → Session Link**. Copy the `token=` value from the link. Treat it like a password — it grants full account access.
-## Tool
+## Tools
-| Name     | Args                | Returns                                                |
-| -------- | ------------------- | ------------------------------------------------------ |
-| `search` | `query: string`     | Markdown list of `## [title](url)\n> description` items |
+### `web_search`
+| Arg     | Type   | Description       |
+| ------- | ------ | ----------------- |
+| `query` | string | Search query text |
+Returns a markdown list of `## [title](url)\n> description` items.
+### `web_fetch`
+| Arg   | Type   | Description           |
+| ----- | ------ | --------------------- |
+| `url` | string | Absolute URL to fetch |
+Returns markdown of the page. Pipeline:
+1. Navigate the shared Firefox session to the URL (15s timeout).
+2. Run [Readability](https://github.com/mozilla/readability) to extract the article subtree.
+3. If Readability finds nothing, fall back to the full `<body>`.
+4. Convert with [Turndown](https://github.com/mixmark-io/turndown).
+5. Truncate at 50KB with a clear marker.
## Install
```bash
-cd ~/.pi/agent/extensions/pi-search
+cd ~/.pi/agent/extensions/pi-web
npm install
```
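The env-var table maps each variable onto a dotted config path. That suggests a simple precedence rule: the environment wins, and `~/.pi/pi-search/config.json` fills in the rest. The sketch below assumes that rule.

`Config`, `fromEnv`, and `applyEnvOverrides` are hypothetical names. Treating empty variables as unset is also an assumption. Reading and parsing the file is left out.

```typescript
// Hypothetical sketch: apply the documented env-var overrides on top of
// values parsed from the config file. The Config shape is inferred from the
// dotted paths in the override table.

interface Config {
  provider?: string;
  kagi?: { token?: string };
  searxng?: { baseUrl?: string };
}

type Env = Record<string, string | undefined>;

// Treat unset and empty variables alike (an assumption of this sketch).
function fromEnv(env: Env, name: string): string | undefined {
  const v = env[name];
  return v !== undefined && v !== "" ? v : undefined;
}

function applyEnvOverrides(file: Config, env: Env): Config {
  return {
    provider: fromEnv(env, "PI_SEARCH_PROVIDER") ?? file.provider,
    kagi: { ...file.kagi, token: fromEnv(env, "KAGI_TOKEN") ?? file.kagi?.token },
    searxng: {
      ...file.searxng,
      baseUrl: fromEnv(env, "PI_SEARCH_SEARXNG_URL") ?? file.searxng?.baseUrl,
    },
  };
}
```

In a Node extension this would most likely be called as `applyEnvOverrides(parsedFile, process.env)`. A flat precedence rule like this keeps the token out of the config file when it is supplied via `KAGI_TOKEN`.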