feat: add web-glimpse skill for headless browser tasks

2026-04-27 10:45:44 -04:00
parent 5a5aeb592e
commit b85b01bcaa
1 changed file with 155 additions and 0 deletions
--- a/modules/home/programs/terminal/pi/config/skills/web-glimpse/SKILL.md
+++ b/modules/home/programs/terminal/pi/config/skills/web-glimpse/SKILL.md
@@ -0,0 +1,155 @@
+---
+name: web-glimpse
+description: 'Search the web, inspect pages, extract article content, download rendered site content, or capture screenshots using the `glimpse` headless browser tool. Use when the user asks to search the web, look something up online, fetch a browser-rendered page, download/read a website, inspect dynamic content, or see raw browser results. Does not replace curl for simple HTTP/API requests.'
+---
+
+# Web Search And Browser Fetching With Glimpse
+
+## Overview
+
+Use `glimpse` when a task needs web search or browser-rendered website access. It runs Firefox through WebDriver in headless mode by default, so it can observe pages after JavaScript runs and return agent-friendly raw results.
+
+This skill is for:
+
+- Searching the web for current information
+- Opening pages in a real browser and reading rendered content
+- Extracting article-like content as Markdown/text/JSON
+- Inspecting DOM state with custom JavaScript
+- Saving screenshots for visual/debugging work
+- Downloading rendered page content when `curl` would miss JavaScript-generated content
+
+This skill is **not** a blanket replacement for `curl`. Prefer `curl` for simple APIs, static files, health checks, headers, and direct downloads where a browser is unnecessary.
+
+## Tool Summary
+
+```bash
+glimpse <command> <url|query> [options]
+```
+
+Commands:
+
+| Command | Purpose |
+| ------- | ------- |
+| `search <query>` | Search using the configured provider, usually Kagi, and return JSON results |
+| `snapshot <url>` | Return a JSON snapshot containing title, visible text, headings, links, buttons, inputs, and forms |
+| `reader <url>` | Extract Firefox Reader View content as Markdown, HTML, text, or JSON |
+| `exec <url>` | Execute JavaScript on a page and return the top-level JS result |
+| `screenshot <url>` | Save a PNG screenshot of the rendered page |
+
+Common options:
+
+| Option | Purpose |
+| ------ | ------- |
+| `--timeout=<ms>` | Maximum wait time in milliseconds; increase for slow or JS-heavy pages |
+| `--wait-until=<state>` | Wait for `none`, `interactive`, or `complete` readiness |
+| `--wait-js=<code>` | Poll JavaScript until it returns a truthy value |
+| `--js=<code>` | Execute inline JavaScript before the command logic runs; with `exec`, its return value is the command result |
+| `--script=<file>` | Execute JavaScript from a file |
+| `--output=<file>` | Write reader output or screenshots to a file |
+| `--no-headless` | Show Firefox instead of running headless when visual debugging is useful |
+
+## Usage Patterns
+
+### Search The Web
+
+Use `search` first when the user asks for current information, docs, release notes, articles, or general web research.
+
+```bash
+glimpse search "query terms" --timeout=15000
+```
+
+Search returns JSON results with `title`, `url`, and `description`. After searching, open the most relevant authoritative result(s) with `snapshot` or `reader` before answering. Do not rely only on search snippets for factual claims unless the user explicitly asks for a quick search summary.
+
+### Read A Page Snapshot
+
+Use `snapshot` for a broad, raw view of the rendered page.
+
+```bash
+glimpse snapshot https://example.com --wait-until=complete --timeout=15000
+```
+
+The JSON result includes visible text plus structured headings, links, forms, buttons, and inputs. This is usually the best first command for documentation pages, landing pages, dashboards, and pages that are not article-like.
+
+### Extract Article Content
+
+Use `reader` for blog posts, news articles, long-form documentation, and other article-like content.
+
+```bash
+glimpse reader https://example.com/article --format=markdown --timeout=20000
+```
+
+If Reader View fails with `No readable article content found`, fall back to `snapshot` or `exec`.
+
+Useful formats:
+
+```bash
+glimpse reader https://example.com/article --format=markdown
+glimpse reader https://example.com/article --format=text
+glimpse reader https://example.com/article --format=json
+```
+
+To save extracted content:
+
+```bash
+glimpse reader https://example.com/article --format=markdown --output=article.md
+```
+
+### Inspect Dynamic Content With JavaScript
+
+Use `exec` when you need a targeted extraction or need to inspect content generated by JavaScript.
+
+```bash
+glimpse exec https://example.com --wait-until=complete --js='return {
+  title: document.title,
+  text: document.body.innerText.slice(0, 4000),
+  links: Array.from(document.links).map(a => ({ text: a.innerText, href: a.href }))
+}'
+```
+
+Use `--wait-js` when a SPA needs time for a selector or app state to appear:
+
+```bash
+glimpse snapshot https://example.com/app \
+  --wait-js='return document.querySelector("main") && document.body.innerText.length > 500' \
+  --timeout=30000
+```
+
+### Capture Screenshots
+
+Use screenshots when layout, visual state, charts, or rendering bugs matter.
+
+```bash
+glimpse screenshot https://example.com --wait-until=complete --output=example.png
+```
+
+You can adjust the page before capture:
+
+```bash
+glimpse screenshot https://example.com \
+  --js='document.body.style.zoom = "80%"' \
+  --output=example.png
+```
+
+## Recommended Workflow
+
+1. **Choose The Lightest Useful Tool** — Use `curl` for simple static/API requests. Use `glimpse` when search, JavaScript, rendered content, Reader View, forms, or screenshots are useful.
+2. **Search Before Browsing** — For open-ended questions, run `glimpse search` and select authoritative sources.
+3. **Verify With Page Content** — Open candidate sources with `snapshot`, `reader`, or targeted `exec` before drawing conclusions.
+4. **Use Timeouts Deliberately** — Start around `15000` ms. Increase to `30000` ms for slow, JavaScript-heavy, or anti-bot-delayed pages.
+5. **Prefer Structured Results** — Use `snapshot` or `exec` to extract links/headings/text as JSON rather than scraping terminal output by hand.
+6. **Save Artifacts When Asked** — Use `--output` for reader exports or screenshots. If creating local files, report the saved path.
+
+## Error Handling
+
+- `USAGE_ERROR`: Check command ordering. Most commands are `glimpse <command> <url>`, while search is `glimpse search "query"`.
+- `TIMEOUT`: Increase `--timeout`, add `--wait-until=complete`, or use a more specific `--wait-js`.
+- `No readable article content found`: Reader View could not detect article content; use `snapshot` or `exec`.
+- Empty or thin content: Try `--wait-until=complete`, a page-specific `--wait-js`, or inspect with `exec`.
+- Search authentication/provider issues: Glimpse is normally configured with Kagi via `~/.config/glimpse/config.json`; report auth/provider errors if they occur.
+
+## Answering Guidelines
+
+- Cite or mention the URLs you inspected when summarizing web research.
+- Distinguish search snippets from content verified by opening a page.
+- If sites disagree, say so and prefer official documentation, release notes, primary sources, or reputable references.
+- Do not claim a page was downloaded or rendered unless you actually ran `glimpse` and saw successful output.
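
The fallback and timeout advice in the Error Handling section of the diff above can be sketched as two small shell helpers. This is only an illustration, not part of the commit: `read_page` and `snapshot_with_retry` are hypothetical names, and the sketch assumes `glimpse` exits with a non-zero status when a command fails (for example on `No readable article content found` or `TIMEOUT`), which the skill text does not state. The flags and timeout values are taken from the skill itself.

```bash
# Hypothetical helper (not part of glimpse): try Reader View first and
# fall back to a snapshot when the reader command fails.
# Assumption: glimpse returns a non-zero exit status on failure.
read_page() {
  local url="$1" content
  if content=$(glimpse reader "$url" --format=markdown --timeout=20000 2>/dev/null); then
    printf '%s\n' "$content"
  else
    glimpse snapshot "$url" --wait-until=complete --timeout=15000
  fi
}

# Hypothetical helper: take a snapshot with the 15000 ms starting timeout,
# then try once more at 30000 ms if the first attempt fails.
snapshot_with_retry() {
  local url="$1" timeout
  for timeout in 15000 30000; do
    if glimpse snapshot "$url" --wait-until=complete --timeout="$timeout"; then
      return 0
    fi
    echo "snapshot failed with --timeout=$timeout" >&2
  done
  return 1
}
```

Calling `read_page https://example.com/article` prints Markdown when Reader View succeeds and snapshot JSON otherwise, so a caller that needs to tell the two apart would have to check the output shape.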