HTTP and wire scanning
Real credentials don’t always sit on disk. They flow through:
- Live web bundles that ship from production at a public URL.
- HAR files that browsers (Chrome / Firefox / Safari DevTools) produce when you click “Save all as HAR with content.”
- mitmproxy / Burp captures of an authenticated session.
- curl / httpie / Postman exports of one specific request you want to verify.
KeyHog scans every one of these, but the surface is split across a few flags and sources. This page is the map.
TL;DR
| Workflow | Command |
|---|---|
| Scan a public JS bundle | keyhog scan --url https://app.example.com/static/main.js |
| Scan every URL in a list | keyhog scan --url $(cat urls.txt) |
| Scan a source-map exposed by Webpack | keyhog scan --url https://app.example.com/static/main.js.map |
| Scan a HAR export from DevTools | keyhog scan capture.har (see HAR auto-expansion) |
| Scan a single curl response | curl -s https://api/... | keyhog scan --stdin |
| Scan a saved Burp / mitmproxy capture | keyhog scan dump.txt (treats as text - no protocol parsing) |
| Route every fetch through Burp | keyhog scan --url https://... --proxy http://burp:8080 --insecure |
| Scan in an air-gapped network | keyhog scan --url https://... --proxy off |
The --url flag (Web Source)
keyhog scan --url https://app.example.com/static/main.js
keyhog scan --url https://app.example.com/static/main.js \
https://app.example.com/static/runtime.js \
https://app.example.com/static/vendor.js
Each URL is fetched with the shared HTTP client policy (see Proxy and TLS below). The response is routed by extension:
.js→ one chunk per file, scanned as plain text..map→ JSON parsed, eachsourcesContent[i]becomes its own chunk tagged with the original filename. This is how a Webpack build withdevtool: 'source-map'accidentally exposes server- side env vars baked into the bundle at build time..wasm→ linear-memory + import section dumped as strings (best- effort; native WASM symbol extraction lives behind thebinaryfeature).- Everything else → one chunk of text.
Findings are tagged source: "web:js", web:sourcemap,
web:sourcemap:raw, web:wasm, or web:other. The original URL
is the file_path.
SSRF defense
--url refuses to fetch:
- Private RFC1918 ranges (
10.0.0.0/8,172.16.0.0/12,192.168.0.0/16). - Loopback (
127.0.0.0/8,::1). - Link-local (
169.254.0.0/16,fe80::/10). - Cloud metadata endpoints (
169.254.169.254, the GCP / Azure / AWS / DigitalOcean / Hetzner variants).
This isn’t a CLI flag - it’s hardcoded so a user can’t accidentally
turn an --url invocation into a metadata-service IAM exfil.
Proxy and TLS
Everything outbound - --url, --github-org, --s3-bucket,
--verify’s API calls - runs through one HTTP client builder.
Policy:
| Source | Effect |
|---|---|
--proxy http://burp:8080 | Explicit. Wins over everything. |
--proxy off | Disable proxying entirely, ignore env vars. |
KEYHOG_PROXY env var | Same as --proxy. Useful inside CI containers. |
HTTPS_PROXY / HTTP_PROXY | reqwest’s default. Last resort. |
--insecure | Accept any TLS cert (self-signed Burp CA, etc.). |
KEYHOG_INSECURE_TLS=1 | Same as --insecure. |
Order: explicit flag → KEYHOG_PROXY → standard env vars.
User-Agent: keyhog/<version> is always set so you can grep your
proxy logs for keyhog traffic without guessing.
HAR auto-expansion
Any file with a .har extension is recognised by the filesystem
source and expanded into one chunk per request and one chunk per
response. Each chunk carries a source-type that tells you which
side of the exchange it came from:
| Chunk | source_type | What it contains |
|---|---|---|
| Request | wire:har:request | <METHOD> <URL>, every request header, query string, POST body. |
| Response | wire:har:response | <STATUS> <statusText>, every response header, response body. |
Finding file_path becomes <har-path>#<request-url>, so the same
HAR with five different requests produces five distinct paths.
Editors that jump-to-file on path:line URIs land on the HAR but
the URL tail makes the location unambiguous.
keyhog scan capture.har --format json | \
jq '.[] | select(.location.source == "wire:har:request")'
filters down to outbound credentials only - the bug-bounty
“what did I send” view. Swap request for response to see what
the upstream reflected back at you.
A HAR that fails to parse (truncated export from a crashed browser) falls through to plain text scanning so credentials still surface; the file isn’t silently dropped.
Defenses:
- 4×
--max-file-sizebudget on cumulative request+response body bytes. Defeats a malicious HAR that decompresses to gigabytes. - The cheap pre-sniff (
{"log"+"entries"in the first 2 KiB) bails before invoking the JSON parser on a 200 MiB blob that obviously isn’t HAR.
Scanning a single HTTP exchange (stdin)
The most common ad-hoc workflow:
curl -s https://api.example.com/v1/me \
-H "Authorization: Bearer $TOKEN" \
| keyhog scan --stdin
Or just pipe a saved response:
keyhog scan --stdin < response.txt
keyhog scan - (bare dash) is the same as --stdin (grep / wc
convention; added in v0.5.28).
--stdin reads up to ~1 GiB; beyond that, write to a temp file and
scan the path. Findings from stdin carry the stdin source. To get
the richer wire:har:request / wire:har:response provenance tags,
save the exchange as a .har file and scan that instead (see
HAR auto-expansion).
Headers, bodies, URL params - where the secret sits
KeyHog is content-blind: it greps the raw bytes. That means a
Bearer ghp_… in an HTTP header gets the same finding as a
"token": "ghp_…" in a JSON body or a ?token=ghp_… in the URL.
For an HTTP capture this is usually what you want - the location column in the finding gives the byte offset within the capture, and the surrounding context (line ±2) is enough to tell whether it was a header or a body.
What KeyHog does not do today:
- Parse the HTTP wire format and emit
header:Authorizationvsbody:json:$.tokenprovenance fields. - Distinguish a secret in a request from a secret in the response (one is being sent OUT, one is being sent IN - different threat model).
Those land in the roadmap below.
Roadmap
The wire-scanning surface is intentionally narrow today. Items queued for a later release, with their issue links:
-
mitmproxy
.mitmflow-dump support. Same shape as HAR but binary-framed. Use themitmproxy-rscrate to decode. -
Header / body / URL-param provenance. HAR expansion lands one chunk per request and one chunk per response today. The next step is attaching
wire_location: header:<name> | body | queryto each finding so the JSON consumer can filterwire_location == "header:Authorization"for the highest- signal subset (intentional auth tokens vs accidental body leaks vs URL-logged secrets). -
Live proxy mode. Run
keyhog proxy --listen :8080and have it act as an HTTP proxy that scans every flow inline, writing findings to stdout. The use case is recording a browsing session against a target and getting a single report of every credential the site shipped to the client. -
WebSocket frame scanning. HAR files don’t include WebSocket payloads. mitmproxy dumps do. Frame-level scanning would catch tokens passed over upgraded connections (Slack, Discord, collaborative editors).
No promises on timeline - track via github.com/santhsecurity/keyhog/issues.
Why this matters for bug bounties
A modern SPA bundle on a typical SaaS app can ship 200+ npm
dependencies and a sourcemap that exposes every server-side env
var the build process touched. Manual code review of one
main.js.map against the 891-detector corpus is hours; running
keyhog scan --url https://app.target.com/static/main.js.map
takes seconds.
Pair it with --hide-client-safe (see
CLI reference) to filter out keys that the
vendor designed to ship in client bundles (Sentry DSN, Stripe
pk_*, Mapbox pk., PostHog phc_, etc.) and you’re left with
the keys that actually represent an exfiltration boundary.