TL;DR: Web Bot Auth is RFC 9421 HTTP Message Signatures applied to crawler traffic. Bots sign their requests with a private key, publish public keys at a .well-known directory, and let you verify the signature instead of trusting the User-Agent header. Google publishes keys at agent.bot.goog for its AI-browsing agent today; Googlebot proper still isn't signed. The 2026 work is to add a signature-verification path without ripping out reverse-DNS, because most Google-claiming traffic is still unsigned and will be for at least another year.
An operator I work with wrote a Cloudflare WAF rule in 2024. It blocks anything whose UA contains GPTBot, ClaudeBot, or PerplexityBot, and allows anything whose UA contains Googlebot. The rule held for two years. Then in spring 2026 a new user agent started showing up in the access logs, Google-Agent, with three unfamiliar headers attached: Signature-Agent, Signature-Input, and Signature. The 2024 rule has no opinion on those headers. It just reads the UA. That gap between rules written against the old verification model and traffic arriving under the new one is what this piece is about. Not "what is robots.txt." Not "should I block AI crawlers." The integration piece for operators who already maintain a bot-policy ruleset and need to know what Web Bot Auth changes for it.
Web Bot Auth is a bot-flavored profile of RFC 9421 HTTP Message Signatures. The bot signs each outgoing request with a private key. The site fetches the bot's public key from a .well-known/http-message-signatures-directory URL on a domain the bot controls. The site verifies that the signature on the incoming request was produced by the matching private key. If it was, the request's origin claim is cryptographically attested. If it wasn't, the request is forged.
Three new request headers do the work. Signature-Agent points to the bot's key directory. Signature-Input lists what's signed plus metadata: keyid, created and expires timestamps, algorithm, and the literal tag="web-bot-auth" string that marks this as a bot signature (not some other RFC 9421 use case). Signature carries the cryptographic bytes.
"This document describes a mechanism for creating, encoding, and verifying digital signatures or message authentication codes over components of an HTTP message." — A. Backman, J. Richer, M. Sporny, RFC 9421 (HTTP Message Signatures, abstract)
The bot profile on top of the RFC is what makes this Web Bot Auth and not just HTTP message signatures. The IETF draft draft-meunier-web-bot-auth-architecture nails down the bot conventions: the tag="web-bot-auth" string in the input, the well-known directory URL shape, the recommended covered components (at minimum @authority and signature-agent), and the expectation that the directory is cached against its own Cache-Control response header. None of that is in RFC 9421 itself. RFC 9421 is the algebra. Web Bot Auth is the use case.
The verification stack has three layers, each with a known failure mode. User-Agent is text; anyone can set it. Reverse DNS works for Googlebot but is awkward for newer agent crawlers routed through general-purpose infrastructure. IP allowlists are brittle because cloud egress ranges shift without warning.
Johannes Ullrich at the SANS Internet Storm Center put the UA-spoofing problem bluntly:
"Users have long figured out that setting your user agent to 'Googlebot' may get you past some paywalls." — Johannes Ullrich, SANS Internet Storm Center, September 2025
The IP-allowlist side has a different but related problem. Cloudflare's Thibault Meunier and Mari Galicer, who shepherded the Web Bot Auth proposal at the IETF, framed it this way in their May 2025 post: "connections from the crawling service might be shared by multiple users, such as in the case of privacy proxies and VPNs, and these ranges, often maintained by cloud providers, change over time." An allowlist that was correct on Monday can be wrong by Friday.
The agent-traffic shift makes the old stack worse. When a crawler is acting on behalf of an individual user from inside a chat session, the source profile fragments. Cloudflare flagged the framing change directly: "Bots are no longer directed only by the bot owners, but also by individual end users to act on their behalf."

| Method | Trust | Operator cost | Fails on | Latency |
|---|---|---|---|---|
| User-Agent string | Lowest | Free | Anyone can spoof; SANS notes the Googlebot UA has long bypassed paywalls | 0 ms |
| Reverse DNS + forward confirm | Medium | Lookup plumbing per request | Only works for crawlers with stable PTR records (Googlebot proper, Bingbot) | ~1-5 ms |
| IP allowlist (CIDR ranges) | Medium | List maintenance | Cloud egress ranges shift; shared with privacy proxies and VPNs | 0 ms |
| Web Bot Auth (RFC 9421) | High | Middleware + key cache | Only covers bots that have published a key directory | ~0.1 ms (cached key) |
Cryptographic verification is the one rail that survives all three legacy failure modes. It doesn't care about the source IP, doesn't trust the UA, and doesn't need a reverse-DNS lookup. It cares about the key.
A signed request from Google's agent follows the shape documented in Cloudflare's reference docs and Google's developer guide. Approximate form, with the keyid abbreviated:
```http
GET /article/example HTTP/1.1
Host: yoursite.com
User-Agent: Mozilla/5.0 (compatible; Google-Agent/1.0; ...)
Signature-Agent: g="https://agent.bot.goog"
Signature-Input: sig=("@authority" "signature-agent")
;created=1735689600;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"
;alg="ed25519";expires=1735693200;tag="web-bot-auth"
Signature: sig=:MEQCIBmw...truncated...:
```
Read it left to right. Signature-Agent tells you where to fetch the public key. The literal g="https://agent.bot.goog" resolves to a directory at https://agent.bot.goog/.well-known/http-message-signatures-directory. Signature-Input describes what's being attested: in this case the @authority derived component (the host name) and the signature-agent header itself, signed at the created timestamp (seconds since the Unix epoch) and valid until expires. The keyid is a JWK thumbprint that picks one specific key out of the directory. Signature carries the ed25519 signature bytes.
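Matching a keyid against the directory is mechanical once you know the thumbprint rule. Here's a minimal sketch using Node's built-in crypto; the key material is illustrative, not a real Google key, and the function name is ours:

```typescript
import { createHash } from "node:crypto";

// RFC 7638: a JWK thumbprint is SHA-256 over the canonical JSON of the
// key's required members in lexicographic order. For an Ed25519 key
// (kty "OKP", per RFC 8037) the required members are crv, kty, and x.
function jwkThumbprint(jwk: { crv: string; kty: string; x: string }): string {
  const canonical = JSON.stringify({ crv: jwk.crv, kty: jwk.kty, x: jwk.x });
  return createHash("sha256").update(canonical).digest("base64url");
}

// Illustrative key; real x values come from the bot's published directory.
const key = { kty: "OKP", crv: "Ed25519", x: "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo" };
console.log(jwkThumbprint(key)); // compare against the request's keyid
```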

The semantics of each parameter, in table form, since this is the bit operators get wrong on a first read:
| Header / parameter | What it does | What you check |
|---|---|---|
| Signature-Agent | Points to the bot's public-key directory | Is the URL one you trust? (For Google: https://agent.bot.goog) |
| Signature-Input covered components | Lists which parts of the request are signed | At minimum @authority and signature-agent should be present |
| keyid | Picks one key out of the directory (JWK thumbprint) | Does the directory have a key with this thumbprint? |
| created / expires | Validity window in seconds since Unix epoch | Is the request within the window? expires is a hard fail |
| alg | Signature algorithm | Usually ed25519 in Web Bot Auth; your verifier needs that algorithm |
| tag | Profile marker | Must be the literal string web-bot-auth |
| Signature | The signature bytes | Verify with the public key matching keyid |
One caveat Google states directly: "Not all Google user agents are using Web Bot Auth." In May 2026 the user agent that consistently signs is Google-Agent, the AI-browsing agent behind Google's AI Mode features. Googlebot proper, the indexing crawler that drives most of your organic traffic, is not signed yet. Plan your rules accordingly.
The verification path is four steps. None of them is expensive. The moving pieces are the directory cache and the algorithm library, not the math.
Step one. The request arrives. Look for a Signature-Agent header. If it's missing, the request is unsigned and you fall through to the legacy verification path (reverse DNS, UA, IP). Most requests are still in this bucket in 2026.
Step two. Parse Signature-Input. Pull out keyid, created, expires, alg, and tag. Reject anything where tag isn't the literal web-bot-auth string. Reject anything past expires. Both rejections happen before you touch the public key.
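The step-two pre-checks in code. This is a simplified sketch, not a full RFC 8941 structured-field parser (production code should use one); the header value is the one from the example request above:

```typescript
// Pull keyid, alg, tag, created, and expires out of Signature-Input,
// then reject on tag mismatch or expiry before touching any key.
interface SigInput {
  label: string;
  components: string[];
  keyid: string;
  alg: string;
  tag: string;
  created: number;
  expires: number;
}

function parseSignatureInput(value: string): SigInput | null {
  const m = value.match(/^(\w+)=\(([^)]*)\)(.*)$/);
  if (!m) return null;
  const params = m[3];
  const str = (name: string) => params.match(new RegExp(`;${name}="([^"]*)"`))?.[1] ?? "";
  const num = (name: string) => Number(params.match(new RegExp(`;${name}=(\\d+)`))?.[1] ?? NaN);
  return {
    label: m[1],
    components: m[2].split(" ").map((c) => c.replace(/"/g, "")),
    keyid: str("keyid"),
    alg: str("alg"),
    tag: str("tag"),
    created: num("created"),
    expires: num("expires"),
  };
}

const input = parseSignatureInput(
  'sig=("@authority" "signature-agent");created=1735689600;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U";alg="ed25519";expires=1735693200;tag="web-bot-auth"'
);
const now = Math.floor(Date.now() / 1000);
const reject = !input || input.tag !== "web-bot-auth" || now > input.expires;
```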
Step three. Fetch the public-key directory at the URL given by Signature-Agent. Honor the response's Cache-Control header; Google's directory sets one. Cache the directory in memory or Redis, refresh on expiry, delete any keys that disappear across refreshes (key rotation). Pull out the key whose JWK thumbprint matches keyid.
Step four. Verify the signature against the components named in Signature-Input. If verification passes, you've cryptographically attested that the request was produced by the holder of that private key. If it fails, treat the request as forged.
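Step four, sketched with Node's built-in crypto. The signature-base layout (one line per covered component, then an @signature-params line carrying the Signature-Input inner list verbatim) is RFC 9421's; the helper names and the assumption that only @authority and signature-agent are covered are ours:

```typescript
import { createPublicKey, verify } from "node:crypto";

type OkpJwk = { kty: "OKP"; crv: "Ed25519"; x: string };

// Rebuild the signature base in the order Signature-Input lists the
// components, with the @signature-params line last.
function signatureBase(authority: string, signatureAgent: string, sigParams: string): string {
  return [
    `"@authority": ${authority}`,
    `"signature-agent": ${signatureAgent}`,
    `"@signature-params": ${sigParams}`,
  ].join("\n");
}

function verifyWba(base: string, signatureHeader: string, jwk: OkpJwk): boolean {
  // The Signature header wraps the base64 bytes in colons: sig=:...:
  const b64 = signatureHeader.match(/:([A-Za-z0-9+/=]+):/)?.[1];
  if (!b64) return false;
  const key = createPublicKey({ key: jwk, format: "jwk" });
  // For ed25519, Node's verify() takes null as the digest algorithm.
  return verify(null, Buffer.from(base), key, Buffer.from(b64, "base64"));
}
```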

The directory cache is the piece I see operators get wrong. Treat the Cache-Control header as authoritative. Don't over-cache (a stale directory accepts revoked keys) and don't under-cache (refetching per request adds latency and abuses the directory endpoint). On a fetch failure with an expired cache, fall back to the unsigned path. Don't block on a transient miss.
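Here's that cache policy as a sketch. It assumes the directory serves a JWK Set ({ "keys": [...] }, per RFC 7517) with a Cache-Control max-age; the 300-second default for a missing header is our choice, not spec'd:

```typescript
type Jwks = { keys: Array<{ kty: string; crv: string; x: string }> };

let cached: { jwks: Jwks; expiresAt: number } | null = null;

// Honor max-age, replace the whole key set on refresh (which drops
// rotated-out keys automatically), and return null on a fetch failure
// with an expired cache so the caller falls back to the unsigned path.
async function getDirectory(url: string): Promise<Jwks | null> {
  const now = Date.now();
  if (cached && now < cached.expiresAt) return cached.jwks;
  try {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`directory fetch failed: ${res.status}`);
    const cc = res.headers.get("cache-control") ?? "";
    const maxAge = Number(cc.match(/max-age=(\d+)/)?.[1] ?? 300);
    cached = { jwks: (await res.json()) as Jwks, expiresAt: now + maxAge * 1000 };
    return cached.jwks;
  } catch {
    return null; // transient miss: don't block, fall through to legacy rules
  }
}
```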
Here's the central operator question. You wrote rules. The protocol changed under them. What do you do?
The good news first. Rules that block by UA contains GPTBot, ClaudeBot, or PerplexityBot are unaffected. The request still arrives with a recognizable UA, and a spoofed GPTBot was always pretending to be the bot you wanted to block. If your rules also block the legitimate signed GPTBot, that's a policy choice you already made.
The less-good news. Rules that allow by UA contains Googlebot are now under-specified. A spoofer with a Googlebot UA still passes them. The fix isn't to rewrite the rule overnight (the signed share is too small for that), but to add a parallel rule path: verify the signature on signed Google traffic, treat the unsigned remainder with reverse-DNS verification. Cloudflare's verified-bots team summarized the gap:
"Existing identification methods rely on a combination of IP address range (which may be shared by other services, or change over time) and user-agent header (easily spoofable). These have limitations and deficiencies." — Cloudflare verified-bots team, July 2025
The two-stack model is the right mental picture. One ruleset handles signed traffic, verifies the signature, checks the keyid against a trusted set, validates the expires, and routes based on the resulting verified identity. The other ruleset handles unsigned traffic, doing the legacy reverse-DNS plus UA plus IP work exactly as it does today. Don't delete the legacy rules. As of mid-2026, most of your real Google traffic still flows through them.
If you sit behind Cloudflare, the work is small. Cloudflare validates signatures at the edge and exposes the result via cf.verified_bot_category in WAF Custom Rules and Transform Rules. Your rule becomes "if cf.verified_bot_category is the category you want, route accordingly," and the cryptography is somebody else's problem.
If you don't sit behind a verifying CDN, you do the work at your origin. The shape is a small middleware in front of nginx or your application server. It intercepts requests carrying a Signature-Agent header, fetches the bot's .well-known directory on first sight (cached after), verifies per RFC 9421, and sets an internal X-Verified-Bot trust header that your downstream rules can read.
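The middleware shape, as an Express sketch. The framework choice, the verifyWebBotAuth helper, and the X-Verified-Bot header name are all assumptions for illustration, not part of the spec:

```typescript
import express from "express";

const app = express();

// Stand-in for the parse/fetch/verify logic sketched earlier; returns
// a verified bot identity (e.g. "google-agent") or null on any failure.
async function verifyWebBotAuth(req: express.Request): Promise<string | null> {
  return null; // placeholder
}

app.use(async (req, _res, next) => {
  // Strip any inbound X-Verified-Bot first: it's an internal header,
  // and letting clients set it would defeat the whole exercise.
  delete req.headers["x-verified-bot"];
  if (req.headers["signature-agent"]) {
    const identity = await verifyWebBotAuth(req);
    if (identity) req.headers["x-verified-bot"] = identity;
  }
  next(); // unsigned requests fall through to the legacy rules untouched
});
```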
The Cloudflare research team open-sourced the verifying pieces at cloudflareresearch/web-bot-auth. The Rust crate and TypeScript npm package (both named web-bot-auth) carry the verification logic, and the repo ships a Caddy plugin and Cloudflare Worker examples. None of these are audited (the README says so), but the verification surface is small, and the alternative is implementing RFC 9421 yourself.

Pragmatic call: on Cloudflare, the edge path is obvious. Off Cloudflare, install the middleware, point it at the agents you want to verify (Google's directory today, plus whichever others matter), and read its trust header in your existing rules. Either way, don't embed signature verification in business code. Keep it in the front-of-house tier where it can be audited and replaced.
The reason I keep returning to "don't delete the legacy rules" is that the signed share is still small. In May 2026 the Googlebot indexing crawler is not signing requests. Only the AI-browsing Google-Agent signs. For most sites the AI-browsing share is a single-digit slice of total Google traffic. The indexing share that ships your organic visibility is unsigned today and likely through 2027.
Google says so plainly. The same crawler documentation that introduces Web Bot Auth tells operators to "continue relying on IP addresses, reverse DNS, and user-agent strings" alongside the new protocol. That isn't a hedge. It's the operating model. Web Bot Auth is one verification rail; the legacy stack is the other. They run in parallel, and in 2026 the legacy stack carries more weight.
The audit cadence. Once a quarter, pull a sample of Google-claiming traffic from access logs, bucket it by signed and unsigned, and compute the signed share. When it crosses 30-40%, the signature-verification path starts to dominate. When it crosses 70%, the unsigned Googlebot UA rule deserves a serious review; the spoofers will be most visible in that minority bucket. Before any of those thresholds, keep both rails running and treat the cryptographic rail as additive.
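A sketch of that audit, assuming JSON-lines access logs with hypothetical ua and has_signature_agent fields; adapt the two predicates to whatever your log pipeline actually captures:

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Bucket Google-claiming requests into signed and unsigned,
// then print the signed share.
async function signedShare(logPath: string): Promise<void> {
  let signed = 0;
  let unsigned = 0;
  const lines = createInterface({ input: createReadStream(logPath) });
  for await (const line of lines) {
    const entry = JSON.parse(line) as { ua?: string; has_signature_agent?: boolean };
    if (!/google/i.test(entry.ua ?? "")) continue; // Google-claiming only
    if (entry.has_signature_agent) signed++;
    else unsigned++;
  }
  const total = signed + unsigned;
  console.log(`signed share: ${total ? ((100 * signed) / total).toFixed(1) : "0"}%`);
}

signedShare("./access.log.jsonl").catch(console.error);
```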
One anti-pattern to rule out explicitly. Don't write a rule today that blocks unsigned Googlebot UA traffic. You'll de-index yourself within a crawl cycle.
Four-item checklist. None of these requires a vendor.
First, inventory your existing bot rules. Tag each one by what it actually verifies: UA, reverse DNS, IP range, or signature. Most ruleset audits surface duplicate or stale rules. Clean those up before adding new ones.
Second, add a signature-verification path. On Cloudflare, enable the verified-bots edge validation and add one rule branching on cf.verified_bot_category. At your own origin, install the WBA middleware, point it at agent.bot.goog (and any other agent directories that matter), and surface a trust header your existing rules can read.
Third, keep the reverse-DNS path for the much larger pool of unsigned Google-claiming traffic. Don't tighten it. Don't replace it. Run it alongside the signature path.
Fourth, schedule the quarterly audit: signed share of Google-claiming traffic, signed share of AI-agent traffic, and percentage of spoofers caught by signature verification that the legacy rules missed. The numbers move slowly through 2026 and faster through 2027. Your rule structure should move with them.
If your team also runs broader AI-crawler policy, the AI crawler playbook and the piece on disabling Cloudflare's AI bot block are companion reads on the allow/block side. This one is the identity side.
Web Bot Auth is identity, not authorization. The signature attests that a request was produced by a specific bot. It says nothing about whether that bot is allowed to read the URL. A verified, signed Google-Agent can still scrape your paid content if your rules let it through. The signature buys trust in the source claim. The policy still belongs to you.
It has no opinion on robots.txt either. A signed bot that ignores robots.txt is still a robots violator; signing doesn't grant additional access. If you want the signed AI-browsing agent to skip your paid section, you tell it so via robots and enforce it in your rules.
And it doesn't decide between AI-search routing and traditional search routing for you. Web Bot Auth tells you "this is really Google-Agent." Whether Google-Agent gets the same content treatment as Googlebot, or different content, is a policy decision you make. The piece on optimizing for Perplexity, ChatGPT search, and Google AI Mode covers that routing side.
The honest assessment: this is one rail in a three-rail stack. Signature for identity, reverse DNS for legacy verification, robots policy for authorization. The new rail hardens the first one. The operators I see succeed in 2026 treat all three as load-bearing.
The line to track through 2026 and into 2027 is the signed share of your Google-claiming traffic. Today, for most sites, it's single digits. When it crosses 30-40%, the verification path starts to pull weight in your decisions. When it crosses 70%, the unsigned Googlebot UA rule deserves a real review.
Re-run the inventory every two quarters and re-read Google's crawler documentation when you do; it's been moving. The directory shape and the covered-components list are the two pieces of the spec most likely to shift. Twice a year is enough to stay ahead.
For the foundational Googlebot context, start with our explainer on what Googlebot is. For the agent-traffic side, how to build an agent-friendly website walks through the integration picture.
Is Googlebot itself signing requests yet? Not as of mid-2026. Google's current Web Bot Auth rollout covers the Google-Agent AI-browsing agent. Googlebot, the indexing crawler that drives traditional organic traffic, still authenticates via reverse DNS and the documented Googlebot IP ranges. Plan your rules to treat them separately.
Does Web Bot Auth replace robots.txt? No. They answer different questions. Web Bot Auth attests "this request is really from Google." Robots.txt declares "this URL is or isn't allowed for crawling." Both still apply, and a signed bot that ignores robots is still a robots violator.
What signature algorithm does Web Bot Auth use? RFC 9421 supports several. Cloudflare's documented examples and Google's published directory both use ed25519 (EdDSA over Curve25519). Your verifier needs an ed25519 implementation; that's a single library call in most stacks (Go, Rust, Node, Python all have it).
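That single call, demonstrated in Node with a throwaway key pair (self-contained demo, not Google's key):

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// For ed25519, both sign() and verify() take null as the digest
// argument because the algorithm hashes internally.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const message = Buffer.from('"@authority": yoursite.com');
const signature = sign(null, message, privateKey);
console.log(verify(null, message, publicKey, signature)); // true
```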
What happens if Google's public key directory is unreachable? You cache the directory per its Cache-Control header. If your cache is fresh, verification continues against the cached keys. If the cache is expired and the fetch fails, fall back to the unsigned-traffic path (reverse DNS, UA, IP). Don't block on a transient cache miss; that's how you accidentally de-index yourself when Google's directory has a hiccup.
Should I drop reverse-DNS verification for Googlebot? Not yet, and probably not in 2026. The signed share is too small. Reverse DNS is your real defense against UA spoofers claiming to be Googlebot, because Googlebot proper is still unsigned. Re-evaluate quarterly as the signed share grows. The right time to tighten the unsigned path is when it's the minority of your real Google traffic, not when it's the majority.