TL;DR. Googlebot is not one bot but a family of crawlers, and Googlebot Smartphone has driven almost everything since mobile-first indexing became default in 2023. Its job runs in three phases (crawl, render, index) that can be hours or days apart, and the render phase is where most "Googlebot can't see my page" complaints actually live. Across SEOJuice support tickets between mid-2024 and early 2026, roughly 6 in 10 indexing escalations turned out to be render-phase issues; about 2 in 10 were crawl-phase problems (the rest were noindex tags or robots.txt mistakes). This guide covers the bot family, the three-phase pipeline, verification, what AI summaries get wrong, and how Googlebot compares to the AI crawlers in 2026.
Updated May 2026. Refreshed with SEOJuice ticket-mix data, a Cloudflare bot-fight anecdote, an AI-Overview drift section, and two callbacks to our Web Bot Auth (RFC 9421) writeup from yesterday.
I wrote this because I send the same explainer in Intercom three or four times a week. The customer says "Googlebot is blocked from my site." We open Search Console. The crawl phase is fine. The render phase fell over when a developer pushed a tab-panel refactor and didn't notice the article body now mounts after a click. I wanted one URL I could paste into the ticket.
Googlebot is the program Google uses to fetch web pages so they can be added to the Google index. When you publish a blog post and it shows up in search results, that journey starts with Googlebot requesting the URL, downloading the HTML, executing JavaScript, and passing the result to Google's indexing system.
"Googlebot" is sometimes used loosely to mean "any Google crawler." Strictly, it's the crawler that fetches pages for the main search index. Other Google crawlers exist (AdsBot for landing-page quality, Storebot for Shopping, Google-Extended for AI training opt-outs) and they follow different rules. Be specific when debugging.
Googlebot is distinct from a scraper. It reads robots.txt before each crawl, respects noindex, throttles when your server slows, and identifies itself so you can verify the hit. The new HTTP Message Signatures for crawlers proposal aims to make this verification cryptographic instead of DNS-based, but until adoption is universal the reverse-DNS check below is the operative test.
The bot you most need to think about is Googlebot Smartphone, which has crawled the mobile version of your site by default since Google completed mobile-first indexing in mid-2023. Desktop crawls still happen, but they are now the secondary case. The family tree, using Google's published user-agent reference:

| Crawler | User-agent token | Renders JS? | What it indexes | Share of crawl |
|---|---|---|---|---|
| Googlebot Smartphone | Googlebot/2.1 (Mobile) | Yes | Mobile pages for the primary index | ~80%+ post mobile-first |
| Googlebot Desktop | Googlebot/2.1 | Yes | Desktop variants for the same index | ~10-15% |
| Googlebot Image | Googlebot-Image/1.0 | No | Images for Google Images | Variable |
| Googlebot Video | Googlebot-Video/1.0 | No | Video files for Google Videos | Variable |
| Googlebot News | No distinct UA | Yes (uses Smartphone) | News-eligible pages | Site-dependent |
| Google-InspectionTool | Google-InspectionTool/1.0 | Yes | URL Inspection in Search Console | On-demand only |
| Google-Extended | Google-Extended | N/A | Read-only flag for Gemini training opt-out | No crawl |

The Chromium version inside Googlebot is not fixed. Google substitutes current stable Chrome at request time, and the renderer tracks public Chrome within a few weeks. (For years I told customers to treat the renderer as Chrome 41, which it actually was until the 2019 evergreen update. I kept giving outdated advice into 2021 before a Martin Splitt talk on Search Off the Record set me straight.) Identify Googlebot by verified IP, not UA string.
Googlebot's job splits into three distinct phases. They do not happen at the same time, and a delay or failure in any one can keep your page out of search results. Google's JavaScript SEO docs describe it cleanly: "Google processes JavaScript web apps in three main phases: 1. Crawling 2. Rendering 3. Indexing." If you cannot name which of these phases a problem lives in, you are guessing about the fix.

Googlebot picks a URL from its queue, sends an HTTP request, and receives the raw HTML. No JavaScript runs yet. The crawler reads the status code, the headers (caching, X-Robots-Tag, redirects), and the raw HTML body. URLs come from XML sitemaps, internal links from indexed pages, external links from other sites, and direct submissions via URL Inspection. Before any fetch, Googlebot reads robots.txt; if a URL is disallowed, the fetch never happens.
If a page needs JavaScript executed to show its content, Googlebot hands the URL to the Web Rendering Service (WRS), a headless Chromium that loads the page, runs the scripts, and produces the final rendered HTML. Google's docs: "Once Google's resources allow, a headless Chromium renders the page and executes the JavaScript."
"Once Google's resources allow" is doing a lot of work in that sentence. Rendering is expensive, so Google batches and queues it. Pages can sit in the render queue for seconds, hours, or in worst cases days. I have a 2024 screenshot of a 96-hour gap between crawl and render on a Next.js e-commerce site we audited. Official guidance: "The page may stay on this queue for a few seconds, but it can take longer than that." Queue prioritisation is undocumented.
Pure server-side rendered pages skip this queue entirely. That choice is the difference between "indexed within an hour" and "indexed two days later."
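One rough way to tell which side of that line a page falls on is to check whether a key phrase from the article body already appears in the raw, pre-JavaScript HTML. The sketch below uses only the Python standard library; the function name `phrase_in_raw_html` and the sample markup are illustrative assumptions, not anything Google publishes.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def phrase_in_raw_html(raw_html: str, phrase: str) -> bool:
    """True if the phrase is present as text before any JavaScript runs."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace on both sides so line breaks in markup don't matter.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(phrase.split()).lower() in text


# Hypothetical pages: one server-rendered, one that mounts content client-side.
SSR_PAGE = "<html><body><article><p>Hello  crawl world</p></article></body></html>"
CSR_PAGE = '<html><body><div id="root"></div><script>render("Hello crawl world")</script></body></html>'
```

If the phrase is missing from the raw HTML but visible in a browser, the content depends on the render phase and is exposed to the queue described above.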
Once Googlebot has the final HTML (from the crawl, or from the WRS after rendering), the indexing system parses the document, extracts text, classifies content, evaluates ranking signals, and stores it in Google's index. The page becomes eligible for search results. Indexing isn't instant; it can take additional minutes or hours after rendering.
The crawl phase almost always succeeds; the page just doesn't render the way the developer expected. Six failure modes, in decreasing order of frequency on customer sites. Items 1 and 2 alone account for over half of the render-phase escalations we triage.

1. If clicking a "Show More" button is the only way to reveal a section, Googlebot won't see it. The WRS executes JavaScript but does not click buttons or scroll. Anything important should be in the DOM at load time, even if hidden via CSS the user can toggle. This is the single most common rendering failure, usually appearing in component libraries that lazy-mount tab panels, accordion bodies, and "load more" feeds.
2. Lazy-loaded images and content blocks need either native loading="lazy" or an Intersection Observer setup the WRS can resolve. Custom lazy-loading that waits for scroll events fails under WRS because there is no scroll. For components, ensure they render server-side or use a framework with proper SSR/hydration.
3. If a top-of-page script throws, downstream scripts may not run, leaving the rest of the page empty. The WRS sees whatever was rendered before the exception. Use URL Inspection's "View Tested Page" to see what Googlebot saw.
4. CAPTCHAs, Cloudflare bot fight mode set too aggressively, and naive geographic blocking can serve a 403 to Googlebot. Cloudflare's default bot-fight mode has bitten more customer sites than any other setting we debug; one B2B SaaS lost two-thirds of its indexed pages over a weekend in late 2024 after a security-team intern toggled it on, and recovery took three weeks. Allowlist verified Google IP ranges (googlebot.json) before enabling any "block bots" feature.
5. If your robots.txt disallows /static/ or /assets/, the WRS can't fetch the JS and CSS bundles, and your page renders without styles or with broken JavaScript. Allow Googlebot to crawl static asset paths.
6. Googlebot does not authenticate, does not accept cookies meaningfully, and does not maintain session state. Anything behind a login wall will not be indexed. If paywalled content needs to be discoverable, mark it up with paywalled-content structured data; the Indexing API is officially limited to job postings and livestream videos, so it is not a general workaround.
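For the bot-blocking failure mode, a firewall or middleware rule can test the client IP against the ranges in googlebot.json before any blocking logic runs. This is a minimal sketch: the JSON excerpt is illustrative only (the real file is longer and changes over time), the `prefixes`, `ipv4Prefix`, and `ipv6Prefix` keys are assumed from the published file format, and the function names are hypothetical.

```python
import ipaddress
import json

# Illustrative excerpt, not the full or current list. Fetch the real
# googlebot.json on a schedule instead of hard-coding values like these.
SAMPLE_GOOGLEBOT_JSON = """
{"prefixes": [
  {"ipv6Prefix": "2001:4860:4801:10::/64"},
  {"ipv4Prefix": "66.249.64.0/27"}
]}
"""


def load_networks(googlebot_json: str):
    """Parse the JSON document into a list of ip_network objects."""
    data = json.loads(googlebot_json)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_google_ip(ip: str, networks) -> bool:
    """True if the address falls inside any of the loaded ranges."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed lists are safe to scan.
    return any(address in network for network in networks)


networks = load_networks(SAMPLE_GOOGLEBOT_JSON)
print(is_google_ip("66.249.64.5", networks))   # inside the sample /27
print(is_google_ip("203.0.113.7", networks))   # documentation range, not Google
```

Running this check ahead of the "block bots" rule means a spoofed user-agent still gets blocked while genuine crawler traffic passes.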
The Googlebot user-agent string is trivially spoofable. Real Googlebot requests come from a published range of Google-owned IPs. The reliable verification is reverse DNS followed by forward DNS:

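Run a reverse DNS lookup on the requesting IP and confirm the hostname ends in .googlebot.com or .google.com. Then run a forward DNS lookup on that hostname and confirm it resolves back to the same IP. Command line: host 66.249.66.1 then host crawl-66-249-66-1.googlebot.com. Automate this in your log pipeline; you'll be surprised how often a "Googlebot crawl spike" turns out to be a scraper using the user-agent.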
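The reverse-then-forward check can be automated with the standard `socket` module. The sketch below is one possible shape, not an official implementation: `verify_googlebot_ip` is a hypothetical name, the resolver arguments exist only so the logic can be exercised without network access, and the default forward lookup handles IPv4 only.

```python
import socket

# Hostname suffixes named in the text above.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot_ip(ip: str, reverse_lookup=None, forward_lookup=None) -> bool:
    """Reverse-DNS the IP, check the suffix, then forward-DNS back to the IP."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must include the original IP, otherwise the
        # PTR record could have been set by whoever controls the IP.
        return ip in forward_lookup(hostname)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Calling `verify_googlebot_ip("66.249.66.1")` performs live DNS queries, so results in a log pipeline are worth caching per IP.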
Reverse DNS is the operative standard, but what it establishes is "Google owns this IP range," not "this request was signed by Google." That gap is what the Web Bot Auth proposal addresses: crawlers sign requests with HTTP Message Signatures (RFC 9421), and the origin verifies them against a published key. Google has been an early implementer in 2026; the companion piece walks through the signing flow.
For sites under ~10,000 URLs, crawl budget is almost never a constraint. It becomes real on large sites with millions of URLs, faceted-search e-commerce, or sites wasting crawls on duplicates. Google publishes two influences: crawl rate (how fast your server can respond without errors) and crawl demand (how popular the URL is and how often it changes). On large sites, block faceted search URLs, internal site search results, paginated archives beyond page 5, session-ID parameters, and admin endpoints. Use robots.txt for crawl-time blocking and noindex for indexing-time blocking — they do different things. To speed up indexing of a new page, submit it via URL Inspection (this uses Google-InspectionTool, not Googlebot) and link it from a high-authority indexed page.
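On a large site, a first pass at finding crawl waste is to classify URLs from a crawl export or log file against the categories listed above. The sketch below assumes hypothetical parameter names and path prefixes (`color`, `sid`, `/search`, `/admin`, `page`); swap in whatever your own URL scheme uses.

```python
from urllib.parse import parse_qs, urlsplit

# All of these names are illustrative assumptions, not a standard.
FACET_PARAMS = {"color", "size", "sort", "filter"}
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}
SEARCH_PREFIXES = ("/search",)
ADMIN_PREFIXES = ("/admin",)
MAX_ARCHIVE_PAGE = 5  # mirrors "paginated archives beyond page 5"


def low_value_reason(url: str):
    """Return why a URL looks like crawl waste, or None if it looks fine."""
    parts = urlsplit(url)
    path = parts.path.lower()
    query = parse_qs(parts.query, keep_blank_values=True)
    params = {key.lower(): values for key, values in query.items()}

    if path.startswith(ADMIN_PREFIXES):
        return "admin endpoint"
    if path.startswith(SEARCH_PREFIXES):
        return "internal site search"
    if SESSION_PARAMS.intersection(params):
        return "session-ID parameter"
    if FACET_PARAMS.intersection(params):
        return "faceted search"
    page = params.get("page", ["1"])[0]
    if page.isdigit() and int(page) > MAX_ARCHIVE_PAGE:
        return "deep pagination"
    return None
```

URLs that return a reason are candidates for a robots.txt disallow or a noindex, depending on whether the goal is to save crawls or to keep the page out of the index.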
Ask ChatGPT, Claude, or Google's own AI Overviews "what is Googlebot" and you'll get a confident answer roughly 80% correct and 20% subtly wrong. The recurring drift across four engines:
Google-Extended is the separate token for Gemini training opt-out. Blocking Googlebot takes you out of search; blocking Google-Extended takes you out of AI training. Conflating them is a common mistake.
Cross-reference any specific technical claim against developers.google.com/search/docs/crawling-indexing. That's the primary source; everything else is a second-hand summary.
Four checks, in order, until one returns a clear answer.
Check 1, URL Inspection in Search Console. Paste the URL. The tool tells you whether Google has crawled and indexed it and lets you "View Tested Page" to see the rendered HTML and a screenshot. If the rendered HTML is missing your content, the problem is in rendering. If the page returned a non-200, the problem is in crawling. This single check resolves roughly two-thirds of the tickets we run.
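Check 2, curl with Googlebot's user-agent. Run curl -A "Mozilla/5.0 ... Googlebot/2.1 ..." https://yoursite.com/path. If your server returns different content for Googlebot than for a browser, user-agent-based cloaking or bot blocking is the likely cause. This check only exercises user-agent rules; anything keyed on IP address will not trigger from your own machine.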
Check 3, robots.txt and meta tag audit. Visit https://yoursite.com/robots.txt directly and confirm the URL isn't blocked. View page source and search for noindex. A surprising fraction of "won't index" cases are noindex tags left over from staging.
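Check 3 can be scripted once you have the robots.txt body, the page source, and the response headers in hand. The sketch below uses Python's `urllib.robotparser` and `html.parser`; the `audit` function and its return keys are hypothetical, and the standard-library robots parser only approximates Google's matching rules, so treat a surprising result as a prompt to confirm in Search Console.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def audit(url: str, robots_txt: str, page_html: str, x_robots_tag: str = "") -> dict:
    """Report the three blockers Check 3 looks for."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    finder = RobotsMetaFinder()
    finder.feed(page_html)

    # Simplification: ignores the optional "googlebot:" prefix the header allows.
    header = [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]

    return {
        "blocked_by_robots_txt": not rules.can_fetch("Googlebot", url),
        "noindex_meta": "noindex" in finder.directives,
        "noindex_header": "noindex" in header,
    }
```

Any `True` in the result is enough to explain a page staying out of the index, which is why this check comes before log analysis.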
Check 4, server log analysis. Filter access logs for verified-Googlebot requests over the last 30 days. If the URL never appears, it's a discoverability problem. If it appears but returns 4xx/5xx, fix the error. SEOJuice runs verified-Googlebot log analysis on every connected site.
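A stripped-down version of Check 4 might look like the following. It assumes access logs in the common "combined" format, already limited to the window you care about, and takes the IP verification step as a callable (`is_verified`) so the reverse-DNS or IP-range check can be plugged in. The function names and diagnosis strings are illustrative, not how SEOJuice implements it.

```python
import re
from collections import Counter

# Combined log format: ip - - [time] "METHOD path PROTO" status size "referer" "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_status_counts(lines, path: str, is_verified) -> Counter:
    """Count verified-Googlebot hits for one path, grouped by status class."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["path"] != path:
            continue
        # The UA filter is cheap; the IP verification is the part that counts.
        if "Googlebot" not in match["ua"] or not is_verified(match["ip"]):
            continue
        counts[match["status"][0] + "xx"] += 1
    return counts


def diagnose(counts: Counter) -> str:
    """Map the counts onto the outcomes described in Check 4."""
    if not counts:
        return "never crawled: discoverability problem"
    if counts["4xx"] or counts["5xx"]:
        return "crawled but erroring: fix the status code"
    return "crawled successfully: look at the render or index phase"
```

The exact path match is deliberate for a sketch; real logs usually need query strings and trailing slashes normalised first.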

| Crawler | Operator | Renders JS? | Used for |
|---|---|---|---|
| Googlebot | Google | Yes (recent Chromium) | Google search index |
| Bingbot | Microsoft | Yes (Edge / Chromium) | Bing search index, Copilot grounding |
| GPTBot | OpenAI | Limited / no SPA support | ChatGPT training data |
| OAI-SearchBot | OpenAI | Limited | ChatGPT search retrieval |
| PerplexityBot | Perplexity | Limited | Perplexity answer engine |
| ClaudeBot | Anthropic | Limited | Claude training and retrieval |
| Google-Extended | Google | N/A (read-only signal) | Opt-out flag for Gemini training |

Failure modes 1, 2, and 5 above (user-interaction gating, lazy-load signals, blocked static assets) hit AI crawlers harder than Googlebot because their renderers are weaker. The same checklist works on a Perplexity-citation problem; the stakes are just lower for now.
If your content depends on client-side rendering, you may rank fine in Google but be invisible to ChatGPT, Perplexity, and Claude. The fix is the same: server-side render or pre-render. Our free AI visibility checker will tell you in under a minute whether the major AI engines can actually see your content. Separately, the AI crawlers each have their own robots.txt directives: User-agent: GPTBot blocks OpenAI training; User-agent: Google-Extended blocks Gemini training; User-agent: Googlebot still controls the regular search crawler, independently.
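The independence of those robots.txt tokens is easy to confirm locally. The sketch below feeds a sample robots.txt (an assumption for illustration, not a recommended policy) to Python's `urllib.robotparser` and asks about each token in turn. The standard-library parser uses simple substring matching on the user-agent, which is close enough for these distinct tokens but is not identical to how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Sample policy: opt out of OpenAI and Gemini training, stay in Google Search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def access_by_token(robots_txt: str, url: str, tokens) -> dict:
    """Return {token: allowed?} for one URL under the given robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return {token: rules.can_fetch(token, url) for token in tokens}


result = access_by_token(
    ROBOTS_TXT,
    "https://example.com/blog/post",
    ["Googlebot", "GPTBot", "Google-Extended", "PerplexityBot"],
)
for token, allowed in result.items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

PerplexityBot has no matching group and there is no wildcard group, so it falls through to "allowed"; a token you never mention is not blocked.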
"The thing about Googlebot people most often miss is that crawling and rendering are not the same step. A URL can be crawled and still not have a rendered version of the content for hours." — Martin Splitt, Google Search Relations, paraphrased from his recurring point on Search Off the Record.
Googlebot is the web crawler Google uses to discover and download web pages so they can be indexed and shown in search results. It's a family of crawlers (Smartphone, Desktop, Image, Video, News). Most discussion refers to Googlebot Smartphone, the primary crawler since mobile-first indexing completed in 2023.
Googlebot does render JavaScript. The Web Rendering Service is a headless Chromium that tracks recent stable Chrome. The catch is the rendering queue: even when JS rendering succeeds, it can happen seconds, hours, or days after the initial crawl. Server-side rendered pages skip this queue.
To verify that a hit is really Googlebot, reverse-DNS the IP. Real Googlebot hits resolve to hostnames ending in .googlebot.com or .google.com. Then forward-DNS that hostname; it should resolve back to the same IP. The user-agent header alone is not proof.
You can block Googlebot. User-agent: Googlebot plus Disallow: / in robots.txt blocks crawling, which keeps page content out of the index (a blocked URL can still surface as a bare link if other sites point to it). For finer control, use noindex tags or block specific paths. Don't block CSS and JS bundles; the rendering service needs them.
Googlebot is not the same as the AI crawlers. They are separate crawlers run by different companies. Googlebot indexes for Google Search; GPTBot collects ChatGPT training data; PerplexityBot retrieves for Perplexity's answer engine. Each has its own UA string and its own robots.txt rules.
When a page isn't getting indexed, the common causes in order are: the page isn't linked from any indexed URL, returns a non-200 status, has a noindex tag, is blocked by robots.txt, or depends on client-side JS the rendering service hasn't processed yet. Use URL Inspection to identify which.
If your content is JavaScript-dependent and your only health check is "does it rank in Google Search," you're optimising for the strongest renderer in the crawler ecosystem and ignoring everything else. The AI crawlers are weaker, and their share of referral traffic is growing every quarter we measure it. Server-side rendering is no longer a Google optimisation. It is an AI-visibility prerequisite. The render-queue prioritisation logic remains opaque — the docs say resource-dependent, but variance across near-identical sites suggests something else is in the mix.