Generate optimized robots.txt files for your website on WordPress, Shopify, Medium, Ghost, Joomla, Drupal, or any other platform. Control search engine crawling and improve your SEO.
Choose your website platform or select custom for a tailored robots.txt file.
Select between an open or closed robots.txt file based on your crawling preferences.
Our tool will generate the appropriate robots.txt file based on your selections, ready to be used on your website.
Block specific directories, allow specific bots, set sitemap paths — all without editing server config files or touching code.
Pre-built templates for WordPress, Shopify, Next.js, and static sites. Pick your platform and get a working robots.txt in under a minute.
Tell Googlebot to skip your staging pages, admin panels, and duplicate content filters — so it spends crawl budget on pages that actually rank.
A clean robots.txt prevents crawl waste. Pages that get crawled faster get indexed faster. Pages that get indexed faster start ranking sooner.
Writing robots.txt by hand means Googling the syntax every time. This generates valid syntax with the directives you actually need.
Follows Google's current robots.txt specification and the RFC 9309 standard. No outdated directives, no deprecated syntax.
Robots.txt is a plain text file that tells search engine crawlers which URLs on your site they're allowed to access. Every time Googlebot, Bingbot, or any other crawler visits your site, the first thing it checks is yourdomain.com/robots.txt. If the file exists, the crawler reads the rules before crawling anything else. For server-level access control (redirects, password protection, IP blocking), you'd use an .htaccess file instead — robots.txt only handles crawler instructions.
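The file always lives at the root of the host it governs. Not in a subfolder, not with a different name. For https://example.com, Google will only look at https://example.com/robots.txt, and nothing else counts. Each subdomain (for example, blog.example.com) needs its own file at its own root.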
| Directive | What It Does | Example |
|---|---|---|
| `User-agent` | Specifies which crawler the rules apply to. Use `*` for all crawlers. | `User-agent: *` |
| `Disallow` | Blocks crawling of a path. An empty value means nothing is blocked. | `Disallow: /admin/` |
| `Allow` | Overrides a Disallow for a specific path. Useful for allowing a subfolder inside a blocked folder. | `Allow: /admin/public/` |
| `Sitemap` | Points crawlers to your XML sitemap. Not technically part of the robots protocol, but universally supported. | `Sitemap: https://example.com/sitemap.xml` |
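To see how the four directives behave together, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample file, the `is_allowed` helper, and the example.com URLs are illustrative assumptions, not output from the generator. `Allow` is listed before `Disallow` so the result does not depend on whether a parser applies rules in file order or by longest match.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt built from the directives in the table above.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="Googlebot"):
    """Hypothetical helper: may `agent` crawl `path` on example.com?"""
    return parser.can_fetch(agent, "https://example.com" + path)


print(is_allowed("/admin/settings"))     # False: matches Disallow: /admin/
print(is_allowed("/admin/public/help"))  # True: matches Allow: /admin/public/
print(is_allowed("/blog/"))              # True: no rule matches
print(parser.site_maps())                # ['https://example.com/sitemap.xml']
```

`site_maps()` requires Python 3.8 or later.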
This trips people up constantly. Robots.txt blocks crawling — it stops the bot from visiting the page. A noindex meta tag blocks indexing — it tells the search engine not to show the page in results. Here's the catch: if you block a page via robots.txt, Google can't see the noindex tag on that page (because it never crawls it). The page can still appear in search results with a "No information is available for this page" snippet if other sites link to it. If you want a page out of Google entirely, use noindex and allow crawling so the bot can actually read the tag.
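The conflict described here (a noindex tag that the crawler is never allowed to read) can be detected mechanically. The sketch below is one way to do it under stated assumptions: you already have the page's HTML from some other source, and the names `RobotsMetaFinder` and `noindex_is_hidden` are hypothetical helpers written for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_is_hidden(robots_txt, page_url, page_html, agent="Googlebot"):
    """True when the page says noindex but robots.txt blocks the crawler
    from fetching the page, so the tag can never be read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    has_noindex = "noindex" in finder.directives
    return has_noindex and not parser.can_fetch(agent, page_url)
```

A `True` result means the page should either be unblocked in robots.txt (so the noindex can take effect) or have the tag removed as pointless.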
If you block /wp-content/themes/ or /wp-includes/, Googlebot can't load the CSS and JavaScript it needs and sees a blank or broken page. That kills your rankings.
Disallow: / under User-agent: * blocks everything. This happens more often than you'd think, especially during staging-to-production migrations.
User-agent rules — or use our AI Crawler Inspector to see which AI bots are already hitting your site.
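Two of the mistakes above, a blanket `Disallow: /` and blocked theme assets, are easy to catch before a deploy. The following is a rough pre-deploy check, assuming a WordPress-style path layout; `RENDER_PATHS` and `find_blocked_paths` are hypothetical names, and the path list should be adjusted to your own stack.

```python
from urllib.robotparser import RobotFileParser

# Paths a crawler needs in order to reach and render pages.
# Assumed WordPress layout; change this list for other platforms.
RENDER_PATHS = ["/", "/wp-content/themes/", "/wp-includes/"]


def find_blocked_paths(robots_txt, paths=RENDER_PATHS, agent="Googlebot"):
    """Return the subset of `paths` that `agent` is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path
        for path in paths
        if not parser.can_fetch(agent, "https://example.com" + path)
    ]


# A staging file that slipped into production blocks everything:
print(find_blocked_paths("User-agent: *\nDisallow: /\n"))
# ['/', '/wp-content/themes/', '/wp-includes/']
```

An empty list means none of the critical paths are blocked. Running a check like this in CI is one way to stop a staging robots.txt from reaching production.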
Search engines are no longer the only bots reading your robots.txt. AI companies crawl the web to train models and to answer user questions in real time — and most of them respect robots.txt rules. The catch is that "AI crawler" covers two very different jobs: training crawlers collect content to train future models, while AI search crawlers fetch your pages to cite them in answers (with a link back to you). Blocking the first costs you nothing visible. Blocking the second removes you from AI answers — and from the referral traffic they send.
| User-agent | Operator | What it does | If you block it |
|---|---|---|---|
| GPTBot | OpenAI | Collects content for model training | No visible traffic loss today |
| OAI-SearchBot | OpenAI | Indexes pages for ChatGPT search answers | You disappear from ChatGPT search citations |
| ChatGPT-User | OpenAI | Fetches a page live when a ChatGPT user asks about it | ChatGPT can't open or summarize your pages on request |
| ClaudeBot | Anthropic | Collects content for model training | No visible traffic loss today |
| PerplexityBot | Perplexity | Indexes pages for Perplexity answers and citations | You lose Perplexity citations and their referral clicks |
| Google-Extended | Google | Opt-out token for Gemini model training | Nothing changes in Search; AI Overviews use regular Googlebot crawling |
| CCBot | Common Crawl | Builds the open web archive many training datasets start from | Your content stays out of future Common Crawl snapshots |
| Applebot-Extended | Apple | Opt-out token for Apple Intelligence training | Applebot still crawls for Siri and Spotlight |
The setup most sites actually want — your content doesn't feed model training, but ChatGPT and Perplexity can still cite you and send you visitors:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
# AI search + on-demand fetchers stay allowed:
# OAI-SearchBot, ChatGPT-User, PerplexityBot
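If you would rather block every AI crawler, including the AI search and on-demand fetchers, the file looks like this instead:
User-agent: GPTBot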
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
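Both recipes follow the same pattern: one `User-agent` group per bot, each with `Disallow: /`. A small sketch of generating them from a list, using the bot names from the table above; `TRAINING_BOTS`, `SEARCH_BOTS`, and `build_block_rules` are hypothetical names for this example and not part of the generator.

```python
# Bot names taken from the table above, split by job.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Applebot-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_block_rules(bots):
    """Return robots.txt text that blocks each listed bot from the whole site."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    # A blank line between groups keeps the file readable.
    return "\n\n".join(groups) + "\n"


# Recipe 1: block training crawlers only; search bots stay allowed.
print(build_block_rules(TRAINING_BOTS))

# Recipe 2: block every AI crawler in the table.
print(build_block_rules(TRAINING_BOTS + SEARCH_BOTS))
```

Keeping the two lists separate makes the trade-off explicit: moving a name from one list to the other is the whole decision.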
Two honest caveats. First, robots.txt is voluntary compliance — well-known operators respect it, but it's not enforcement; Cloudflare has publicly reported cases of AI crawlers fetching content despite robots.txt blocks. If you need a hard block, do it at the CDN or firewall level (we covered the trade-offs in our guide to Cloudflare's AI bot blocking). Second, before you block anything, check what's actually visiting: our free AI Crawler Inspector shows which AI bots already hit your site, and the AI Visibility Checker shows whether AI engines currently cite you — useful data points before you cut them off.
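If you have raw access logs, a rough count of AI crawler visits is also possible with a few lines of Python. This sketch simply looks for the bot names as substrings of each log line; the log format and the `count_ai_hits` helper are assumptions, and user-agent strings can be spoofed, so treat the numbers as a hint and not as proof.

```python
from collections import Counter

# Bots that send their own user-agent string. Google-Extended and
# Applebot-Extended are opt-out tokens, not separate crawlers, so they
# are not expected to show up in logs and are left out here.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "CCBot",
]


def count_ai_hits(log_lines):
    """Count log lines per AI crawler, matching names case-insensitively."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for agent in AI_AGENTS:
            if agent.lower() in lowered:
                hits[agent] += 1
                break  # count each request once
    return hits
```

Usage, assuming a log file path of your own: `count_ai_hits(open("access.log", encoding="utf-8", errors="replace"))`.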
Every platform has different URL structures and admin paths. Here are working robots.txt examples you can use as a starting point for the most common setups.
WordPress is the most common case. You want to block the admin area and internal search results while keeping everything else open. Don't block /wp-content/ — Google needs access to your theme CSS and JS to render pages properly.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Sitemap: https://example.com/sitemap_index.xml
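The `Allow` line works even though it comes after the broader `Disallow` because Google and RFC 9309 pick the most specific (longest) matching rule, with `Allow` winning ties. Below is a minimal sketch of that precedence logic for plain path prefixes. It deliberately ignores the `*` and `$` wildcards, and `is_allowed` and `WORDPRESS_RULES` are hypothetical names for this example.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, prefix) pairs for one user-agent group.
    Wildcards (* and $) are not handled in this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are skipped
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_goes_to_allow = len(prefix) == best_length and is_allow
        if longer or tie_goes_to_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


WORDPRESS_RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/?s="),
]

print(is_allowed(WORDPRESS_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(WORDPRESS_RULES, "/wp-admin/options.php"))     # False
print(is_allowed(WORDPRESS_RULES, "/?s=shoes"))                 # False
```

Parsers that apply rules strictly in file order can give a different answer for the first case, which is worth keeping in mind when testing a file with third-party tools.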
Shopify auto-generates a robots.txt that covers most cases. Since mid-2021, you can customize it via a robots.txt.liquid template in your theme. The default blocks checkout, cart, and internal search — which is usually what you want.
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /search
Sitemap: https://example.com/sitemap.xml
Static sites and Next.js apps usually have minimal paths to block. If you're using Next.js, place the file in your public/ directory. For most static sites, an open robots.txt with just a sitemap reference is enough.
User-agent: *
Disallow: /api/
Disallow: /_next/
Allow: /_next/static/
Sitemap: https://example.com/sitemap.xml
These are starting points. Use the generator above to create a robots.txt tailored to your specific platform and crawling preferences.
A robots.txt file is a text file that tells search engine crawlers which pages or files the crawler can or can't request from your site. It's used to manage website traffic and avoid overloading your site with requests.
Our tool allows you to select your website platform, choose crawling preferences, and optionally block specific search engines. It then generates a robots.txt file based on your selections, following best practices for each platform.
While not mandatory, a robots.txt file is highly recommended for most websites. It helps you control how search engines crawl your site, potentially improving your SEO and server performance.
After generating the file, copy its contents and create a new file named "robots.txt" in the root directory of your website. For most websites, this would be accessible at yourdomain.com/robots.txt.
No. Robots.txt blocks crawling, not indexing. If other websites link to a page you've blocked in robots.txt, Google can still show that URL in search results — it just won't have a snippet because Google never crawled the content. To actually remove a page from search results, use a noindex meta tag on the page itself, and make sure robots.txt allows crawling so Google can read the tag.
Review it whenever you make structural changes to your site — adding new sections, switching CMS platforms, launching a staging environment, or noticing crawl budget issues in Google Search Console. For most sites, checking it once a quarter is enough. If you're running a large site with thousands of pages, you'll want to monitor crawl stats more frequently and adjust your robots.txt to prioritize important content.
Search engines will crawl everything they can find on your site. For small sites, this is usually fine — there's nothing wrong with full crawling. But for larger sites, you're wasting crawl budget on admin pages, internal search results, and other low-value URLs. You're also missing an easy opportunity to point crawlers to your sitemap. Even a minimal robots.txt with just a Sitemap directive is better than nothing.
It depends on what you're blocking. Training crawlers (GPTBot, ClaudeBot, CCBot) collect content to train future AI models — blocking them costs you nothing visible today. AI search crawlers (OAI-SearchBot, PerplexityBot) index your pages so AI assistants can cite you with a link — blocking them removes you from AI answers and the referral traffic they send. Most sites that want control block the training bots and keep the search bots allowed. See the AI crawler section above for copy-paste recipes.