seojuice

Free Robots.txt Generator Tool

Generate optimized robots.txt files for your website on WordPress, Shopify, Medium, Ghost, Joomla, Drupal, or any other platform. Control search engine crawling and improve your SEO.


Robots.txt Generator

How Our Robots.txt Generator Works

1. Select Platform

Choose your website platform or select custom for a tailored robots.txt file.

2. Choose Robots.txt Type

Select an open robots.txt file (crawlers may access the site) or a closed one (crawlers are blocked), depending on your crawling preferences.

3. Generate Robots.txt

Our tool will generate the appropriate robots.txt file based on your selections, ready to be used on your website.
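As a rough illustration of the three steps above, here is a minimal Python sketch of how such a generator could work. It is not the tool's actual implementation: the function name generate_robots_txt, the PLATFORM_RULES table, and the reading of "closed" as a site-wide Disallow: / are all assumptions made for this example. The WordPress and Shopify rules are copied from the platform examples later on this page.

```python
# Hypothetical templates; the real tool's templates are not published here.
PLATFORM_RULES = {
    "wordpress": ["Disallow: /wp-admin/", "Allow: /wp-admin/admin-ajax.php", "Disallow: /?s="],
    "shopify": ["Disallow: /admin", "Disallow: /cart", "Disallow: /checkout", "Disallow: /search"],
    "custom": ["Disallow:"],  # empty Disallow value: nothing is blocked
}


def generate_robots_txt(platform, mode, sitemap_url=None):
    """Return robots.txt text for a platform ("wordpress", ...) and mode ("open"/"closed")."""
    if mode == "closed":
        # Assumption: "closed" means every crawler is blocked from the whole site.
        rules = ["Disallow: /"]
    elif mode == "open":
        # Unknown platforms fall back to the fully open "custom" template.
        rules = PLATFORM_RULES.get(platform, PLATFORM_RULES["custom"])
    else:
        raise ValueError("mode must be 'open' or 'closed'")

    lines = ["User-agent: *", *rules]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(generate_robots_txt("wordpress", "open", "https://example.com/sitemap_index.xml"))
```

Keeping the templates in a plain dictionary makes it easy to review each platform's rules side by side before they reach a live site.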

Benefits of Using Our Robots.txt Generator

Customizable Crawling Rules

Block specific directories, allow specific bots, set sitemap paths — all without editing server config files or touching code.

Platform-Specific Optimization

Pre-built templates for WordPress, Shopify, Next.js, and static sites. Pick your platform and get a working robots.txt in under a minute.

Search Engine Control

Tell Googlebot to skip your staging pages, admin panels, and duplicate content filters — so it spends crawl budget on pages that actually rank.

SEO Improvement

A clean robots.txt reduces crawl waste. Pages that are crawled sooner can be indexed sooner, and pages that are indexed sooner can start ranking sooner.

Time-Saving Solution

Writing robots.txt by hand means Googling the syntax every time. This generates valid syntax with the directives you actually need.

Best Practices Compliance

Follows Google's current robots.txt specification and the RFC 9309 standard. No outdated directives, no deprecated syntax.

What is Robots.txt?

[Figure: a typical robots.txt file with common directives for WordPress sites, showing User-agent, Disallow, and Sitemap directives.]

Robots.txt is a plain text file that tells search engine crawlers which URLs on your site they're allowed to access. Before crawling your site, Googlebot, Bingbot, and other well-behaved crawlers fetch yourdomain.com/robots.txt (and usually cache it for a while rather than re-fetching it on every visit). If the file exists, the crawler applies its rules before requesting anything else. For server-level access control (redirects, password protection, IP blocking), use server configuration such as an .htaccess file instead; robots.txt only carries crawler instructions.

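The file always lives at the root of the host it governs. Not in a subfolder, not with a different name. For https://example.com, Google will only look at https://example.com/robots.txt, and nothing else counts. Each subdomain (and each protocol and port) needs its own file.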
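To make the location rule concrete, here is a small Python sketch that derives the robots.txt URL governing any page URL. The helper name robots_txt_url is made up for this example; it only uses the standard library's urllib.parse.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?page=2"))
print(robots_txt_url("https://shop.example.com/cart"))
```

The second call returns a different file than the first: a subdomain such as shop.example.com is a separate host, so it is governed by its own robots.txt.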

Core Directives

Directive | What It Does | Example
User-agent | Specifies which crawler the rules apply to. Use * for all crawlers. | User-agent: *
Disallow | Blocks crawling of a path. An empty value means nothing is blocked. | Disallow: /admin/
Allow | Overrides a Disallow for a specific path. Useful for allowing a subfolder inside a blocked folder. | Allow: /admin/public/
Sitemap | Points crawlers to your XML sitemap. Not technically part of the robots protocol, but universally supported. | Sitemap: https://example.com/sitemap.xml
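How Allow overrides Disallow comes down to a precedence rule: under RFC 9309, the longest matching path wins, and on a tie the Allow rule wins. The Python sketch below illustrates that rule for plain path prefixes. It is a simplification, not a full parser: the function name is_allowed and the (directive, path) tuple format are assumptions for this example, and the wildcard characters * and $ are ignored.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules holds (directive, path_prefix) pairs, e.g. ("disallow", "/admin/")."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value matches nothing
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_won_by_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_won_by_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed(rules, "/admin/settings"))         # False
print(is_allowed(rules, "/admin/public/logo.png"))  # True
print(is_allowed(rules, "/blog/"))                  # True
```

Because precedence depends on path length rather than on position in the file, the order in which Allow and Disallow lines appear does not change the result for crawlers that follow this rule.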

Robots.txt vs Noindex — They're Not the Same

This trips people up constantly. Robots.txt blocks crawling — it stops the bot from visiting the page. A noindex meta tag blocks indexing — it tells the search engine not to show the page in results. Here's the catch: if you block a page via robots.txt, Google can't see the noindex tag on that page (because it never crawls it). The page can still appear in search results with a "No information is available for this page" snippet if other sites link to it. If you want a page out of Google entirely, use noindex and allow crawling so the bot can actually read the tag.
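The interaction between the two mechanisms can be written down as a small decision function. This is a sketch in Python, assuming a hypothetical helper named search_outcome and a has_noindex flag you supply yourself; the crawl check uses the standard library's urllib.robotparser, which is not Google's own parser.

```python
from urllib.robotparser import RobotFileParser


def search_outcome(robots_txt: str, url: str, has_noindex: bool,
                   user_agent: str = "Googlebot") -> str:
    """Describe what a search engine can do with url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The page is never fetched, so any noindex tag on it is never read.
        return "not crawled: noindex is never seen, URL can still be indexed via links"
    if has_noindex:
        return "crawled: noindex is read, page is kept out of results"
    return "crawled: page is eligible for indexing"


blocking = "User-agent: *\nDisallow: /private/\n"
open_rules = "User-agent: *\nDisallow:\n"
url = "https://example.com/private/report"

print(search_outcome(blocking, url, has_noindex=True))
print(search_outcome(open_rules, url, has_noindex=True))
```

Only the second call reaches the noindex branch, which is why the tag has to be paired with rules that allow crawling.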

Common Robots.txt Mistakes

  • Blocking CSS and JS files. Google needs to render your page to understand it. If you block /wp-content/themes/ or /wp-includes/, Googlebot may render an unstyled or broken page, which can hurt your rankings.
  • Blocking the entire site by accident. A single Disallow: / under User-agent: * blocks everything. This happens more often than you'd think, especially during staging-to-production migrations.
  • Forgetting the Sitemap directive. Crawlers fetch robots.txt before anything else, so listing your sitemap URL here is an easy way to help them discover all your pages, especially on new sites.
  • Using robots.txt to hide pages from search results. As covered above, blocking crawling doesn't prevent indexing. Use noindex instead.
  • Not testing after changes. Always verify your robots.txt in Google Search Console's URL Inspection tool or the robots.txt report. A typo in a path can block your highest-traffic pages.
  • Ignoring AI crawlers. Bots like GPTBot, ClaudeBot, and PerplexityBot now crawl sites to train models or power AI answers. If you want to control AI bot access separately from search engines, you can add specific User-agent rules — or use our AI Crawler Inspector to see which AI bots are already hitting your site.
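Several of the mistakes above can be caught with a quick script before a new robots.txt goes live. The following is a minimal Python sketch, assuming a hypothetical audit helper and a hand-picked list of URLs that must stay crawlable. It uses the standard library's urllib.robotparser, which applies the first matching rule in file order rather than Google's longest-match rule, so treat it as a smoke test and still confirm the result in Google Search Console.

```python
from urllib.robotparser import RobotFileParser


def audit(robots_txt, must_crawl, user_agent="Googlebot"):
    """Return (blocked_urls, sitemap_urls) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [url for url in must_crawl if not parser.can_fetch(user_agent, url)]
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    sitemaps = parser.site_maps() or []
    return blocked, sitemaps


important = ["https://example.com/", "https://example.com/pricing"]

# A staging file that was copied to production by accident.
staging = "User-agent: *\nDisallow: /\n"
blocked, sitemaps = audit(staging, important)
print("blocked:", blocked)
print("sitemaps:", sitemaps)
```

An empty blocked list and a non-empty sitemaps list are the results to aim for; the staging file above fails both checks.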

Controlling AI Crawlers with Robots.txt

Search engines are no longer the only bots reading your robots.txt. AI companies crawl the web to train models and to answer user questions in real time — and most of them respect robots.txt rules. The catch is that "AI crawler" covers two very different jobs: training crawlers collect content to train future models, while AI search crawlers fetch your pages to cite them in answers (with a link back to you). Blocking the first costs you nothing visible. Blocking the second removes you from AI answers — and from the referral traffic they send.

The AI User-Agents That Matter

User-agent | Operator | What it does | If you block it
GPTBot | OpenAI | Collects content for model training | No visible traffic loss today
OAI-SearchBot | OpenAI | Indexes pages for ChatGPT search answers | You disappear from ChatGPT search citations
ChatGPT-User | OpenAI | Fetches a page live when a ChatGPT user asks about it | ChatGPT can't open or summarize your pages on request
ClaudeBot | Anthropic | Collects content for model training | No visible traffic loss today
PerplexityBot | Perplexity | Indexes pages for Perplexity answers and citations | You lose Perplexity citations and their referral clicks
Google-Extended | Google | Opt-out token for Gemini model training | Nothing changes in Search; AI Overviews use regular Googlebot crawling
CCBot | Common Crawl | Builds the open web archive many training datasets start from | Your content stays out of future Common Crawl snapshots
Applebot-Extended | Apple | Opt-out token for Apple Intelligence training | Applebot still crawls for Siri and Spotlight

Recipe: block training, keep AI search visibility

The setup most sites actually want — your content doesn't feed model training, but ChatGPT and Perplexity can still cite you and send you visitors:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# AI search + on-demand fetchers stay allowed:
# OAI-SearchBot, ChatGPT-User, PerplexityBot

Recipe: block all AI bots

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

Two honest caveats. First, robots.txt is voluntary compliance — well-known operators respect it, but it's not enforcement; Cloudflare has publicly reported cases of AI crawlers fetching content despite robots.txt blocks. If you need a hard block, do it at the CDN or firewall level (we covered the trade-offs in our guide to Cloudflare's AI bot blocking). Second, before you block anything, check what's actually visiting: our free AI Crawler Inspector shows which AI bots already hit your site, and the AI Visibility Checker shows whether AI engines currently cite you — useful data points before you cut them off.
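If you maintain these recipes for several sites, the rule groups can be generated from a list of user-agent tokens and checked before deployment. Here is a minimal Python sketch of the "block training, keep AI search visibility" recipe above. The names TRAINING_BOTS, SEARCH_BOTS, and block_rules are made up for this example, and the check uses the standard library's urllib.robotparser, so it shows what the rules say and not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the table above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Applebot-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def block_rules(user_agents):
    """Build one 'Disallow: /' group per user-agent token."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"


robots_txt = block_rules(TRAINING_BOTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + SEARCH_BOTS:
    # Training bots should print False, search bots True.
    print(bot, parser.can_fetch(bot, "https://example.com/article"))
```

Passing TRAINING_BOTS + SEARCH_BOTS to block_rules instead produces the "block all AI bots" recipe.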

Robots.txt Examples by Platform

Every platform has different URL structures and admin paths. Here are working robots.txt examples you can use as a starting point for the most common setups.

WordPress

WordPress is the most common case. You want to block the admin area and internal search results while keeping everything else open. Don't block /wp-content/ — Google needs access to your theme CSS and JS to render pages properly.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Sitemap: https://example.com/sitemap_index.xml

Shopify

Shopify auto-generates a robots.txt that covers most cases. Since mid-2021, you can customize it via a robots.txt.liquid template in your theme. The default blocks checkout, cart, and internal search — which is usually what you want.

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /search
Sitemap: https://example.com/sitemap.xml

Next.js / Static Sites

Static sites and Next.js apps usually have minimal paths to block. If you're using Next.js, place the file in your public/ directory. For most static sites, an open robots.txt with just a sitemap reference is enough.

User-agent: *
Disallow: /api/
Disallow: /_next/
Allow: /_next/static/
# next/image serves optimized images from /_next/image by default
Allow: /_next/image
Sitemap: https://example.com/sitemap.xml

These are starting points. Use the generator above to create a robots.txt tailored to your specific platform and crawling preferences.

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a text file that tells search engine crawlers which pages or files they can or can't request from your site. It's used to manage crawler traffic and avoid overloading your site with requests.

How does the robots.txt generator work?

Our tool allows you to select your website platform, choose crawling preferences, and optionally block specific search engines. It then generates a robots.txt file based on your selections, following best practices for each platform.

Do I need a robots.txt file?

While not mandatory, a robots.txt file is highly recommended for most websites. It helps you control how search engines crawl your site, potentially improving your SEO and server performance.

How do I add the generated file to my website?

After generating the file, copy its contents and create a new file named "robots.txt" in the root directory of your website. For most websites, this would be accessible at yourdomain.com/robots.txt.

Does robots.txt remove pages from search results?

No. Robots.txt blocks crawling, not indexing. If other websites link to a page you've blocked in robots.txt, Google can still show that URL in search results. It just won't have a snippet, because Google never crawled the content. To actually remove a page from search results, use a noindex meta tag on the page itself, and make sure robots.txt allows crawling so Google can read the tag.

How often should I review my robots.txt file?

Review it whenever you make structural changes to your site: adding new sections, switching CMS platforms, launching a staging environment, or noticing crawl budget issues in Google Search Console. For most sites, checking it once a quarter is enough. If you're running a large site with thousands of pages, you'll want to monitor crawl stats more frequently and adjust your robots.txt to prioritize important content.

What happens if I don't have a robots.txt file?

Search engines will crawl everything they can find on your site. For small sites, this is usually fine, and there's nothing wrong with full crawling. But for larger sites, you're wasting crawl budget on admin pages, internal search results, and other low-value URLs. You're also missing an easy opportunity to point crawlers to your sitemap. Even a minimal robots.txt with just a Sitemap directive is better than nothing.

Should I block AI crawlers?

It depends on what you're blocking. Training crawlers (GPTBot, ClaudeBot, CCBot) collect content to train future AI models, and blocking them costs you nothing visible today. AI search crawlers (OAI-SearchBot, PerplexityBot) index your pages so AI assistants can cite you with a link, and blocking them removes you from AI answers and the referral traffic they send. Most sites that want control block the training bots and keep the search bots allowed. See the AI crawler section above for copy-paste recipes.
