seojuice

How to Build an Agent-Friendly Website (Without the llms.txt Hype)

Vadim Kravcenko
May 04, 2026 · 13 min read

TL;DR: An agent-friendly website is not a website with an llms.txt file. It is a site with clear content, explicit crawl policy, stable structured data, and safe action paths that an AI agent can read, verify, and use without guessing.

Most people searching for “agent friendly website” are probably expecting a checklist: add llms.txt, clean up metadata, maybe expose an API. That is too small. From what I have seen across mindnow, vadimkravcenko.com, and seojuice.com, the practical pattern is clearer: agents do not need a prettier sitemap. They need a contract.

SERP read: what the current top 3 say, and what they miss

| Rank | Result | What it says | What it misses |
| --- | --- | --- | --- |
| 1 | llmstxt.org | A root-level Markdown file can tell LLMs which parts of a site matter and provide concise context. | It treats the website mostly as reading material. Agents also need permission rules, page-level facts, form constraints, and action paths. |
| 2 | Cloudflare AI bot control | Publishers need control over AI bots, crawlers, and scrapers, including blocking and future pay-per-crawl models. | It frames the agent problem as defense first. That matters, but some agents are wanted visitors. |
| 3 | Anthropic MCP | AI systems need a standard way to connect to tools and data sources instead of custom integrations for each source. | MCP is powerful, but most marketing sites, SaaS docs, and commerce sites will not start with a full tool server. |

Gap to fill

The SERP splits the topic into three separate answers: publish context, block bots, and expose tools. The missing layer is operational. How does a normal website become readable, governable, and action-safe for AI agents without rebuilding the whole product?

Core thesis

An agent-friendly website has three contracts:

  1. Content contract: what the agent should read and cite.
  2. Access contract: which crawlers, bots, and agents are allowed to fetch it.
  3. Action contract: what the agent may do on behalf of a user.

llms.txt belongs in the first contract. Cloudflare-style bot rules belong in the second. MCP, schema actions, APIs, and form design belong in the third. Mixing those up is how teams ship something that looks AI-ready and still fails the first serious agent task.

Stop thinking “LLM-friendly.” Think “agent-safe.”

An LLM-friendly website is passive. It helps a model summarize a page. An agent-friendly website is interactive. It helps software compare plans, fill a form, book a demo, download a file, or attempt a checkout for a real person.

In addition to just being a 'smart thing you talk to', it has all the 'interfaces' available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access.

Dario Amodei’s framing matters because it moves the problem from “can a model read us?” to “can a semi-autonomous system use us?” The “semi-autonomous” qualifier matters here. Anthropic made the same point in its computer use launch:

Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people.

If agents can use the same interfaces as humans, then hidden state, vague buttons, blocked content, anti-bot false positives, and JavaScript-only facts become agent UX problems too. The public pages on seojuice.com are static-first for that reason. The app can be dynamic. The content that should rank, be cited, and be fetched should not require a fragile client-side path.

The three contracts of an agent-friendly website

I think of an agent-friendly website as three agreements with machines. Not legal agreements. Operational ones. They tell an agent what to trust, what it may fetch, and what it may do.

Diagram of an agent-friendly website built from content, access, and action contracts
Source: SEOJuice agent-readiness model — when contracts disagree, agents guess

1. Content contract

The content contract tells agents what exists, what matters, what changed, and what should be cited. It includes HTML, canonical tags, structured data, XML sitemaps, RSS feeds, docs pages, and optionally llms.txt. The key word is consistency. A product page, schema block, sitemap entry, and llms.txt reference should not describe four different products.

2. Access contract

The access contract tells agents what is allowed. It includes robots.txt, WAF rules, AI bot allowlists, rate limits, authentication, and commercial terms. A site can welcome search crawlers, throttle unknown AI crawlers, block abusive scrapers, and still allow user-directed agents. One rule for every bot is usually lazy policy.

3. Action contract

The action contract tells agents what can be done safely. It includes labeled forms, idempotent actions, confirmation steps, APIs, MCP servers where justified, schema.org Actions, and human-readable fallback pages. The goal is not to let every agent click every button. The goal is to make the safe path obvious.

When these contracts disagree, the agent will guess. Guessing is the failure mode. I was wrong about this for years (I assumed better models would just cope). Better models cope longer, then fail in more expensive ways.

Build the content contract first: HTML, schema, sitemaps, and llms.txt

llms.txt deserves a fair reading. Jeremy Howard’s argument is strong because site owners really do know which pages carry the best context:

The problem this solves is that today, constructing the right context for LLMs based on a website is ambiguous — site authors know best, and can provide a list of content that an LLM should use.

Flow diagram showing how llms.txt, canonical HTML, structured data, and sitemaps help AI agents read a website
Source: SEOJuice content-contract guidance, based on the llms.txt specification and standard JSON-LD patterns

The technical case from the llms.txt specification is context size, not SEO theater:

Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location.

John Mueller’s skepticism is also useful. He is warning against self-declared metadata that does not match the site:

To me, it's comparable to the keywords meta tag – this is what a site-owner claims their site is about … (Is the site really like that? well, you can check it. At that point, why not just check the site directly?)

My practical line: publish llms.txt, but do not make it the source of truth. The source of truth is still clean, crawlable, semantically consistent HTML. If your pricing page says “from $19,” your JSON-LD says “from $29,” and your llms.txt links to an old comparison page, the agent has no reason to trust your menu.
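That kind of disagreement is cheap to detect before an agent finds it. A minimal sketch, assuming the price appears in the visible text as a dollar amount and in a JSON-LD block with a single `offers` object. The `PageFacts` class, the `price_report` function, and the sample page are illustrative names and data, not part of any real tool:

```python
import json
import re
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collects visible text and parsed JSON-LD blocks from one HTML page."""

    def __init__(self):
        super().__init__()
        self._script_type = None  # None while outside any <script>
        self._buffer = []
        self.text_parts = []
        self.jsonld = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._script_type = dict(attrs).get("type") or ""
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script":
            if self._script_type == "application/ld+json":
                self.jsonld.append(json.loads("".join(self._buffer)))
            self._script_type = None

    def handle_data(self, data):
        if self._script_type is None:
            self.text_parts.append(data)
        else:
            self._buffer.append(data)


def price_report(html):
    """Return (prices in visible text, prices in JSON-LD offers) as float sets."""
    page = PageFacts()
    page.feed(html)
    page.close()
    text = " ".join(page.text_parts)
    visible = {float(p) for p in re.findall(r"\$(\d+(?:\.\d+)?)", text)}
    declared = set()
    for block in page.jsonld:
        # Assumption: each JSON-LD block is an object with one "offers" object.
        offers = block.get("offers", {})
        if "price" in offers:
            declared.add(float(offers["price"]))
    return visible, declared


# Hypothetical page that repeats the mismatch described above.
SAMPLE = """
<html><body>
<h1>Pricing</h1><p>Plans from $19 per month.</p>
<script type="application/ld+json">
{"@type": "Product", "name": "Example",
 "offers": {"@type": "Offer", "price": "29", "priceCurrency": "USD"}}
</script>
</body></html>
"""

visible, declared = price_report(SAMPLE)
if visible != declared:
    print(f"Mismatch: page text says {sorted(visible)}, JSON-LD says {sorted(declared)}")
```

A real page will have more than one price and messier markup, so treat this as the shape of the check rather than a finished validator.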

Minimum content contract checklist

  • Server-rendered or static HTML for public, indexable pages.
  • Stable title, meta description, canonical, and heading structure.
  • JSON-LD for Organization, Product, Article, FAQ, BreadcrumbList, SoftwareApplication, or the relevant type.
  • XML sitemap with canonical URLs only.
  • RSS feed or changelog feed for fresh content.
  • llms.txt that points to the best summaries, docs, pricing, policies, and canonical resources.
  • Plain-language pages for pricing, support, refunds, API limits, and contact paths.

Agents struggle when critical facts exist only in images, accordions that never render server-side, gated PDFs, route transitions, or tracking-heavy pages that change after load. In client work at mindnow, the common failure is rarely “no AI-specific file.” More often, the existing HTML tells five different stories. If this is your problem, start with a technical SEO audit before inventing a new agent layer.
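One rough way to test this is to read the served HTML the way a client without JavaScript would and check that the facts you care about are still there. A sketch under simple assumptions: the HTML is already fetched into a string, and "fact" means a literal phrase. `TextOnly`, `missing_facts`, and both sample pages are hypothetical:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps the text a non-JavaScript client would see; skips script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_facts(html, required_facts):
    """Return the required phrases that do not appear in the served text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [fact for fact in required_facts if fact.lower() not in text]


STATIC_PAGE = "<main><h1>Pricing</h1><p>Starter plan: $19 per month, cancel anytime.</p></main>"
JS_ONLY_PAGE = '<main><div id="root"></div><script>render("Starter plan: $19 per month")</script></main>'

facts = ["$19 per month", "cancel anytime"]
print(missing_facts(STATIC_PAGE, facts))   # []
print(missing_facts(JS_ONLY_PAGE, facts))  # ['$19 per month', 'cancel anytime']
```

The second page is the failure case: the facts exist, but only inside a script that an agent may never execute.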

Write an llms.txt that helps agents instead of flattering yourself

A useful llms.txt is short. It names the company, explains what the site is about, and points agents toward canonical resources. It does not list every URL. It does not stuff the target keyword. It does not claim category leadership unless a real page proves it.

| Section | What to include |
| --- | --- |
| # Example Company | Company name and one-sentence description. |
| ## Core pages | Product overview, pricing, docs, integrations, comparisons, or use cases. |
| ## Policies | Privacy, terms, refunds, security, support, and acceptable use. |
| ## For agents | Preferred citation page, support contact, deprecated areas, and crawl guidance. |

Write entries like this:

- [Pricing](https://example.com/pricing): Plans, limits, billing rules, and renewal terms.

The description should explain why the page matters. Link to canonical pages, not campaign copies. If a page is outdated, say so. Agents are already bad at judging freshness; do not make them worse.
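If the file is generated from the same data that drives your navigation, it is less likely to drift. A small sketch that renders the structure described above (a title, a one-line summary, then sections of annotated links). The function name, the company, and every URL are placeholders:

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data; keep it short and link to canonical pages only.
SECTIONS = {
    "Core pages": [
        ("Pricing", "https://example.com/pricing",
         "Plans, limits, billing rules, and renewal terms."),
        ("Docs", "https://example.com/docs",
         "Setup guides and API reference."),
    ],
    "Policies": [
        ("Refunds", "https://example.com/refunds",
         "Refund window and how to request one."),
    ],
    "For agents": [
        ("Crawl policy", "https://example.com/agents",
         "Allowed crawlers, rate limits, and the preferred citation page."),
    ],
}

llms_txt = render_llms_txt(
    "Example Company",
    "Example Company sells a hosted widget tracker for small teams.",
    SECTIONS,
)
print(llms_txt)
```

Every note says what the page is for, which is the part most hand-written files skip.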

llms.txt is a menu, not the meal. The meal is still the website.

Decide which agents get in before your firewall decides for you

Cloudflare owns much of the access-contract conversation because publishers are tired of being scraped without consent or compensation. Matthew Prince put the economic problem plainly:

If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone – creators, consumers, tomorrow's AI founders, and the future of the web itself.

Policy matrix for allowing or blocking different AI crawlers and agents
Source: SEOJuice access-contract guidance, drawing on Cloudflare publisher control patterns

I agree with the control part. I also think many teams will over-block and call it strategy. The honest version is more granular. Different agents deserve different defaults.

| Visitor type | Default posture | Why |
| --- | --- | --- |
| Search crawlers | Allow | They still drive discovery and verification. |
| Known AI crawlers | Case-by-case | Some are useful; some extract value without return. |
| User-directed agents | Usually allow with limits | They act for a human trying to use your site. |
| Anonymous scrapers | Block or throttle | No user intent, no trust, no accountability. |

A bot fetching your pricing page because a user asked, “compare these three tools,” is different from a crawler copying your docs site every night. Jan Curn from Apify makes the extraction reality explicit:

To build useful AI apps or agents, you need to provide the models with the right and up-to-date context, which often requires extracting data from the web.

If you do not offer a clean contract, useful tools may fall back to scraping. Abusive tools will scrape anyway. Your job is to make that distinction observable in logs, policy, and behavior.

Access contract checklist

  • Keep robots.txt explicit and boring.
  • Document AI crawler policy in plain language.
  • Use bot management to separate known crawlers, user-directed agents, and abusive traffic.
  • Rate-limit by behavior, not only by user-agent string.
  • Return useful HTTP status codes.
  • Avoid blocking all headless browsers by default if customers may use agentic browsers.
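The first item on that list can be tested rather than eyeballed. A sketch using Python's standard `urllib.robotparser`: write the business policy down as expectations, then check that the robots.txt agrees. The rules, the paths, and the `UnknownBot` name are invented for illustration, and robots.txt only covers crawlers that choose to honor it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and the split between bots are examples.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /docs/
Disallow: /

User-agent: *
Disallow: /app/
"""

# The business policy as expectations: (user agent, URL, should be allowed).
EXPECTED = [
    ("Googlebot", "https://example.com/pricing", True),
    ("GPTBot", "https://example.com/docs/api", True),
    ("GPTBot", "https://example.com/pricing", False),
    ("UnknownBot", "https://example.com/app/settings", False),
]


def policy_violations(robots_txt, expected):
    """Return the cases where robots.txt disagrees with the stated policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent, url, should_allow in expected:
        allowed = parser.can_fetch(agent, url)
        if allowed != should_allow:
            problems.append((agent, url, allowed))
    return problems


print(policy_violations(ROBOTS_TXT, EXPECTED))
```

The `Allow` line sits before the broader `Disallow` on purpose: this parser applies the first rule that matches, and other crawlers may resolve overlapping rules differently. WAF rules and rate limits need their own checks.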

Make actions safe: forms, checkout, booking, and account tasks

The action contract is where most “agent friendly website” advice goes thin. Reading is easy compared with doing. A demo request, checkout, cancellation, account upgrade, appointment booking, or support ticket has consequences.

Sequence diagram of a safe website action flow for AI agents
Source: SEOJuice action-contract guidance: agents inherit your worst-case UX

Anthropic describes the Model Context Protocol this way:

It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol.

Anthropic’s MCP description gives the right category: action layer, not website replacement. MCP is useful when agents need trusted tool access, especially behind authentication. Skipping to MCP while your pricing page renders blank without JavaScript is theater.

Agent-safe form design

Use real labels, visible input constraints, specific errors, confirmation screens, and persistent URLs. A button that says “Submit” is worse than one that says “Request a demo.” A button that says “Cancel subscription after current billing period” is better than both. Agents should know what will happen before submission (humans appreciate this too).
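Two of these are easy to lint: fields without a label and buttons with generic text. A rough sketch, assuming labels are attached with `for`/`id` or `aria-label`. The `FormAudit` class, the list of "generic" button words, and the sample form are my own illustrative choices:

```python
from html.parser import HTMLParser

# Assumption: these words alone do not tell an agent what the button does.
GENERIC_BUTTON_TEXT = {"submit", "ok", "send", "continue"}


class FormAudit(HTMLParser):
    """Records labelled ids, visible fields, and button text in a form."""

    def __init__(self):
        super().__init__()
        self.labelled_ids = set()
        self.fields = []        # (id or name, has aria-label)
        self.button_texts = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and attrs.get("for"):
            self.labelled_ids.add(attrs["for"])
        elif tag in {"input", "select", "textarea"}:
            if attrs.get("type") in {"hidden", "submit"}:
                return
            field = attrs.get("id") or attrs.get("name") or "(unnamed)"
            self.fields.append((field, bool(attrs.get("aria-label"))))
        elif tag == "button":
            self._in_button = True
            self.button_texts.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.button_texts[-1] += data


def audit_form(html):
    """Return a list of human-readable problems found in the form markup."""
    audit = FormAudit()
    audit.feed(html)
    audit.close()
    problems = []
    for field, has_aria_label in audit.fields:
        if field not in audit.labelled_ids and not has_aria_label:
            problems.append(f"field '{field}' has no label")
    for text in audit.button_texts:
        if text.strip().lower() in GENERIC_BUTTON_TEXT:
            problems.append(f"button '{text.strip()}' does not say what it does")
    return problems


DEMO_FORM = """
<form action="/demo" method="post">
  <label for="email">Work email</label>
  <input id="email" name="email" type="email" required>
  <input id="size" name="size" type="number" min="1">
  <button type="submit">Submit</button>
</form>
"""

for problem in audit_form(DEMO_FORM):
    print(problem)
```

It will not catch labels that wrap their input or constraints that exist only in JavaScript, so it complements a manual pass rather than replacing one.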

Agent-safe commercial actions

For checkout, booking, cancellation, and plan changes, require confirmation. Show price, tax, renewal date, refund rules, and account identity before the final action. No dark patterns. They confuse agents and annoy humans. If a workflow relies on a mystery modal, hidden timer, or ambiguous upsell, fix the workflow before inviting agents into it.
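On the server side, "require confirmation" usually means two steps plus protection against retries, because agents retry. A deliberately small in-memory sketch, not a production design: `PlanChangeFlow`, the idempotency key handling, and the account, plan, and price values are all hypothetical:

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Preview:
    token: str    # must be sent back to confirm the change
    summary: str  # what the user or agent sees before the final action


class PlanChangeFlow:
    """Two-step plan change: preview first, then confirm exactly once."""

    def __init__(self):
        self._pending = {}    # confirmation token -> pending change
        self._completed = {}  # idempotency key -> receipt

    def preview(self, account, plan, price, renews_on):
        token = secrets.token_hex(8)
        self._pending[token] = (account, plan)
        summary = f"{account}: switch to {plan} at ${price}/month, renews {renews_on}"
        return Preview(token, summary)

    def confirm(self, token, idempotency_key):
        # A retry with the same key returns the stored receipt and changes nothing.
        if idempotency_key in self._completed:
            return self._completed[idempotency_key]
        if token not in self._pending:
            raise KeyError("unknown or already used confirmation token")
        account, plan = self._pending.pop(token)
        receipt = f"{account} is now on {plan}"
        self._completed[idempotency_key] = receipt
        return receipt


flow = PlanChangeFlow()
preview = flow.preview("acct_42", "Pro", 49, "2026-06-04")
print(preview.summary)

first = flow.confirm(preview.token, "req-1")
retry = flow.confirm(preview.token, "req-1")  # e.g. the agent timed out and retried
print(first == retry)
```

The point of the design is that nothing happens at preview time, and a repeated confirm cannot charge or change anything twice.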

When to expose MCP or an API

Use MCP when the agent needs authenticated access to private data or repeated tool calls. Examples include retrieving account usage, creating support tickets, checking inventory, scheduling appointments, or updating records. Anthropic’s broader problem statement is the real reason:

Even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.

The practical ladder is boring: HTML first, structured data second, API third, MCP when repeated tool use matters. If you are wrestling with JavaScript rendering or route-level metadata, fix your JavaScript SEO problems before building a Model Context Protocol server.

Use schema.org Actions where they fit, but do not pretend schema is magic

schema.org Actions are the older standard hiding in plain sight. Dan Brickley described the addition this way back in 2014:

This is perhaps the most interesting addition to schema.org since launch — providing a way of describing the capability to perform actions in the future.

The useful part is intent. SearchAction can tell machines how site search works. ReserveAction can describe a booking path. OrderAction can describe commerce. ContactAction can support sales or support paths. On a SaaS site, a simple SearchAction pointing to /search?q={query} is more helpful than a decorative schema graph that claims everything and enables nothing.
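For reference, a SearchAction block for that /search?q={query} pattern could look roughly like the output of this sketch. The `query-input` string follows the commonly documented `required name=...` convention. The function name and example.com are placeholders, and the markup is only worth publishing if the search URL really works:

```python
import json


def search_action_jsonld(site_url, search_path="/search?q={query}"):
    """Build a WebSite + SearchAction JSON-LD object for a site search URL."""
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": site_url.rstrip("/") + search_path,
            },
            # The name must match the placeholder used in urlTemplate.
            "query-input": "required name=query",
        },
    }


block = search_action_jsonld("https://example.com/")
print(json.dumps(block, indent=2))
```

The printed JSON goes inside a script tag with type application/ld+json on the home page.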

Uptake stalled because schema can describe an action, but it cannot make the workflow safe. If the search page returns empty results without JavaScript, the markup will not rescue it. If the booking form changes price after the final click, ReserveAction just gives the agent a cleaner path into a bad experience. Treat schema as a map of real functionality. For implementation details, pair this with a structured data guide, not a generator that dumps JSON-LD and leaves.

The agent-friendly website audit: 20 checks I would run first

Before adding anything AI-native to seojuice.com, I would run this audit. The boring checks catch most failures. A serious agent-friendly website starts by proving that public facts, crawl rules, and workflows agree.

Agent-friendly website audit checklist grouped by content, access, and action
Source: SEOJuice agent-friendly audit checklist

Content checks

  • Can a text browser understand the page?
  • Does the canonical page contain the main facts?
  • Are key claims supported on-page?
  • Is structured data valid and consistent?
  • Does the sitemap include only canonical URLs?
  • Does llms.txt point to best pages, not every page?
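The sitemap check is the easiest of these to script. A sketch that assumes you already have a mapping from each URL to the canonical it declares (from a crawl, for example). `non_canonical_entries`, the sample sitemap, and the mapping are illustrative:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_entries(sitemap_xml, canonical_of):
    """Return sitemap URLs whose declared canonical is a different URL."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    # URLs missing from the mapping are treated as self-canonical.
    return [url for url in urls if canonical_of.get(url, url) != url]


# Hypothetical sitemap with one campaign URL that should not be listed.
SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing?utm_source=ad</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>
"""

# Hypothetical crawl result: URL -> canonical URL declared on that page.
CANONICAL_OF = {
    "https://example.com/": "https://example.com/",
    "https://example.com/pricing?utm_source=ad": "https://example.com/pricing",
    "https://example.com/docs": "https://example.com/docs",
}

print(non_canonical_entries(SITEMAP, CANONICAL_OF))
```

Anything this returns is a URL the sitemap vouches for while the page itself points elsewhere.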

Access checks

  • Does robots.txt match the business policy?
  • Are known AI crawlers allowed, blocked, or rate-limited intentionally?
  • Are user-directed agents accidentally blocked?
  • Do blocked requests return clear status codes?
  • Is there a public policy for AI crawlers?

Action checks

  • Are forms labeled in plain HTML?
  • Are errors specific?
  • Are prices, dates, and consequences visible before submit?
  • Are destructive actions confirmed?
  • Are important workflows possible without hidden UI state?
  • Is there an API or MCP path for authenticated repeated actions?
  • Can support be reached without a chatbot trap?
  • Are logs good enough to debug agent failures?

The annoying twentieth check is simple: ask a person outside your company to complete the workflow using only what is public. No Slack thread. No “oh, that page is old.” If they cannot do it, an agent will probably improvise.

A practical rollout plan for the next 30 days

Week 1: make public pages readable

Fix canonical HTML, headings, metadata, structured data, sitemap quality, and obvious rendering issues. Open your money pages with JavaScript disabled. If the main facts disappear, repair that before writing another policy file.

Week 2: publish the content contract

Add llms.txt, clean docs and policy pages, and create a short “for agents and crawlers” policy page. Keep it plain. Agents need less brand poetry and more current facts.

Week 3: define access

Review robots.txt, WAF settings, AI crawler rules, rate limits, logs, and blocked user-agent patterns. Separate wanted automation from extraction you do not want (in 2026, this is no longer optional).

Week 4: harden actions

Audit forms, checkout, booking, cancellation, support, and account workflows. Add confirmations. Expose an API or MCP only where the use case deserves it. A public marketing site usually needs better HTML before it needs a tool server.

Agent-friendly is not a file. It is a website that can be read, trusted, and acted on without a private Slack thread explaining how it works.

FAQ

Is llms.txt required for an agent-friendly website?

No. It is useful, but not enough. Treat it as a guide to your best context, not as the source of truth.

Will Google use llms.txt?

No public commitment says it will. Build it for agents and LLM tools, not as a ranking hack. Google can already crawl your pages directly.

Should I block AI crawlers?

Sometimes. Block abusive crawlers, decide intentionally on known AI crawlers, and avoid blocking user-directed agents by accident. Your access contract should reflect business policy, not firewall panic.

Do I need MCP?

Only if agents need reliable access to data or tools, especially behind authentication. Public marketing pages usually need better HTML, structured data, and crawl policy before MCP.

What is the fastest first step?

Open your key page with JavaScript disabled or in a text-only view. If the main facts disappear, fix that before writing an llms.txt file.

Build the boring contract first

If you want help with the content contract, SEOJuice is where I would start: canonical HTML, schema, sitemap checks, llms.txt, and rendering parity all belong there. Fix that layer first; the access and action layers get easier once your public contract is consistent. For the bigger workflow, connect it to your AI SEO workflow instead of treating agent readiness as another one-off checklist.