TL;DR: "AI-driven SEO" used to mean automating your workflow with AI tools. In 2026 it also means optimizing for an audience of AI readers (AI Overviews, ChatGPT, Perplexity, Claude) that read the page on behalf of the user. The 2023 foundations (crawlability, schema, internal linking, freshness, citation-worthy content) still hold. What's new is a second layer: how AI engines decide to cite you, how agent crawlers fetch your HTML, and what to do with llms.txt. This is the field guide for both layers, with a checklist mapped to the audits we run inside SEOJuice.
The 2023 version of this article was about the first half of the story: cleaning up your site so an LLM-based tool could analyze it without amplifying your structural mess. That advice still holds. We shipped the original because we'd seen automated internal-linking go badly on architecturally broken sites, and that risk hasn't gone away.
What changed between 2023 and 2026 is the second half. AI Overviews moved out of Search Labs into general results during 2024 and now reach somewhere north of 40% of US queries depending on whose tracker you read. Perplexity, ChatGPT search, and Claude's web search hit meaningful daily-active numbers. INP replaced FID as a Core Web Vital in March 2024, quietly invalidating half of every Lighthouse report written before then. Web Bot Auth (RFC 9421) gave operators a way to verify agent crawlers instead of trusting user-agent strings. Schema deprecations chipped away at FAQPage and HowTo rich results for non-niche sites. None of these are individually catastrophic; the aggregate is a shift in what a foundation is for.
Lily Ray summarized it cleanly in a substack piece earlier this year:
"Strong SEO performance is the very foundation of AI search visibility."
That's the framing this rewrite leans on. The new layer doesn't replace the old — it sits on top. If your site can't get indexed and can't rank in classic search, AI engines have no surface from which to retrieve and cite you. We'll start with signals that survived, walk the 2026-specific layer, and call out the demoted habits.

These are the signals that earned their place across two years of AIO rollout, three Google core updates, and the schema cleanup. Every one of them maps directly to something an AI engine reads when deciding whether to cite a page.
One: E-E-A-T proxies the model can actually see. E-E-A-T isn't a score in Google's index. It's the set of off-page and on-page signals that approximate experience, expertise, authoritativeness, and trustworthiness. Marie Haynes has been the clearest voice on what this means in practice:
"E-E-A-T is what others say about you, not necessarily links."
For an AI engine the proxies are: a real author page with a byline, a real organization schema, third-party mentions on the surfaces the model trained on (Reddit, YouTube, podcasts, trade publications), and consistency between what your site says about you and what the rest of the web says. In audits we see pages from a brand with a coherent author + organization graph cited at roughly two to three times the rate of pages from brands with no author surface, even when content quality is comparable.
Two: structured data that matches the page's actual job. John Mueller has been blunt about how little schema does for ranking on its own.
That's the right read for ranking. For AI citations the picture shifts. Schema doesn't move you up the SERP much, but it helps the retrieval layer (the live index AI engines query) disambiguate what your page is. A pricing page marked as Product with offer data gets pulled into "how much does X cost" answers more reliably than the same page marked as generic WebPage. The rule isn't "add more schema" — it's "use the schema type that matches the page's job." Validate with Google's Rich Results Test and the schema.org validator.
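As a rough illustration of the pricing-page case, the sketch below builds a Product object with nested Offer data and wraps it in a JSON-LD script tag. The helper name `product_jsonld` and the plan name, price, and URL are placeholders invented for this example, not values from any real site; treat it as a starting point and validate the real output with the tools named above.

```python
import json


def product_jsonld(name: str, price: str, currency: str, url: str) -> str:
    """Render a schema.org Product with a single Offer as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder pricing-page values, for illustration only.
snippet = product_jsonld("Example Plan", "49.00", "USD", "https://example.com/pricing")
print(snippet)
```

The point of the design is that the type is chosen per page: the same helper pattern would use Article on editorial content and Organization site-wide rather than stacking all three everywhere.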
Three: internal linking that creates clusters, not just paths. AI engines retrieve in chunks. A page sitting alone, with no related pages linked from it and no related pages linking to it, looks like a one-off. A page that sits inside a tightly linked cluster looks like part of a body of work. We covered the data on this in internal linking statistics 2026; the short version is cluster-based linking still beats breadcrumb-only or footer-only linking by a wide margin.
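One simple way to spot the "page sitting alone" case is to walk an internal link map and list pages with no inbound or no outbound links. The sketch below assumes you already have a crawl export shaped as a dict of page to outbound internal links; the function name `isolated_pages` and the sample URLs are hypothetical, and a real cluster audit would also need topic grouping, which this does not attempt.

```python
from collections import defaultdict


def isolated_pages(links: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report pages with no inbound or no outbound internal links."""
    inbound: dict[str, set[str]] = defaultdict(set)
    pages = set(links)
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore self-links
                inbound[target].add(source)
                pages.add(target)
    return {
        "no_inbound": sorted(p for p in pages if not inbound[p]),
        "no_outbound": sorted(p for p in pages if not (links.get(p, set()) - {p})),
    }


# Hypothetical crawl export: page -> internal links found on that page.
links = {
    "/guide": {"/guide/schema", "/guide/linking"},
    "/guide/schema": {"/guide", "/guide/linking"},
    "/guide/linking": {"/guide"},
    "/old-post": set(),
}
report = isolated_pages(links)
print(report)
```

In this sample the three guide pages link to each other, while `/old-post` shows up in both lists, which is the one-off pattern described above.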
Four: freshness signals that aren't a lie. Lily Ray's AI-citation research from this spring found a strong recency bias — roughly half of top-cited content in AI engines was less than 13 weeks old at query time. That cuts both ways. Refreshing a page with a real edit and a new dateModified helps. Bumping the date with no editorial change hurts, because both Google and the AI engines have gotten better at spotting cosmetic refreshes. Our content refresh strategy piece goes deeper on the difference.
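To make the real-edit versus cosmetic-bump distinction checkable, a crude first pass is to compare a fingerprint of the page text between two snapshots against the change in dateModified. This is a minimal sketch under stated assumptions: the function names and labels are invented here, any text change at all counts as "real", and nothing about it reflects how Google or any AI engine actually detects cosmetic refreshes.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and case differences."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_refresh(old_text: str, old_date: str, new_text: str, new_date: str) -> str:
    """Label a snapshot pair by whether the text and dateModified changed."""
    text_changed = content_fingerprint(old_text) != content_fingerprint(new_text)
    date_changed = old_date != new_date
    if date_changed and not text_changed:
        return "cosmetic"    # date bumped, nothing edited
    if text_changed and not date_changed:
        return "stale-date"  # edited, but dateModified was not updated
    if text_changed and date_changed:
        return "real"
    return "unchanged"


# Illustrative snapshots; the dates are ISO strings taken from dateModified.
print(classify_refresh("Old copy.", "2026-01-10", "Old  copy.", "2026-03-01"))
print(classify_refresh("Old copy.", "2026-01-10", "New pricing table added.", "2026-03-01"))
```

A stricter version would set a minimum share of changed text before calling an edit real; the threshold is a judgment call this sketch leaves out.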
Five: content that's citation-worthy because it's specific. AI engines prefer content they can quote. "Many SaaS teams see results" gives the model nothing to lift. "Three SaaS teams in our 2026 cohort reported a 31% increase in trial signups within 30 days of switching to Product schema" is a citable claim with a number and a date. The shift in 2026 isn't toward more content; it's toward more specific content, with named entities, real numbers, and source attribution.
These are the signals that didn't exist (or didn't matter) in 2023, and that we now treat as part of the foundation rather than as advanced tactics.

AI citation patterns. The AI engines don't cite uniformly. Reddit and YouTube dominate "best X" and "X vs Y" queries. Trade publications and brand pages dominate "what is X" and "how do I X" queries for technical topics. The retrieval layer leans on what's indexed and fresh; the parametric memory leans on what the model trained on (Reddit, YouTube transcripts, Common Crawl). Aleyda Solis put the technical-foundation half of this picture into one line:
Most operators get this wrong on the technical side first. Hosting rules, CDN settings, and robots.txt files written for Googlebot often silently block GPTBot, ClaudeBot, PerplexityBot, and the Gemini training fetch. Run our AI Crawler Inspector against your top pages before anything else. If the AI bots can't fetch you, no on-page signal helps.
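A quick way to see the robots.txt half of this problem is to run the file through Python's standard `urllib.robotparser` for each bot name. The sketch below is an illustration only: the robots.txt content and URL are made up, the bot list is limited to tokens named in this article, and it says nothing about blocks applied at the CDN or WAF, which need a live fetch to detect.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; extend as needed.
AI_BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"]


def bot_access(robots_txt: str, url: str, bots=AI_BOTS) -> dict[str, bool]:
    """Return whether each bot may fetch the URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Made-up robots.txt written with only Googlebot in mind.
ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

access = bot_access(ROBOTS, "https://example.com/pricing")
for bot, allowed in access.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With this sample file Googlebot is allowed and the three AI bots fall through to the catch-all Disallow, which is the silent-block pattern described above.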
Agent-friendly markup. Agent crawlers fetch raw HTML. They don't render JavaScript at any meaningful scale. If your pricing, your FAQ, your product copy, or your comparison content lives behind hydration, those crawlers see an empty shell. The fix isn't exotic — HTML-first delivery of the content you actually want cited is what works. Our agent-friendly website guide walks the routes-and-rendering decision; the validator inside agent-ready checks the most common gaps.
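To approximate what a non-rendering crawler sees, you can extract the text from the raw HTML response (ignoring script and style bodies) and check whether the phrases you want cited are present. The sketch below uses only the standard library; the class and function names and both sample pages are invented for illustration, and it is a rough check rather than a model of any specific crawler.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping content a non-rendering crawler cannot use."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the raw HTML's text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in phrases if p.lower() not in text]


# Two made-up responses: a hydration shell and a server-rendered page.
SHELL = '<html><body><div id="root"></div><script>render({"price": "$49 per month"})</script></body></html>'
SERVER = "<html><body><main><h1>Pricing</h1><p>Pro plan: $49 per month</p></main></body></html>"

print(missing_from_raw_html(SHELL, ["$49 per month"]))
print(missing_from_raw_html(SERVER, ["$49 per month"]))
```

The shell page reports the price as missing because it only exists inside a script, while the server-rendered page reports nothing missing.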
Web Bot Auth (RFC 9421) and bot verification. User-agent strings are trivial to spoof. Web Bot Auth builds on HTTP Message Signatures (RFC 9421): the bot signs each request, and the signature is checked against the operator's published public key, so a request claiming to be from GPTBot can be verified as coming from OpenAI. Cloudflare, Fastly, and most major CDNs now support it. If you've been blocking AI bots at the WAF, allow only the verified ones. If you've been allowing everything, log the verified-versus-spoofed split. This is a 2026-only foundation; it didn't exist in 2023.
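Logging the verified-versus-spoofed split can be as simple as tallying requests by claimed bot and verification outcome. The sketch below assumes your CDN log export already carries a boolean verification result per request; the field names `user_agent` and `signature_verified`, the function name, and the sample user-agent strings are all hypothetical, and the signature check itself is left to the CDN.

```python
from collections import Counter

# Bot names to look for in the claimed user-agent string.
CLAIMED_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def verified_split(records: list[dict]) -> Counter:
    """Count requests per (claimed bot, verified/unverified) pair."""
    counts: Counter = Counter()
    for record in records:
        agent = record.get("user_agent", "")
        bot = next((b for b in CLAIMED_BOTS if b.lower() in agent.lower()), None)
        if bot is None:
            continue  # not claiming to be one of the tracked bots
        status = "verified" if record.get("signature_verified") else "unverified"
        counts[(bot, status)] += 1
    return counts


# Made-up log records; real field names depend on your CDN's log format.
records = [
    {"user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0)", "signature_verified": True},
    {"user_agent": "GPTBot", "signature_verified": False},
    {"user_agent": "ClaudeBot/1.0", "signature_verified": True},
    {"user_agent": "Mozilla/5.0 Firefox", "signature_verified": False},
]
split = verified_split(records)
for (bot, status), count in sorted(split.items()):
    print(f"{bot} {status}: {count}")
```

A high unverified count for a given bot name is the signal to tighten the WAF rule from "allow by user-agent" to "allow verified only".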
llms.txt, useful when there's something to point at. The proposed convention is a markdown file at /llms.txt that gives AI engines a curated map of your site. The honest read in 2026: adoption is uneven. Some engines lean on it, others ignore it. We treat it as cheap insurance for sites with a real documentation set, knowledge base, or product reference that benefits from being summarized for an LLM. It's not a substitute for HTML-first content or for the canonical sitemap. Generate one, ship it, move on.
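For the "generate one, ship it" step, a minimal sketch follows. It renders the layout the llms.txt proposal describes as we understand it: an H1 title, a blockquote summary, then H2 sections containing markdown link lists. The function name, site name, and URLs are placeholders, and since adoption is uneven no engine is guaranteed to read the result.

```python
def render_llms_txt(title: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder documentation set, for illustration only.
llms_txt = render_llms_txt(
    "Example Docs",
    "Reference for the Example API.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart", "Install and first request"),
        ],
    },
)
print(llms_txt)
```

Write the returned string to `/llms.txt` at the site root; keep the list curated rather than mirroring the full sitemap.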
These are the habits that earned their place on the old SEO checklist and have since lost most of their weight. Stop spending energy here. Redirect it to the foundations above.
Keyword density. Stop counting. Both Google and the AI engines moved to semantic representations years ago; hitting 1.5% density on the primary phrase doesn't help anyone read the page better. Write naturally, cover the entities and related concepts, let topic coverage do the work.
Exact-match anchor text. The post-Penguin reality holds. Exact-match internal anchors on every link to a cornerstone page look manipulative and dilute the signal. Vary the anchor text. Use the entity name where it's natural, a descriptive phrase where it isn't.
Alt-text stuffing. Alt text is for accessibility and for the small minority of AI engines that read it. Describe the image. Don't pack keywords. Our image-pipeline rule is one sentence of plain description per figure, full stop.
FAQPage and HowTo schema for non-niche sites. Google's 2023 schema cleanup limited FAQ rich results to well-known government and health sites and retired HowTo rich results. Most sites no longer earn the rich result. The schema isn't wrong, but you stopped getting visible SERP value from it. Keep it for retrieval-layer clarity reasons; drop it from your top-of-mind SEO checklist.
"Write longer" as a quality heuristic. The old advice to push every post past 2,000 words to outrank a competitor doesn't survive AI Overviews. Long articles get summarized into a paragraph in the answer panel — the cited snippet is whatever the most specific 50 words on the page are. Specificity beats length. Aim for the right length for the question.
The table below maps the old SEO foundations to their 2026 equivalents. Most rows are continuity, not replacement. The principle holds; the surface or the threshold shifted.
| Foundation | 2023 form | 2026 form |
|---|---|---|
| Crawlability | Googlebot can fetch and render | Googlebot + GPTBot + ClaudeBot + PerplexityBot can all fetch raw HTML |
| Core Web Vitals | LCP, FID, CLS | LCP, INP, CLS (FID retired March 2024) |
| Structured data | Stack as much schema as possible | Use the schema type that matches the page's job; Product, Organization, Article earn their place; FAQPage and HowTo are situational |
| Internal linking | Cornerstone pages get most links | Cluster-based linking; every page sits inside a tightly linked group |
| Freshness | Refresh dates regularly | Refresh with real edits and a new dateModified; cosmetic bumps now get caught |
| E-E-A-T | Author bios on blog posts | Author entity graph + third-party mentions on Reddit, YouTube, podcasts |
| Content | Comprehensive, 2,000+ words | Specific, citable, with named entities and numbers; length serves the question |
| Bot management | robots.txt and user-agent rules | RFC 9421 verification at the CDN; allow verified AI bots, log the rest |
| Discovery files | XML sitemap | XML sitemap + (optional) llms.txt for sites with substantive doc / KB content |
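For the Core Web Vitals row, the sketch below rates a measured value against Google's published thresholds as we understand them: LCP good at or under 2.5 s and poor above 4 s, INP good at or under 200 ms and poor above 500 ms, CLS good at or under 0.1 and poor above 0.25. The function name and sample values are illustrative; the thresholds are meant to be applied to 75th-percentile field data.

```python
# (good upper bound, needs-improvement upper bound) per metric.
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}


def rate(metric: str, value: float) -> str:
    """Classify a Core Web Vitals value as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


# Sample values, for illustration only.
for metric, value in [("LCP", 2300), ("INP", 350), ("CLS", 0.31)]:
    print(f"{metric} = {value}: {rate(metric, value)}")
```

FID is left out on purpose, since it was retired in March 2024 and INP took its place.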
Worth saying out loud, because the meta-question keeps coming up in client calls. AI Overviews are excellent at "what is X" and "how does Y work" queries. They're weaker at queries with stakes: comparison shopping, professional decisions, niche technical depth, anything where the user actually needs to evaluate sources rather than accept a summary.
The audit pattern: AIO often pulls from the wrong page for stakes-queries. Two reasons. First, the citation usually goes to whichever page best matches the retrieval query verbatim, not the most authoritative page on the topic. A high-authority page with its answer in a "buying guide" framing can lose to a low-authority page that happens to phrase its H2 exactly like the user's query. Second, AIO sometimes synthesizes across sources in a way that loses the nuance that made any one source useful — the answer reads correct, the citations are individually correct, but the synthesis is averaged across positions a careful writer would distinguish.

The implication isn't to panic. The foundations above — specific content, schema that disambiguates intent, internal links that create a cluster — make a page survive both readings: the careful reader and the verbatim-match retrieval. An operator losing citations to weaker competitors has a real signal that something at the foundation layer is off, not that AIO is broken.
The point of the foundations is that they're checkable. Below is the checklist we run on every site that comes through our audit pipeline, with the tool we use for each step. If you do nothing else, work this list in order.
Layer one, classic SEO foundations (still required).
Check dateModified against actual editorial changes; refresh the pages that need it, leave the ones that don't.
Layer two, 2026 AI-specific foundations.
Walk layer one first. If any check in layer one fails, fix it before touching layer two. The mistake operators make in 2026 is starting at layer two — chasing AI citations on a site that Googlebot still can't crawl properly.
What are the foundational elements of AI-driven SEO in 2026?
Two layers. The classic SEO foundations (crawlability, schema, internal linking, freshness, citation-worthy content) still apply. The new 2026 layer adds AI bot crawlability, agent-friendly HTML-first delivery, Web Bot Auth verification, and optionally llms.txt. The classic layer is a prerequisite for the new one.

Did AI Overviews replace traditional SEO?
No. AI Overviews retrieve from the same web index that classic SEO optimizes for. As Lily Ray put it, strong SEO performance is the foundation of AI search visibility. If you're not in the index, AIO can't cite you.

Is structured data still worth doing in 2026?
Yes, but use the schema type that matches the page's actual job rather than stacking every type you can. Product on product pages, Article on editorial content, Organization site-wide. FAQPage and HowTo rich results have been restricted to specific categories, but the schema is still useful as a retrieval-layer disambiguation signal.

What is Web Bot Auth (RFC 9421)?
An HTTP Message Signatures standard that lets AI bots sign their requests cryptographically. CDNs verify the signature against the operator's public key, so you can distinguish verified GPTBot from spoofed user-agent strings. Most major CDNs added support during 2025.

Should I add llms.txt to my site?
Worth doing if you have a real documentation set, knowledge base, or product reference that benefits from being summarized for an LLM. Not a substitute for HTML-first content or the canonical sitemap. Adoption across AI engines is uneven, so treat it as cheap insurance rather than a critical signal.

What about keyword density and exact-match anchor text?
Both have lost most of their weight. Both Google and the AI engines moved to semantic representations years ago. Write naturally, cover the entities, vary the anchor text. Don't audit for these.

Are AI Overviews accurate?
Accurate for "what is X" and "how does Y work" queries. Weaker for queries with stakes: comparisons, professional decisions, niche technical depth. AIO citations often go to whichever page matches the query verbatim, not to the most authoritative page on the topic.

How long should a piece of content be in 2026?
Long enough to answer the question specifically, short enough that the specific answer doesn't get buried. Comprehensive 3,000-word posts get summarized into a paragraph by AIO. The snippet that survives is whatever the most specific 50 words are. Specificity beats length.