seojuice

Foundational Elements for AI-Driven SEO

Lida Stepul
May 06, 2025 · 12 min read

TL;DR: "AI-driven SEO" used to mean automating your workflow with AI tools. In 2026 it also means optimizing for an audience of AI readers — AI Overviews, ChatGPT, Perplexity, Claude — that read the page on behalf of the user. The 2023 foundations (crawlability, schema, internal linking, freshness, citation-worthy content) still hold. What's new is a second layer: how AI engines decide to cite you, how agent crawlers fetch your HTML, and what to do with llms.txt. This is the field guide for both layers, with a checklist mapped to the audits we run inside SEOJuice.

What "AI-driven SEO" meant in 2023 versus what it means now

The 2023 version of this article was about the first half of the story: cleaning up your site so an LLM-based tool could analyze it without amplifying your structural mess. That advice still holds. We shipped the original because we'd seen automated internal-linking go badly on architecturally broken sites, and that risk hasn't gone away.

What changed between 2023 and 2026 is the second half. AI Overviews moved out of Search Labs into general results during 2024 and now reach somewhere north of 40% of US queries depending on whose tracker you read. Perplexity, ChatGPT search, and Claude's web search hit meaningful daily-active numbers. INP replaced FID as a Core Web Vital in March 2024, quietly invalidating half of every Lighthouse report written before then. Web Bot Auth, built on HTTP Message Signatures (RFC 9421), gave operators a way to verify agent crawlers instead of trusting user-agent strings. Schema deprecations chipped away at FAQPage and HowTo rich results for non-niche sites. None of these are individually catastrophic; the aggregate is a shift in what a foundation is for.

Lily Ray summarized it cleanly in a Substack piece earlier this year:

"Strong SEO performance is the very foundation of AI search visibility."

That's the framing this rewrite leans on. The new layer doesn't replace the old — it sits on top. If your site can't get indexed and can't rank in classic search, AI engines have no surface from which to retrieve and cite you. We'll start with signals that survived, walk the 2026-specific layer, and call out the demoted habits.

Two-layer diagram showing classic SEO foundations (crawlability, schema, internal linking, freshness, content) as the base, with new 2026 AI-specific layer (AI citation patterns, agent-friendly markup, llms.txt) sitting on top
The 2026 foundations stack: classic SEO signals on the base, AI-specific signals on top. The new layer doesn't replace the old one.

The five signals that survived the 2024-2026 turmoil

These are the signals that earned their place across two years of AIO rollout, three Google core updates, and the schema cleanup. Every one of them maps directly to something an AI engine reads when deciding whether to cite a page.

One: E-E-A-T proxies the model can actually see. E-E-A-T isn't a score in Google's index. It's the set of off-page and on-page signals that approximate experience, expertise, authoritativeness, and trustworthiness. Marie Haynes has been the clearest voice on what this means in practice:

"E-E-A-T is what others say about you, not necessarily links."

For an AI engine the proxies are: a real author page with a byline, a real organization schema, third-party mentions on the surfaces the model trained on (Reddit, YouTube, podcasts, trade publications), and consistency between what your site says about you and what the rest of the web says. In audits we see pages from a brand with a coherent author + organization graph cited at roughly two to three times the rate of pages from brands with no author surface, even when content quality is comparable.

Two: structured data that matches the page's actual job. John Mueller has been blunt about how much schema does on its own:

"How do you rank something purely from SD hints? It's an extremely light signal. If you're worried, make the content more obvious."

That's the right read for ranking. For AI citations the picture shifts. Schema doesn't move you up the SERP much, but it helps the retrieval layer (the live index AI engines query) disambiguate what your page is. A pricing page marked as Product with offer data gets pulled into "how much does X cost" answers more reliably than the same page marked as generic WebPage. The rule isn't "add more schema" — it's "use the schema type that matches the page's job." Validate with Google's Rich Results Test and the schema.org validator.
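As a rough illustration of "the schema type that matches the page's job", here is a minimal sketch that emits Product markup with offer data for a pricing page. The plan name, price, and URL are placeholders, and `product_jsonld` is a hypothetical helper, not part of any SEOJuice tool.

```python
import json

def product_jsonld(name: str, price: str, currency: str, url: str) -> str:
    """Build a schema.org Product snippet with a single Offer (placeholder values)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    return json.dumps(data, indent=2)

# Hypothetical pricing page; swap in your own values.
snippet = product_jsonld("Example Plan", "49.00", "USD", "https://example.com/pricing")
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

Whatever generates the markup, run the output through the Rich Results Test and the schema.org validator before shipping it.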

Three: internal linking that creates clusters, not just paths. AI engines retrieve in chunks. A page sitting alone, with no related pages linked from it and no related pages linking to it, looks like a one-off. A page that sits inside a tightly linked cluster looks like part of a body of work. We covered the data on this in internal linking statistics 2026; the short version is cluster-based linking still beats breadcrumb-only or footer-only linking by a wide margin.
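A page "sitting alone" is easy to detect once you have a crawl export. The sketch below assumes a hypothetical mapping of each URL to the internal URLs it links to, and flags pages with no inbound links (orphans) and pages with no outbound links (dead ends). It is a toy check, not the logic behind any particular tool.

```python
from collections import defaultdict

def link_report(links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return pages with no inbound internal links and pages with no outbound ones."""
    inbound = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore self-links
                inbound[target].add(source)
    pages = set(links) | set(inbound)
    orphans = sorted(p for p in pages if not inbound[p])
    dead_ends = sorted(
        p for p in pages if not [t for t in links.get(p, []) if t != p]
    )
    return {"orphans": orphans, "dead_ends": dead_ends}

# Hypothetical crawl export: page -> internal links found on that page.
links = {
    "/guide": ["/guide/schema", "/guide/links", "/pricing"],
    "/guide/schema": ["/guide", "/guide/links"],
    "/guide/links": ["/guide"],
    "/pricing": [],
    "/old-post": ["/guide"],
}
print(link_report(links))
# {'orphans': ['/old-post'], 'dead_ends': ['/pricing']}
```

On a real site the homepage often shows up as an orphan in this kind of report, so treat the output as a list to review rather than a list of errors.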

Four: freshness signals that aren't a lie. Lily Ray's AI-citation research from this spring found a strong recency bias — roughly half of top-cited content in AI engines was less than 13 weeks old at query time. That cuts both ways. Refreshing a page with a real edit and a new dateModified helps. Bumping the date with no editorial change hurts, because both Google and the AI engines have gotten better at spotting cosmetic refreshes. Our content refresh strategy piece goes deeper on the difference.
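One way to keep dateModified honest is to tie it to a measurable change in the body text. This is a minimal sketch under assumptions of ours: the 5% word-level change threshold is arbitrary, and `next_date_modified` is a hypothetical helper for a publishing pipeline, not a description of how any search or AI engine detects cosmetic refreshes.

```python
from datetime import date
from difflib import SequenceMatcher

def changed_fraction(old_body: str, new_body: str) -> float:
    """Share of the text that changed, compared word by word (0.0 = identical)."""
    old_words = old_body.lower().split()
    new_words = new_body.lower().split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()

def next_date_modified(old_body: str, new_body: str, old_date: date,
                       today: date, min_change: float = 0.05) -> date:
    """Bump dateModified only when the edit clears the (assumed) change threshold."""
    if changed_fraction(old_body, new_body) >= min_change:
        return today
    return old_date

old = "INP replaced FID in March 2024."
cosmetic = "INP  replaced FID in   March 2024."  # whitespace only
real = old + " Thresholds: LCP under 2.5s, INP under 200ms, CLS under 0.1."

print(next_date_modified(old, cosmetic, date(2025, 5, 6), date(2026, 1, 15)))  # 2025-05-06
print(next_date_modified(old, real, date(2025, 5, 6), date(2026, 1, 15)))      # 2026-01-15
```

A word-level diff says nothing about whether the edit was useful, only that one happened, so it works as a guard against accidental date bumps rather than a quality check.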

Five: content that's citation-worthy because it's specific. AI engines prefer content they can quote. "Many SaaS teams see results" gives the model nothing to lift. "Three SaaS teams in our 2026 cohort reported a 31% increase in trial signups within 30 days of switching to Product schema" is a citable claim with a number and a date. The shift in 2026 isn't toward more content; it's toward more specific content, with named entities, real numbers, and source attribution.

The new 2026-specific foundations

These are the signals that didn't exist (or didn't matter) in 2023, and that we now treat as part of the foundation rather than as advanced tactics.

Diagram showing how a piece of content reaches an AI engine via two parallel paths: training-data ingestion with months of lag and durable signal, and retrieval indexing with days to weeks of lag and URL-specific signal
Two paths from content to AI citation: training data (slow, durable) and retrieval (fast, URL-specific). Both paths read the same foundations.

AI citation patterns. The AI engines don't cite uniformly. Reddit and YouTube dominate "best X" and "X vs Y" queries. Trade publications and brand pages dominate "what is X" and "how do I X" queries for technical topics. The retrieval layer leans on what's indexed and fresh; the parametric memory leans on what the model trained on (Reddit, YouTube transcripts, Common Crawl). Aleyda Solis put the technical-foundation half of this picture into one line:

"You need to allow AI crawlers to access your content. The rules you set might need to be different depending on your context."

Most operators get this wrong on the technical side first. Hosting rules, CDN settings, and robots.txt files written for Googlebot often silently block GPTBot, ClaudeBot, PerplexityBot, and the Gemini training fetch. Run our AI Crawler Inspector against your top pages before anything else. If the AI bots can't fetch you, no on-page signal helps.
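For the robots.txt part of that problem, a quick local check is possible with Python's standard library. The sketch below parses a robots.txt body and lists which AI bot tokens are denied a given URL. The robots.txt content and URL are made up, and `Google-Extended` is our assumption for the token that governs Gemini training access. It covers robots.txt only; hosting and CDN rules need to be tested with real requests.

```python
from urllib.robotparser import RobotFileParser

# Bot tokens to test; Google-Extended is assumed to be the Gemini-related token.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_bots(robots_txt: str, url: str, bots: list[str] = AI_BOTS) -> list[str]:
    """Return the bot tokens that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]

# Hypothetical robots.txt written with only Googlebot in mind.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(blocked_bots(ROBOTS_TXT, "https://example.com/pricing"))
# ['GPTBot']
```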

Agent-friendly markup. Agent crawlers fetch raw HTML. They don't render JavaScript at any meaningful scale. If your pricing, your FAQ, your product copy, or your comparison content lives behind hydration, those crawlers see an empty shell. The fix isn't exotic — HTML-first delivery of the content you actually want cited is what works. Our agent-friendly website guide walks the routes-and-rendering decision; the validator inside agent-ready checks the most common gaps.
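To see what a non-rendering crawler gets, check whether the phrases you want cited appear in the raw HTML text at all. The sketch below uses only the standard library and skips script and style content. The two HTML strings are invented examples, and `missing_from_raw_html` is a hypothetical helper, not the agent-ready validator.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def missing_from_raw_html(html: str, must_have: list[str]) -> list[str]:
    """Return the phrases that do not appear in the un-rendered page text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [phrase for phrase in must_have if phrase.lower() not in text]

# Invented examples: a hydration-only shell versus server-rendered HTML.
shell = '<html><body><div id="root"></div><script>render({"price": "$49/month"})</script></body></html>'
server = "<html><body><h1>Pricing</h1><p>Pro plan: $49/month</p></body></html>"

print(missing_from_raw_html(shell, ["$49/month"]))   # ['$49/month']
print(missing_from_raw_html(server, ["$49/month"]))  # []
```

Feed it the response body from a plain HTTP fetch, not the DOM from a browser, or the check proves nothing.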

Web Bot Auth (RFC 9421) and bot verification. User-agent strings are trivial to spoof. Web Bot Auth builds on HTTP Message Signatures (RFC 9421): the bot operator signs each request with a private key, and the receiving side checks the signature against the operator's published public key. Cloudflare, Fastly, and most major CDNs now support this for verifying that a request claiming to be from GPTBot was actually signed by OpenAI. If you've been blocking AI bots at the WAF, allow only the verified ones. If you've been allowing everything, log the verified-versus-spoofed split. This is a 2026-only foundation; it didn't exist in 2023.
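Logging the split can be as simple as tallying requests by claimed bot and verification result. The sketch below assumes your CDN log export can be reduced to records with a `user_agent` string and a boolean `signature_verified` flag. Both field names are hypothetical, so map them to whatever your provider actually emits. Requests without a valid signature are counted as "unverified", since an unsigned request is not necessarily a spoofed one.

```python
from collections import Counter

AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def verification_split(records: list[dict]) -> dict[tuple[str, str], int]:
    """Count requests per (claimed bot, verified/unverified) pair."""
    split: Counter = Counter()
    for record in records:
        user_agent = record.get("user_agent", "").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                status = "verified" if record.get("signature_verified") else "unverified"
                split[(token, status)] += 1
                break  # count each request once
    return dict(split)

# Hypothetical log records; field names are assumptions, not a CDN schema.
records = [
    {"user_agent": "Mozilla/5.0; compatible; GPTBot/1.0", "signature_verified": True},
    {"user_agent": "GPTBot", "signature_verified": False},
    {"user_agent": "ClaudeBot/1.0", "signature_verified": True},
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0)", "signature_verified": False},
]
print(verification_split(records))
```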

llms.txt, useful when there's something to point at. The proposed convention is a markdown file at /llms.txt that gives AI engines a curated map of your site. The honest read in 2026: adoption is uneven. Some engines lean on it, others ignore it. We treat it as cheap insurance for sites with a real documentation set, knowledge base, or product reference that benefits from being summarized for an LLM. It's not a substitute for HTML-first content or for the canonical sitemap. Generate one, ship it, move on.
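For a sense of what "generate one" involves, here is a minimal sketch that writes the file from a curated list of pages. It follows the layout of the llms.txt proposal as we understand it (a title, a short blockquote summary, then sections of annotated links). The site name, URLs, and `build_llms_txt` helper are placeholders.

```python
def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render a markdown llms.txt body from sections of (name, url, note) links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

# Placeholder content for a hypothetical docs site.
body = build_llms_txt(
    "Example Docs",
    "Reference documentation for the Example product.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart", "Install and first run"),
            ("API reference", "https://example.com/docs/api", "Endpoints and parameters"),
        ],
    },
)
print(body)
```

Serve the result as plain text at /llms.txt and regenerate it when the documentation set changes.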

What got demoted between 2023 and 2026

The habits that earned their place on the old SEO checklist and have since lost most of their weight. Stop spending energy here. Redirect it to the foundations above.

Keyword density. Stop counting. Both Google and the AI engines moved to semantic representations years ago; hitting 1.5% density on the primary phrase doesn't help anyone read the page better. Write naturally, cover the entities and related concepts, let topic coverage do the work.

Exact-match anchor text. The post-Penguin reality holds. Exact-match internal anchors on every link to a cornerstone page look manipulative and dilute the signal. Vary the anchor text. Use the entity name where it's natural, a descriptive phrase where it isn't.

Alt-text stuffing. Alt text is for accessibility and for the small minority of AI engines that read it. Describe the image. Don't pack keywords. Our image-pipeline rule is one sentence of plain description per figure, full stop.

FAQPage and HowTo schema for non-niche sites. Google's 2023-2024 schema cleanup limited FAQ rich results to a narrow set of authoritative government and health sites and retired HowTo rich results. Most sites no longer earn either rich result. The schema isn't wrong, but it no longer delivers visible SERP value. Keep it for retrieval-layer clarity; drop it from your top-of-mind SEO checklist.

"Write longer" as a quality heuristic. The old advice to push every post past 2,000 words to outrank a competitor doesn't survive AI Overviews. Long articles get summarized into a paragraph in the answer panel — the cited snippet is whatever the most specific 50 words on the page are. Specificity beats length. Aim for the right length for the question.

2023 foundations versus 2026 foundations

The table below maps the old SEO foundations to their 2026 equivalents. Most rows are continuity, not replacement. The principle holds; the surface or the threshold shifted.

| Foundation | 2023 form | 2026 form |
| --- | --- | --- |
| Crawlability | Googlebot can fetch and render | Googlebot + GPTBot + ClaudeBot + PerplexityBot can all fetch raw HTML |
| Core Web Vitals | LCP, FID, CLS | LCP, INP, CLS (FID retired March 2024) |
| Structured data | Stack as much schema as possible | Use the schema type that matches the page's job; Product, Organization, Article earn their place; FAQPage and HowTo are situational |
| Internal linking | Cornerstone pages get most links | Cluster-based linking; every page sits inside a tightly linked group |
| Freshness | Refresh dates regularly | Refresh with real edits and a new dateModified; cosmetic bumps now get caught |
| E-E-A-T | Author bios on blog posts | Author entity graph + third-party mentions on Reddit, YouTube, podcasts |
| Content | Comprehensive, 2,000+ words | Specific, citable, with named entities and numbers; length serves the question |
| Bot management | robots.txt and user-agent rules | RFC 9421 verification at the CDN; allow verified AI bots, log the rest |
| Discovery files | XML sitemap | XML sitemap + (optional) llms.txt for sites with substantive doc / KB content |

What AI Overviews get wrong about AI SEO

Worth saying out loud, because the meta-question keeps coming up in client calls. AI Overviews are excellent at "what is X" and "how does Y work" queries. They're weaker at queries with stakes: comparison shopping, professional decisions, niche technical depth, anything where the user actually needs to evaluate sources rather than accept a summary.

The audit pattern: AIO often pulls from the wrong page for stakes-queries. Two reasons. First, the citation usually goes to whichever page best matches the retrieval query verbatim, not the most authoritative page on the topic. A high-authority page with its answer in a "buying guide" framing can lose to a low-authority page that happens to phrase its H2 exactly like the user's query. Second, AIO sometimes synthesizes across sources in a way that loses the nuance that made any one source useful — the answer reads correct, the citations are individually correct, but the synthesis is averaged across positions a careful writer would distinguish.
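A crude way to spot this exposure in your own pages is to compare target queries against your headings. The sketch below scores word overlap (Jaccard similarity) between a query and each heading. It is our own rough proxy for "phrased like the query", not how AIO or any retrieval system scores pages, and the query and headings are invented.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_heading_match(query: str, headings: list[str]) -> tuple[str, float]:
    """Return the heading with the highest word overlap with the query, and its score."""
    query_tokens = tokens(query)
    best, best_score = "", 0.0
    for heading in headings:
        heading_tokens = tokens(heading)
        union = query_tokens | heading_tokens
        score = len(query_tokens & heading_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = heading, score
    return best, best_score

# Invented example: a "buying guide" framing versus a query-shaped H2.
query = "how much does schema markup cost"
print(best_heading_match(query, ["Schema markup buying guide", "Pricing overview"]))
print(best_heading_match(query, ["How much does schema markup cost?"]))
# ('How much does schema markup cost?', 1.0)
```

A low best score on a query you care about suggests the page answers the question without ever stating it in the user's words.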

Side-by-side comparison showing an AI Overview answer block alongside the actual cited source pages, with notes on which citations matched the query verbatim versus which best answered the underlying question
The pattern in AIO citation audits: verbatim query match often beats topical authority. The cited page is rarely the best page on the topic.

The implication isn't to panic. The foundations above — specific content, schema that disambiguates intent, internal links that create a cluster — make a page survive both readings: the careful reader and the verbatim-match retrieval. An operator losing citations to weaker competitors has a real signal that something at the foundation layer is off, not that AIO is broken.

An audit checklist, mapped to the tools we built

The point of the foundations is that they're checkable. Below is the checklist we run on every site that comes through our audit pipeline, with the tool we use for each step. If you do nothing else, work this list in order.

Layer one, classic SEO foundations (still required).

  • Crawlability + indexability: free SEO audit for the surface scan; Google Search Console for the index-coverage truth.
  • Core Web Vitals with INP: Lighthouse score. The 2026 thresholds are LCP under 2.5s, INP under 200ms, CLS under 0.1.
  • Structured data validation: Google Rich Results Test plus our schema generator for the right type per page.
  • Internal linking and cluster health: internal link finder for orphans and gaps.
  • Freshness and content decay: track dateModified against actual editorial changes; refresh the pages that need it, leave the ones that don't.
Two-column checklist showing classic SEO foundations on the left and 2026 AI-specific foundations on the right, with the SEOJuice tool to run for each check
The audit in one picture. Walk the classic-foundations column first; the 2026 column only works once layer one passes.
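The Core Web Vitals line in the layer-one list reduces to three comparisons, which makes it easy to automate against whatever field or lab numbers you collect. A minimal sketch, using the thresholds stated above and hypothetical metric keys (`lcp_s`, `inp_ms`, `cls`); a missing metric is treated as a failure so gaps in the data don't pass silently.

```python
# Thresholds as stated in the checklist: LCP under 2.5s, INP under 200ms, CLS under 0.1.
THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}

def failing_vitals(measured: dict[str, float]) -> list[str]:
    """Return the metrics that are missing or not under their threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if measured.get(name, float("inf")) >= limit
    ]

# Hypothetical measurements for one page.
print(failing_vitals({"lcp_s": 2.1, "inp_ms": 340, "cls": 0.05}))
# ['inp_ms']
```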

Layer two, 2026 AI-specific foundations.

  • AI bot crawlability: AI Crawler Inspector to confirm GPTBot, ClaudeBot, PerplexityBot, and the Gemini fetcher can all reach your top pages, and that what they see matches what Googlebot sees.
  • AI citation share: AI Visibility Checker for a baseline read on how often you're cited in AIO, ChatGPT, Perplexity, and Claude for your target queries.
  • Agent-readiness: agent-ready for the routes-and-rendering audit on SPA-heavy sites.
  • llms.txt presence: llms.txt generator if your site has a real docs/KB layer.
  • Web Bot Auth verification: at the CDN. Cloudflare's docs on HTTP Message Signatures are the cleanest reference.

Walk layer one first. If any check in layer one fails, fix it before touching layer two. The mistake operators make in 2026 is starting at layer two — chasing AI citations on a site that Googlebot still can't crawl properly.

FAQ

What are the foundational elements of AI-driven SEO in 2026?

Two layers. The classic SEO foundations (crawlability, schema, internal linking, freshness, citation-worthy content) still apply. The new 2026 layer adds AI bot crawlability, agent-friendly HTML-first delivery, Web Bot Auth verification, and optionally llms.txt. The classic layer is a prerequisite for the new one.

Did AI Overviews replace traditional SEO?

No. AI Overviews retrieve from the same web index that classic SEO optimizes for. As Lily Ray put it, strong SEO performance is the foundation of AI search visibility. If you're not in the index, AIO can't cite you.

Is structured data still worth doing in 2026?

Yes, but use the schema type that matches the page's actual job rather than stacking every type you can. Product on product pages, Article on editorial content, Organization site-wide. FAQ rich results are now limited to a narrow set of authoritative sites and HowTo rich results have been retired, but the schema is still useful as a retrieval-layer disambiguation signal.

What is Web Bot Auth (RFC 9421)?

A proposal built on the HTTP Message Signatures standard (RFC 9421) that lets AI bots sign their requests cryptographically. CDNs verify the signature against the bot operator's public key, so you can distinguish verified GPTBot from spoofed user-agent strings. Most major CDNs added support during 2025.

Should I add llms.txt to my site?

Worth doing if you have a real documentation set, knowledge base, or product reference that benefits from being summarized for an LLM. Not a substitute for HTML-first content or the canonical sitemap. Adoption across AI engines is uneven, so treat it as cheap insurance rather than a critical signal.

What about keyword density and exact-match anchor text?

Both have lost most of their weight. Both Google and the AI engines moved to semantic representations years ago. Write naturally, cover the entities, vary the anchor text. Don't audit for these.

Are AI Overviews accurate?

Accurate for "what is X" and "how does Y work" queries. Weaker for queries with stakes: comparisons, professional decisions, niche technical depth. AIO citations often go to whichever page matches the query verbatim, not to the most authoritative page on the topic.

How long should a piece of content be in 2026?

Long enough to answer the question specifically, short enough that the specific answer doesn't get buried. Comprehensive 3,000-word posts get summarized into a paragraph by AIO. The snippet that survives is whatever the most specific 50 words are. Specificity beats length.

