TL;DR: AI tools cite pages they can parse, not pages that "rank." Most sites already produce content that's invisible to ChatGPT, Perplexity, and Bing Copilot because the structure is wrong, not because the writing is bad. Below: what changed for our own site after a 20-minute schema fix on a single page, the three stats that explain why this matters (Semrush, Profound, NYT), what to fix this month, and which "AI SEO" purchases I regret.
Refreshed with attributed citation-volume data (Semrush, Profound, NYT), an llms.txt section, and a documented methodology for the /data-page case study.
I wrote this because we instrument AI citations for SEOJuice customers, and the only way I trusted what I was telling them was to run the exercise on seojuice.io first. What I found surprised me. Our homepage was getting cited for "what is SEOJuice" but two of our most useful pages (the /data exploration page and the /tools landing page) were not showing up in Perplexity at all for queries they should have owned. The fix was less work than I expected. The framing took longer to get right.
[IMAGE: side-by-side Perplexity screenshot — left, "What is SEOJuice?" with citation; right, "best SEO data tools for SaaS" with no citation. Captured 2026-02-14.]
Three numbers, then a take.
1. Semrush analyzed 10,000 informational queries in late 2025 and found Google AI Overviews appearing in 88% of them, with 85.79% of cited URLs sitting in the top 10 organic results. Their full write-up is on the Semrush blog. The signal: if you don't already rank, you're statistically unlikely to be cited. But ranking alone isn't enough either, which is why the framing in this article matters.
2. The New York Times reported a 36.5% year-over-year decline in clicks from AI-influenced search results to news publishers in early 2026, summarised in SEO Sherpa's AI-search-engines breakdown. The traffic isn't gone. It's being absorbed into the answer box.
3. Profound's consensus-signal research (published Q4 2025) found that pages cited by ChatGPT, Perplexity, and Bing Copilot for the same query overlap 12% of the time, which means each engine is making partly independent decisions about who to cite. You don't get to optimize for one and assume the rest follow.
My take: the work isn't more SEO. It's restructuring three or four pages so the language model has something to lift. We did this on /data and the result was measurable, which I'll get to in a minute.
Most people still write content like they're trying to impress Google circa 2015. Jam in some keywords, fluff out the word count, slap on an H1, ship it. That earns you page 3 of a SERP nobody reads, and zero citations in ChatGPT or Perplexity.
These models don't rank websites. They retrieve, summarize, and occasionally cite, based on how clearly they can understand and repackage your content. The goal is no longer just "rank on Google." It is "be citable by any machine that reads your page."
| Feature | Google Search | ChatGPT / Perplexity / Bing AI |
|---|---|---|
| Indexing Method | Keyword + link-based | Embedding-based semantic matching (as of mid-2026; the architectures shift) |
| User Behavior | Clicks and skims | Reads the summary; clicks under 10% of the time, per the SparkToro 2026 AI-search panel |
| Page Selection | Algorithmic ranking | Retrieval over a smaller candidate set, then heuristic citation |
| Output Format | List of pages | Answers, citations, direct content |
| Best Content Style | SEO-optimized articles | Concise, structured, machine-parsable |
I tested this myself last quarter. I asked Perplexity, "What is SEOJuice?" and got a decent answer that cited our homepage. Then I asked, "What are good SEO tools for SaaS founders?" Two of our competitors got named. We did not. I rephrased the query four ways. Still nothing. So I asked Perplexity to describe each of the three brands it surfaced. Two answers were detailed. The third was hand-wavy and said almost nothing useful. That brand's site was full of marketing fluff and JavaScript-rendered content; its pages were prettier than ours, but their structure was worse.
That test changed how I think about content. Our own page (the authoritative source) was invisible, not because the writing was wrong but because the page didn't make it easy for a model to extract a sentence.
"Citation is the new backlink. The currency is structured, extractable answers, not domain authority alone." — Aleyda Solis, in the November 2025 AI-search session at BrightonSEO.
This is the case study the rest of the article rests on, so let me be specific about the method.
Page: seojuice.io/data/. State as of 2026-01-20: a Tailwind grid of data widgets, no FAQ block, no schema beyond Article. Bing Webmaster Tools showed it indexed. Perplexity showed zero citations for any of the 12 queries I tracked.
What I changed (in 20 minutes, on 2026-01-22): three things, all the same morning. The schema markup, a rewritten H1, and a definition sentence.
How I measured citation: manual prompt testing in Perplexity, 12 fixed queries run weekly. The 12 queries were a mix of brand ("what is the seojuice data page"), category ("SEO citation tracking dashboard"), and feature ("export Perplexity citation data to CSV"). I ran them in a private window with a clean cache, copy-pasted the results into a Google Sheet, and tagged any response that linked seojuice.io as a citation. Not a fancy tool. A sheet.
Result: by 2026-02-05 (a hair over two weeks later) three of the 12 queries pulled the /data page into the Perplexity source list. They were not queries I had targeted. Two of them I had not thought to write a page for at all. The AI needed the structural hint; the content was already there.
I'm not certain the schema alone caused this. The H1 rewrite and the definition sentence almost certainly helped, and I made all three changes the same morning. If I were running this again I'd separate them by a week. But the directional read holds: the structural fix was the cheap unlock.
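The weekly sheet described above can also be tallied with a few lines of Python. This is a minimal sketch, not the tooling actually used for the case study: the row layout, the helper names, and the cited URLs are all assumptions made for illustration, and only the tracked domain and the query wording come from the text.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical rows copied from the tracking sheet:
# (run date, query, URLs shown in the engine's source list).
# The cited URLs are made-up placeholders, not real results.
rows = [
    ("2026-01-29", "what is the seojuice data page", ["https://example.com/a"]),
    ("2026-02-05", "what is the seojuice data page", ["https://seojuice.io/data/"]),
    ("2026-02-05", "SEO citation tracking dashboard", ["https://example.org/b"]),
]

def is_cited(urls, domain="seojuice.io"):
    """True if any source URL is on the domain or one of its subdomains."""
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False

def cited_queries_by_week(rows, domain="seojuice.io"):
    """Map each run date to the set of queries whose source list included the domain."""
    weekly = defaultdict(set)
    for run_date, query, urls in rows:
        if is_cited(urls, domain):
            weekly[run_date].add(query)
    return dict(weekly)

for run_date, queries in sorted(cited_queries_by_week(rows).items()):
    print(run_date, len(queries), sorted(queries))
```

Matching on the parsed hostname rather than on a substring avoids counting look-alike domains as citations.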
A model doesn't crawl your site. It reads pages, often out of context, and builds an internal representation of what each page means. It prioritizes clarity, semantic structure, and quotable phrasing. Long intros: useless. Brand-speak: skipped. It homes in on definitions, summaries, FAQs, how-to lists, and clean section headers.
Try this: open Perplexity and ask, "What is (your company name)?" Check if it pulls your site. If not, that's the problem we're solving here.
If your page looks like a wall of text, with vague product descriptions, repetitive marketing phrases, and no structured data, AI tools won't cite you. Even if you are the best source. They cannot see your value unless you spell it out like you're explaining it to an intern with no context and a 3-second attention span.
If you read other AI-SEO guides this year, you've seen llms.txt mentioned. Here's the practical version.
llms.txt is a proposed file (think robots.txt for language models) that lives at /llms.txt and tells AI systems what content on your site is most important, in a flat-text format they can read without rendering JavaScript. The proposal came out of Answer.AI in late 2024 and has been picked up by Anthropic in their docs at docs.anthropic.com/llms.txt.
The honest assessment: I don't know if it does anything yet. The r/SEO and SEO Twitter chatter through 2025 was split. Some practitioners reported softer signals after adding it; others ran controlled tests and saw nothing. My read, as of May 2026, is that it's cheap (one flat file, takes about 10 minutes to author for a small site) and low-risk, so I add it. Don't expect a citation lift from llms.txt alone.
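For a small site, the file can be generated rather than hand-written. The sketch below follows the layout in the public llms.txt proposal as I understand it (an H1 title, a blockquote summary, then H2 sections containing markdown link lists). The function name, site name, page URLs, and notes are all placeholders, not anything from seojuice.io.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, then H2 sections of links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"

# Placeholder pages for illustration only.
sections = {
    "Key pages": [
        ("Data", "https://example.com/data/", "Exploration page for the site's datasets"),
        ("Tools", "https://example.com/tools/", "Landing page listing the free tools"),
    ],
}

llms_txt = build_llms_txt(
    "Example Site",
    "One-sentence description of what the site does.",
    sections,
)
print(llms_txt)
```

Writing the returned string to `llms.txt` in the web root is all the deployment there is.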
The crawler access piece is more concrete. PerplexityBot, ChatGPT-User, ClaudeBot, and GPTBot are the user-agents you'll see in your logs. Bing's crawler, Bingbot (BingPreview is only its page-snapshot agent), indirectly serves Perplexity and Copilot through Bing's index, so making sure Bing can crawl you is the highest-leverage action. Submit your sitemap to Bing Webmaster Tools. Confirm in your access logs that the bots above are not being blocked at the WAF or CDN layer. I have seen Cloudflare's "Block AI bots" toggle accidentally on at three client sites in 2025; that one switch made everything else in this article irrelevant.
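One layer of this can be checked offline: whether robots.txt itself turns away the user-agents named above. The sketch below uses Python's standard `urllib.robotparser`; the sample robots.txt and the URL are made up. It says nothing about WAF or CDN blocks, which only show up in access logs or in a live request.

```python
from urllib.robotparser import RobotFileParser

# User-agents named in the text.
AI_BOTS = ["PerplexityBot", "ChatGPT-User", "ClaudeBot", "GPTBot"]

def blocked_bots(robots_txt, url, bots=AI_BOTS):
    """Return the bots that the given robots.txt text disallows for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(sample, "https://example.com/data/"))  # ['GPTBot']
```

To check a live site, fetch its `/robots.txt` and pass the text to the same function.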
Optimizing for ChatGPT, Perplexity, or Bing AI doesn't mean gaming a new algorithm. It means designing your content like it's being read by a machine with no time for nuance. So your page needs to make extraction easy.
Each URL should cover one distinct topic, no detours. ChatGPT prefers pulling clean, unambiguous answers. If your page covers five services, three tangents, and a founder story, it will skip right over it.
Good: tp-link.com/us/support/faq/2680/ ("How to factory reset a TP-Link router"). Single intent, FAQ-style page; it ranks and gets cited because the URL, H1, and content all answer the same question.
Bad: yourdomain.com/support with a 20-item FAQ blob. The answers are buried, and the model can't tell which one matches the query.
LLMs use schema to short-cut what your page is about. Use FAQPage for question/answer blocks, Article for editorial, Product for SKUs, HowTo only for genuinely procedural pages with numbered steps.
A correction to something I told a client in 2025: I had recommended adding HowTo schema across their support pages as a default. It backfired. Google flagged some of the non-procedural pages with a manual rich-result warning, and we had to roll back. HowTo is for "step 1, step 2, step 3" content. Use it where it fits, not as a blanket fix.
Tool: run your URL through validator.schema.org to check that the schema parses.
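For question/answer blocks, the FAQPage markup is simple enough to generate. A minimal sketch, assuming the Q&A pairs are already available as strings: the `faq_jsonld` helper is a made-up name, while the property names (`mainEntity`, `Question`, `acceptedAnswer`, `Answer`) are the schema.org FAQPage vocabulary.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

pairs = [
    (
        "What is SEOJuice?",
        "SEOJuice is a website optimization tool that identifies technical SEO "
        "issues and offers step-by-step fixes ranked by traffic impact.",
    ),
]

markup = json.dumps(faq_jsonld(pairs), indent=2)
# Wrap the JSON in the script tag that goes into the page's HTML.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The printed output is what you would paste into the validator before shipping.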
AI pulls from chunks. Make those chunks obvious: bullet points, numbered steps, definitions, short sentences (especially near the top), and FAQs with bolded questions and clear answers.
Q: What is SEOJuice?
SEOJuice is a website optimization tool that identifies technical SEO issues and offers step-by-step fixes ranked by traffic impact.
That block is citable, extractable, and quote-ready. The same content rewritten as a paragraph would not be.
| Mistake | Fix |
|---|---|
| Vague titles like "Solutions" | Rewrite as the user query: "How does our X solve Y for Z?" |
| Meta title ≠ on-page H1 | Make them match within 10 characters; mixed signals lower trust |
| All caps / styled headers | Use real H2/H3 tags; CSS styling does not carry semantic weight |
| Generic intros ("In today's world…") | Lead with the answer in the first sentence |
| Keyword stuffing | Replace with one strong topical statement plus three concrete examples |
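The title/H1 row in the table above is easy to check mechanically. The sketch below reads "match within 10 characters" as an edit distance of 10 or less between the lowercased `<title>` and the first `<h1>`; that reading is my assumption, as are the class and function names.

```python
from html.parser import HTMLParser

class TitleH1Parser(HTMLParser):
    """Collect the text of <title> and of the first <h1> in a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self._current = None
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" or (tag == "h1" and not self._h1_done):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data

def edit_distance(a, b):
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def title_h1_gap(html):
    """Edit distance between the page title and its first H1, ignoring case."""
    parser = TitleH1Parser()
    parser.feed(html)
    return edit_distance(parser.title.strip().lower(), parser.h1.strip().lower())

page = (
    "<html><head><title>How does SEOJuice audit your site?</title></head>"
    "<body><h1>How SEOJuice audits your site</h1></body></html>"
)
gap = title_h1_gap(page)
print(gap, "OK" if gap <= 10 else "mismatch")
```

Because the parser only looks at real `<h1>` tags, a header that is just a styled `<div>` shows up as an empty H1 and a large gap, which also surfaces the styled-header mistake from the same table.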
You're not writing for a crawler. You're writing for a machine that's going to read your content, compress it into a 2-sentence summary, and if you're lucky, name-drop your domain at the end.
LLMs don't care about backlinks or keyword density in the traditional sense, although there is still indirect correlation: pages with strong backlink profiles tend to rank, and pages that rank tend to be in the retrieval candidate set. Profound's 2025 data suggests the correlation is real but loose. The real question is: can a model lift your content into a clean answer box without rewriting it into gibberish?
"To reset your router, unplug it for 10 seconds, then plug it back in. Wait 60 seconds before testing your connection."
"Resetting a router is something users can consider when encountering issues. One possible step is unplugging the device for a short time."
The first version is citable. The second gets ignored or paraphrased incorrectly.
Going back to the Perplexity test I described in the intro: when I compared our page to the two competitors that did get cited, their winning quality was specifically this. Their definitions were tight, the verbs were active, and the sentences could be lifted intact. Our page had the same information buried under three paragraphs of preamble. The model had nothing clean to grab.
LLMs will read the first few lines of a section, pull bullet points and numbered steps, ignore long fluffy intros, and skip buried information unless it's in a list or clearly marked.
Think of every section on your site as a potential answer box. Your job: make the answer obvious, extractable, and risk-free for an AI to quote without hallucinating or rewriting.
When I rewrote our /pricing page from a feature dump into definition-led prose in March 2026, the bounce rate dropped from 78% to 61% over the next 21 days. Same traffic, same offer, different structure. Humans liked it for the same reason a model does: the answer was on top, the rest was supporting context.
Start with the core fact, then elaborate. LLMs prioritize clarity over suspense.
Do: "SEOJuice is a website optimization tool that audits technical SEO issues and recommends ranked fixes based on potential traffic impact."
Don't: "SEO is complicated. Many tools try to simplify it, but few succeed. Enter SEOJuice, a new approach that…"
LLMs won't wait for your reveal. They will move on.
Every H2 on your page should double as a user query.
| Old Header | AI-Friendly Header |
|---|---|
| "Benefits" | "What are the benefits of using SEOJuice?" |
| "How It Works" | "How does SEOJuice audit your site?" |
| "Features" | "What features does SEOJuice offer?" |
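A quick audit can list the H2s on a page that don't yet read like a query. This is a rough heuristic sketch: treating "starts with a question word and ends with a question mark" as the test is my assumption, and the word list is deliberately short.

```python
from html.parser import HTMLParser

QUESTION_WORDS = ("what", "how", "why", "when", "where", "which", "who",
                  "can", "does", "do", "is", "are", "should")

class H2Collector(HTMLParser):
    """Collect the text content of every <h2> on a page."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headers[-1] += data

def looks_like_query(header):
    """True if the header starts with a question word and ends with '?'."""
    text = header.strip().lower()
    first = text.split(" ", 1)[0]
    return text.endswith("?") and first in QUESTION_WORDS

def non_query_headers(html):
    """Return the H2 texts that are not phrased as a user question."""
    collector = H2Collector()
    collector.feed(html)
    return [h.strip() for h in collector.headers if not looks_like_query(h)]

page = "<h2>Benefits</h2><h2>How does SEOJuice audit your site?</h2><h2>Features</h2>"
print(non_query_headers(page))  # ['Benefits', 'Features']
```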
Don't let this become a 40-hour rabbit hole. You don't need to overhaul your site. You need to make four to six pages retrievable. Start with the pages where impressions are high and bounce is high; those are the ones already in the candidate set, just losing the citation pass.
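Picking those pages is a filter and a sort over whatever analytics export you have. The sketch below is one way to do it; the thresholds, the impressions-times-bounce ranking, and every number in the sample data are assumptions chosen for illustration, not figures from seojuice.io.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    url: str
    impressions: int
    bounce_rate: float  # fraction between 0 and 1

def shortlist(pages, min_impressions=1000, min_bounce=0.6, limit=6):
    """Keep high-impression, high-bounce pages, ranked by impressions * bounce rate."""
    candidates = [
        p for p in pages
        if p.impressions >= min_impressions and p.bounce_rate >= min_bounce
    ]
    candidates.sort(key=lambda p: p.impressions * p.bounce_rate, reverse=True)
    return candidates[:limit]

# Made-up sample data.
pages = [
    PageStats("/page-a", 5400, 0.78),
    PageStats("/page-b", 3200, 0.71),
    PageStats("/page-c", 400, 0.82),   # high bounce, too few impressions
    PageStats("/page-d", 9000, 0.35),  # plenty of impressions, bounce is fine
]

for page in shortlist(pages):
    print(page.url)
```

The default `limit=6` mirrors the "four to six pages" scope; tighten the thresholds if the shortlist comes back longer than you can fix in a month.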
I spent $200 on a popular "AI citation tracking" SaaS in February 2026 (I won't name it, but if you've shopped this category you've seen it). The dashboard told me seojuice.io was cited in dozens of AI responses across the engines it monitored. When I manually re-ran a 20-query sample of the prompts the tool said we were cited for, only about three actually surfaced us in real Perplexity or ChatGPT sessions. Most of the others were either stale, the citation belonged to a different page on our domain, or the prompt was so obscure it would never come up in real usage.
I tried two more of these tools. Same pattern: high citation counts that didn't survive a manual spot-check. From a small sample of pages we monitored across three tools, the headline numbers were optimistic by 4 to 10x. My current take: until the tooling matures, you're better off with a Google Sheet, a list of 20 prompts that real customers actually run, and a weekly 15-minute manual sweep. The tools may be great in 2027. They are not great yet.
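The spot-check arithmetic is worth writing down, since it is the whole argument. A small sketch using the one sample quoted above (20 prompts re-run, about three confirmed); the function name is made up.

```python
def spot_check(claimed, verified):
    """Return (share of claimed citations that held up, overstatement factor)."""
    if claimed <= 0:
        raise ValueError("claimed must be positive")
    precision = verified / claimed
    factor = claimed / verified if verified else float("inf")
    return precision, factor

precision, factor = spot_check(claimed=20, verified=3)
print(f"{precision:.0%} verified, dashboard overstated by about {factor:.1f}x")
# 15% verified, dashboard overstated by about 6.7x
```

That one sample works out to roughly 6.7x, which sits inside the 4 to 10x range seen across the three tools.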
FAQs aren't just for readers. They are citable answer blocks for AI.
Citable content is clear, structured, and self-contained: short definitions, bullet points, FAQs, and direct answers. AI tools quote what they can cleanly extract. The Semrush 2025 analysis of AI Overviews found 85.79% of cited URLs sit in the top 10 organic results, so traditional ranking is still the cost of entry.
Run brand and category prompts in Perplexity and Bing Copilot, weekly. Keep a 10-to-15 prompt list in a sheet and tag each result with the cited URLs. Spot-check claims from any third-party citation tool against real engine sessions before trusting the dashboard.
No. Start with your most valuable pages: high impressions, high bounce, or cornerstone content. Add FAQ blocks, restructure headers, and simplify intros. That covers most of the value.
It is not strictly required, but it meaningfully improves visibility. Schema tells AI what your page is without making it guess. FAQPage and Article are the safe defaults; HowTo only for actually-procedural content.
No. Done right, it helps both. Structured, citable content ranks better, earns more backlinks, and now gets surfaced by AI engines too.
It's a flat-text file at /llms.txt that signals to AI crawlers which pages are most important. The proposal is new (late 2024) and the practical impact is unclear as of May 2026. It's cheap to add, low-risk, and worth doing if you have 10 minutes; don't expect dramatic results from it alone.