Seoxpert.io
AI search optimization

How to get cited by ChatGPT, Claude, Perplexity, and Google AI Overviews

Getting cited by AI search engines comes down to five technical conditions: an /llms.txt file, a robots.txt that allows AI runtime crawlers, Organization JSON-LD with a populated sameAs array, answer-first paragraphs on question-titled pages, and canonical landing pages for the topics you cover. Each is testable. Each is a one-deploy fix. Seoxpert is a website regression monitor for founders, agencies, and developers that runs SEO, security, performance, and compliance checks after every deploy.

Below: how each check works, why AI engines weight it, and exactly what to ship. The Seoxpert scanner runs all five against any URL — free.

Run a free GEO citability check. No credit card required.

The five technical fixes

What AI search engines actually check

1. Add an /llms.txt file

A plain-text manifest at the root of your domain that curates which pages LLMs should ingest.

Place it at https://yourdomain.com/llms.txt. Format: a # H1 with the site name, a > blockquote with the one-sentence positioning, an optional summary paragraph, then ## section headings with bulleted URL lists and one-sentence descriptions. Aim for the canonical landing pages, docs, and the most important blog posts — not every URL on the site. ~30 minutes to write the first version.
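A minimal sketch of that layout, rendered from a small data structure. The site name, URLs, and the helper names (Section, build_llms_txt) are placeholders invented for illustration, and the "- [title](url): description" link form is an assumption based on the llmstxt.org proposal rather than something any engine is known to require.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One '## ' section of llms.txt: a heading plus (title, url, description) links."""
    heading: str
    links: list[tuple[str, str, str]] = field(default_factory=list)


def build_llms_txt(site_name: str, positioning: str, sections: list[Section],
                   summary: str = "") -> str:
    """Render H1, blockquote, optional summary paragraph, then the sections."""
    lines = [f"# {site_name}", "", f"> {positioning}", ""]
    if summary:
        lines += [summary, ""]
    for section in sections:
        lines += [f"## {section.heading}", ""]
        for title, url, description in section.links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content: every name and URL below is illustrative.
example = build_llms_txt(
    site_name="Example Co",
    positioning="Example Co monitors websites for regressions after every deploy.",
    sections=[
        Section("Product", [("Overview", "https://example.com/product",
                             "What the product does.")]),
        Section("Docs", [("Quickstart", "https://example.com/docs/quickstart",
                          "First scan in five minutes.")]),
    ],
)
print(example)
```

Write the returned string to the file served at /llms.txt. Keeping the link list in code makes it easier to review which pages you have chosen to curate.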

2. Audit your robots.txt for AI bot blocks

Decide deliberately whether each AI crawler is blocked. Most sites block them by accident.

The bots that matter: GPTBot (OpenAI training), ChatGPT-User (live ChatGPT fetches), OAI-SearchBot (ChatGPT search index), PerplexityBot (Perplexity index), Perplexity-User (Perplexity live fetches), ClaudeBot (Anthropic training), Claude-User and Claude-SearchBot (Claude live fetches and search), Google-Extended (Gemini training and grounding), and Applebot-Extended (Apple Intelligence). Google-Extended is a robots.txt control token rather than a separate crawler, and Google AI Overviews follow your regular Googlebot rules. A common combination: allow ChatGPT-User, OAI-SearchBot, PerplexityBot, and Google-Extended (citation), and block GPTBot, ClaudeBot, CCBot, and Bytespider (training). If you want to be cited everywhere, remove the AI-crawler disallows entirely.
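One way to run this audit yourself is with Python's standard-library robots.txt parser. This is a sketch, not the Seoxpert scanner: the function name blocked_ai_bots and the sample file are assumptions for illustration, and the bot list is taken from the paragraph above.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the paragraph above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot",
    "Perplexity-User", "ClaudeBot", "Google-Extended", "Applebot-Extended",
    "CCBot", "Bytespider",
]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Illustrative file: block two training crawlers, allow everything else.
sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_bots(sample))
```

With this sample the function reports only GPTBot and CCBot, because robots.txt groups apply to the user agents they name. Paste in your live robots.txt to see which of the listed bots it turns away.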

3. Ship Organization JSON-LD on the homepage

Tell AI engines which entity you are with structured data and a populated sameAs array.

In the homepage <head>: <script type="application/ld+json">{"@context":"https://schema.org","@type":"Organization","name":"<Brand>","url":"<Origin>","logo":"<Logo URL>","description":"<one-sentence positioning>","sameAs":["<LinkedIn URL>","<Twitter URL>","<GitHub/Crunchbase>"]}</script>. The sameAs array is critical — it becomes the canonical identity link AI engines anchor to when disambiguating brand mentions across the web. ~20 minutes if your social profiles are already public.
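If the tag is generated at build time, a small helper keeps the JSON valid and refuses an empty sameAs array. The function name organization_jsonld and the example values are hypothetical. The field set mirrors the snippet above.

```python
import json


def organization_jsonld(name, url, logo, description, same_as):
    """Build the Organization script tag; rejects an empty sameAs list."""
    if not same_as:
        raise ValueError("sameAs must list at least one profile URL")
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "description": description,
        "sameAs": list(same_as),
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values for illustration only.
print(organization_jsonld(
    name="Example Co",
    url="https://example.com",
    logo="https://example.com/logo.png",
    description="Example Co monitors websites for regressions after every deploy.",
    same_as=["https://www.linkedin.com/company/example", "https://github.com/example"],
))
```

Serializing with json.dumps avoids the hand-editing mistakes (trailing commas, unescaped quotes) that tend to make a JSON-LD block unparseable.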

4. Rewrite question-titled pages to lead with the answer

For every page titled "What is X?" / "How does Y work?", make the first paragraph a direct 1–3 sentence answer.

Audit your blog and docs for question-shaped titles. For each one, rewrite the first paragraph as 1–3 short sentences (8–35 words each) that lead with the term being defined and follow with the answer. Move CTAs and preamble below the fold. AI engines extract from the top of the page; a competitor with "X is a …" wins the citation slot from a page that opens with "In today's fast-moving world, X has become increasingly important …".
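The length rule above can be checked mechanically. The sketch below is an assumed heuristic written for illustration, not Seoxpert's implementation: it uses a naive sentence splitter, treats a title as question-shaped if it starts with a common question word and ends in "?", and does not check whether the paragraph leads with the term being defined.

```python
import re

# Titles such as "What is X?" or "How does Y work?".
QUESTION_TITLE = re.compile(
    r"^(what|how|why|when|where|who|which|is|are|can|does|do|should)\b.*\?\s*$",
    re.IGNORECASE,
)


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def is_answer_first(title: str, first_paragraph: str) -> bool:
    """True if a question-titled page opens with 1-3 sentences of 8-35 words each.

    Pages whose title is not question-shaped pass automatically.
    """
    if not QUESTION_TITLE.match(title.strip()):
        return True
    sentences = split_sentences(first_paragraph)
    if not 1 <= len(sentences) <= 3:
        return False
    return all(8 <= len(sentence.split()) <= 35 for sentence in sentences)


good = ("llms.txt is a plain-text manifest at the root of a domain "
        "that lists the pages LLMs should read.")
bad = "In today's fast-moving world, X matters. Read on."
print(is_answer_first("What is llms.txt?", good))
print(is_answer_first("What is X?", bad))
```

Run it over the title and first paragraph of each blog post or doc page. Anything that fails is a candidate for the rewrite described above.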

5. Build canonical landing pages for the topics you cover

Mentioning a topic across blog posts is not the same as having a page that ranks for it. AI engines cite landing pages, not paragraphs.

Audit which topics your site repeatedly references but lacks a dedicated landing page for. For each, create one — H1 with the topic, 1–2 sentence direct answer, value proposition, supporting evidence (case studies, FAQs), and a CTA. Link to it from the homepage and from the supporting blog posts. Seoxpert's content-gap agent surfaces these candidates automatically with cited evidence URLs.
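For a rough first pass at this audit without an LLM, you can count keyword mentions across crawled pages. Everything here is a stand-in for illustration: the Page type, the topic_gaps function, the two-mention threshold, and the rule that a topic counts as covered when some H1 contains it are all assumptions, and this is not how Seoxpert's content-gap agent works.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    h1: str
    body: str


def topic_gaps(pages: list[Page], topics: list[str],
               min_mentions: int = 2) -> dict[str, list[str]]:
    """Map each uncovered topic to the URLs whose body text mentions it.

    A topic counts as covered when some page's H1 contains it.
    """
    gaps: dict[str, list[str]] = {}
    for topic in topics:
        needle = topic.lower()
        if any(needle in page.h1.lower() for page in pages):
            continue  # a dedicated landing page already exists
        evidence = [page.url for page in pages if needle in page.body.lower()]
        if len(evidence) >= min_mentions:
            gaps[topic] = evidence
    return gaps


# Illustrative crawl data.
pages = [
    Page("https://example.com/blog/a", "Deploy checklist",
         "We cover uptime monitoring and TLS."),
    Page("https://example.com/blog/b", "Incident review",
         "Uptime monitoring caught it."),
    Page("https://example.com/tls", "TLS certificates explained", "All about TLS."),
]
print(topic_gaps(pages, ["uptime monitoring", "TLS"]))
```

The returned URLs double as the list of posts that should link to the new landing page once it exists.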

Why this matters now

AI search is a real referral channel

ChatGPT search, Perplexity, and Google AI Overviews now drive a measurable share of B2B referral traffic for many sites — and the share is growing month-over-month. Whether you get cited there is a different question from whether you rank in classical SERPs: the underlying engines fetch live pages at answer time and pick which URLs to attribute, based on signals that Google's ranking algorithm has never weighed.

Most sites are unintentionally opted out of AI citation. The most common reasons we see in scans: a copy-pasted robots.txt blocklist that disallows the runtime crawlers (ChatGPT-User, OAI-SearchBot) along with GPTBot, no Organization schema so the brand is ambiguous, and blog posts that bury the answer under a sales-pitch intro. Each of these is a one-deploy fix.

The technical bar is low — but it's a bar. Sites that clear it get the citation slot; sites that don't lose it to a competitor.

Where Seoxpert fits

The only audit tool that checks all five

As of May 2026, no other audit tool ships these checks. Ahrefs, Semrush, and Screaming Frog all skip them entirely. Seoxpert's GEO citability agent runs the four technical checks (llms.txt presence, AI bot disallows in robots.txt, Organization schema, answer-first paragraph heuristic), and the content-gap agent runs an LLM pass that surfaces topics you cover but lack a canonical landing page for, with cited evidence URLs from the crawl.

What the scanner outputs

  • Whether /llms.txt exists, and if not, what to write
  • Which AI bots your robots.txt is currently blocking (deliberately or accidentally)
  • Whether the homepage has Organization JSON-LD with sameAs identity links
  • Which question-titled pages bury the answer under preamble — with the offending paragraph excerpted
  • Concrete topic gaps where you cover the topic across blog posts but have no canonical landing page

FAQ

Common questions

Which AI search engines actually cite websites?

ChatGPT search (the live web mode in ChatGPT, powered by OAI-SearchBot and ChatGPT-User), Perplexity (via PerplexityBot and Perplexity-User), Claude (via Claude-User and Claude-SearchBot when web search is on), Google AI Overviews (via regular Googlebot) and Gemini (grounding controlled by the Google-Extended token), and Microsoft Copilot (via Bingbot). Each one retrieves pages at answer time, live or from its search index, and decides whether to cite the URL, so the technical conditions for citation are testable, not magic.

What is /llms.txt and do I need it?

llms.txt is the emerging convention for "robots.txt but for LLMs" — a plain-text file at the root of your site that tells AI engines which content to ingest and how it is structured. Format: a # H1 with the site name, a > blockquote with the one-sentence positioning, then ## sections with curated URL lists. AI engines do not strictly require it yet, but having one signals you have thought about LLM ingestion and is increasingly used as a freshness/quality signal. See https://llmstxt.org for the spec.

My robots.txt blocks GPTBot. Will I still be cited?

Blocking GPTBot alone should not stop citations, because robots.txt rules apply per user agent and GPTBot is OpenAI's training crawler. The problem, and the most common unintentional blocker, is what usually ships alongside that rule: copy-pasted AI blocklists and wildcard disallows often also cover ChatGPT-User and OAI-SearchBot, which are runtime fetchers, not training scrapers. If you want to be cited but not trained on, keep the rules targeted: Disallow GPTBot but allow ChatGPT-User and OAI-SearchBot (and PerplexityBot and Google-Extended for the other engines). A blanket Disallow for AI bots opts the site out of citation entirely.

Why does Organization schema matter for AI citations?

When ChatGPT / Claude / Perplexity write an answer that mentions your brand, they need to know which entity you are. Without Organization JSON-LD, the engine guesses from page text and can confuse you with a similarly-named competitor, attribute the answer to a Wikipedia stub, or omit a brand citation entirely. Organization JSON-LD on the homepage with a populated sameAs array (LinkedIn, Twitter, Crunchbase, GitHub) is the single highest-leverage AI-citability fix.

What does an "answer-first paragraph" mean?

Pages titled like a question — "What is X?", "How does Y work?" — should answer the question in 1–3 short sentences (8–35 words each) at the very top, before any preamble. AI engines extract citation snippets from the head of the page; a sales-pitch intro or long preamble buries the answer and the page loses the citation slot to a competitor whose first sentence is "X is …". Move CTAs and decorative copy below the fold.

Is Seoxpert the only audit tool that checks for AI citability?

Yes, as of May 2026 — none of Ahrefs, Semrush, or Screaming Frog audit for /llms.txt presence, AI bot disallows in robots.txt, Organization JSON-LD entity schema, or answer-first paragraph structure. Seoxpert ships a dedicated GEO citability agent that runs all four checks plus a content-gap agent that uses an LLM to surface missing topics.

How long until AI citations show up after I fix these?

Engines that fetch pages live at answer time see your changes from the next deploy, so for those fetches there is no equivalent of waiting for Google to re-crawl or recompute rankings. The day you add Organization schema and an answer-first paragraph, the next ChatGPT search query for "[your topic]" can pick up the new version. In practice, expect a 1-2 week observation window before you can see traffic move, because the limiting factor is whether enough users ask the right query, not whether a crawler has re-indexed the page.

Run the GEO citability check on your site

Free scan · No credit card · Results in under 2 minutes.
