Seoxpert.io
Free tool

llms.txt validator & AI search readiness checker

Paste a URL. We validate /llms.txt against the llmstxt.org spec AND cross-check /robots.txt for whether the major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, ChatGPT-User) are blocked — the most common reason a well-formed llms.txt does nothing.

Free. No signup. Rate-limited to 30 checks per IP per hour. Server-side fetch, SSRF-guarded.

We fetch /llms.txt and /robots.txt from the origin and cross-check. A site can have a great llms.txt but still be invisible to AI search because robots.txt blocks the crawlers.

Why this matters now

AI search is real traffic now

ChatGPT search, Perplexity, Claude, and Google AI Overviews account for between 5% and 15% of search traffic for many SaaS / B2B sites as of 2026. Sites that aren't cited in those answers lose visibility no matter how well they rank in classic Google search.

The two things AI engines look at first: does this site allow my user-agent in robots.txt, and does it have an llms.txt with curated canonical pages. Both are five-minute fixes that most sites still haven't made.

The biggest mistake we see: a site copies a blanket "block all AI bots" robots.txt snippet thinking it's opting out of AI training, and in the process also blocks ChatGPT-User and OAI-SearchBot, the citation crawlers that fetch pages at query time so ChatGPT can cite them in its answers. The result: the site disappears from ChatGPT's answers entirely, even though its "no training" intent only required blocking GPTBot. This tool catches that pattern.
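For illustration, here is what the bad pattern and its fix look like in robots.txt (the intent assumed is "opt out of training, stay citable"):

```text
# BAD: blocks the training crawler AND the citation crawlers.
# The site vanishes from ChatGPT answers.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Disallow: /

# GOOD: opt out of training only; citation crawlers stay allowed.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Allow: /
```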

FAQ

Common llms.txt questions

What is llms.txt?

llms.txt is an emerging convention (proposed by Jeremy Howard at llmstxt.org) for telling AI search engines how to navigate your site. Think robots.txt + sitemap.xml + an executive summary, all in one Markdown file at the root of your domain. The format: an H1 with the site name, a > blockquote summary, optional paragraphs, then ## Section headings each containing a curated bullet list of [Page title](URL): description.
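A sketch of that format with hypothetical page names and URLs:

```markdown
# Example Project

> One-line summary of what the site or project is about.

Optional free-form paragraph with extra context for AI engines.

## Docs

- [Quickstart](https://example.com/docs/quickstart): install and first run
- [API reference](https://example.com/docs/api): endpoints and authentication

## Blog

- [Launch post](https://example.com/blog/launch): why we built this
```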

Which AI engines actually read llms.txt?

As of 2026 the file is being adopted by ChatGPT search, Perplexity, Claude, Anthropic's WebSearch, and various AI-coding tools (Cursor, Continue.dev, etc.). Google AI Overviews don't yet read it but the signal is moving that direction. Even when an engine doesn't read llms.txt directly, having one gives you a single Markdown file you can paste into any LLM-based research workflow.

Why check robots.txt at the same time?

The most common failure mode: a site has a great llms.txt but its robots.txt blocks GPTBot, ClaudeBot, or PerplexityBot — making the llms.txt invisible to the engines that would read it. Even worse, sites often add a generic "no AI training" robots.txt rule that ALSO blocks the citation crawlers (ChatGPT-User, OAI-SearchBot), opting themselves out of being CITED in AI search results. This tool flags that inconsistency.
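A minimal sketch of the cross-check using Python's standard urllib.robotparser; the robots.txt content and URL below are illustrative, not fetched from a real site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: a "no AI training" rule that also
# catches the citation crawlers.
robots_txt = """\
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Disallow: /
"""

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "Google-Extended", "OAI-SearchBot", "ChatGPT-User"]

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Which AI user-agents are denied the site root?
blocked = [ua for ua in AI_CRAWLERS
           if not rp.can_fetch(ua, "https://example.com/")]
print("Blocked AI crawlers:", blocked)
```

Here the citation crawlers (ChatGPT-User, OAI-SearchBot) end up blocked alongside GPTBot, which is exactly the inconsistency the tool flags.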

What's the minimum valid llms.txt?

Per the spec: just an H1 line with the site/project name. Everything else is optional. But a useful llms.txt has the H1 + a > blockquote one-line summary + at least one ## section with 3-10 curated links. Don't list every URL — that's what sitemap.xml is for. llms.txt is for the canonical landing pages, docs, and most important blog posts.

Does llms.txt replace sitemap.xml or robots.txt?

No — they coexist. sitemap.xml is the complete URL list for search-engine crawlers. robots.txt is access rules. llms.txt is a curated, human-readable summary for AI engines. Keep all three.

Where should I host llms.txt?

At the root of your origin, served as text/markdown or text/plain. Example: https://example.com/llms.txt. Each subdomain needs its own llms.txt; the root file doesn't cover subdomains. Per the spec, also offer an optional /llms-full.txt with the FULL content of each linked page concatenated, for engines that want everything in one fetch.
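If your web server maps the .txt extension to the wrong MIME type, a location override fixes it. A hypothetical nginx sketch (the empty types block clears the extension mapping so default_type applies):

```nginx
location = /llms.txt {
    types { }                      # clear extension-based mapping
    default_type text/markdown;
}

location = /llms-full.txt {
    types { }
    default_type text/markdown;
}
```

text/plain also satisfies the spec, so this override is optional if your server already serves .txt correctly.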

Want the full AI-readiness audit?

A Seoxpert scan checks llms.txt + robots.txt + structured-data + answer-first content patterns, plus 230+ other signals across SEO, security, and EU privacy compliance. Free first scan.

Security headers · Hreflang checker · robots.txt Checker · How to get cited by ChatGPT