llms.txt: complete guide (with examples)
llms.txt is a plain-markdown file at /llms.txt that curates which pages LLMs should ingest as canonical content. Published at llmstxt.org in September 2024 by Jeremy Howard. Format: H1 with site name, blockquote with one-sentence positioning, optional intro, then ## sections with bulleted URL lists. Honored by Perplexity and Claude; OpenAI fetches but doesn't commit. ~30 minutes to write the first version. Validate via our free llms.txt validator before deploying.
The llms.txt format
A complete, minimal llms.txt:
# Seoxpert
> Seoxpert is a website regression monitor for founders, agencies, and developers — SEO, security, performance and compliance checks after every deploy.
Seoxpert runs 442 automated checks across 20 categories on any URL, returning a prioritized fix list and a regression diff vs the last scan.
## Core docs
- [/website-audit](https://seoxpert.io/website-audit): Free full-site SEO audit with 442 checks
- [/coverage](https://seoxpert.io/coverage): The complete list of scanner checks
- [/docs](https://seoxpert.io/docs): API + webhooks + deploy hooks
## GEO / AI search
- [/geo](https://seoxpert.io/geo): Generative Engine Optimization hub
- [/how-to-get-cited-by-chatgpt](https://seoxpert.io/how-to-get-cited-by-chatgpt): Five technical fixes for AI citation
- [/llms-txt-guide](https://seoxpert.io/llms-txt-guide): How to write llms.txt
## Tools
- [/tools/llms-txt](https://seoxpert.io/tools/llms-txt): llms.txt validator
- [/tools/robots-txt-checker](https://seoxpert.io/tools/robots-txt-checker): robots.txt tester
Save as /public/llms.txt in Next.js (or the equivalent static path in your framework). Serve it as Content-Type: text/plain.
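Where a static file is not practical, a route handler can return the same markdown with the right header. The following is a minimal sketch, assuming the Next.js App Router and a hypothetical file location of app/llms.txt/route.ts; the LLMS_TXT constant is an abbreviated stand-in for the full file shown above.

```typescript
// Hypothetical location: app/llms.txt/route.ts (Next.js App Router).
// LLMS_TXT is an abbreviated stand-in for the full example file.
const LLMS_TXT = `# Seoxpert

> Seoxpert is a website regression monitor for founders, agencies, and developers.

## Tools

- [/tools/llms-txt](https://seoxpert.io/tools/llms-txt): llms.txt validator
`;

// Route handlers export a function named after the HTTP method.
export async function GET(): Promise<Response> {
  return new Response(LLMS_TXT, {
    status: 200,
    // Explicit text/plain so the file is not served with an HTML content type.
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

A static file in /public is usually simpler; a handler like this is mainly useful when the list is generated at build or request time.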
Element-by-element breakdown
1. # Site name (required H1)
Single line, your brand or site name. AI engines anchor entity resolution against this string. Use your canonical brand name exactly as it appears in Organization JSON-LD.
2. > Positioning (required blockquote)
One sentence describing what the site does. Reuse the canonical positioning sentence you use across your homepage, OG card, and email footer — consistency is the citation signal.
3. Optional intro paragraph
2-4 sentence elaboration. Add context the blockquote couldn't cover. Skip if it's redundant.
4. ## Section headings (required)
Group your curated URLs by category. Common section names: "Core docs", "Tools", "Guides", "API reference", "Case studies". Each section = a logical content cluster from an LLM's perspective.
5. - [Title](URL): description (required bullets)
Markdown link followed by colon-space + one-sentence description. The description matters as much as the URL — it tells the LLM what the page is FOR, not just what it's called.
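The five elements above can be checked mechanically. Below is a rough sketch of a line-based parser, assuming the required/optional split described in this guide; the type names, the parseLlmsTxt function, and the problem messages are all hypothetical, and the regular expression covers only the simple `- [Title](URL): description` bullet form.

```typescript
interface LlmsEntry { title: string; url: string; description: string; }
interface LlmsSection { heading: string; entries: LlmsEntry[]; }
interface LlmsDoc {
  siteName: string | null;
  positioning: string | null;
  sections: LlmsSection[];
  problems: string[];
}

// Matches "- [Title](URL): description"; the description is optional here
// so that bare links can be reported instead of rejected outright.
const BULLET = /^-\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;

export function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { siteName: null, positioning: null, sections: [], problems: [] };
  let current: LlmsSection | null = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;

    if (line.startsWith("## ")) {
      current = { heading: line.slice(3).trim(), entries: [] };
      doc.sections.push(current);
    } else if (line.startsWith("# ")) {
      if (doc.siteName === null) doc.siteName = line.slice(2).trim();
      else doc.problems.push("more than one H1");
    } else if (line.startsWith("> ")) {
      if (doc.positioning === null) doc.positioning = line.slice(2).trim();
    } else if (line.startsWith("- ")) {
      const m = BULLET.exec(line);
      if (!m) { doc.problems.push(`malformed bullet: ${line}`); continue; }
      if (current === null) { doc.problems.push(`bullet outside a section: ${line}`); continue; }
      const description = (m[3] ?? "").trim();
      if (description === "") doc.problems.push(`missing description: ${m[2]}`);
      current.entries.push({ title: m[1], url: m[2], description });
    }
    // Any other line is treated as intro prose and ignored.
  }

  if (doc.siteName === null) doc.problems.push("missing H1 site name");
  if (doc.positioning === null) doc.problems.push("missing blockquote positioning");
  if (doc.sections.length === 0) doc.problems.push("no ## sections");
  return doc;
}
```

Running parseLlmsTxt over a draft and printing the problems array gives a quick pre-deploy check; an empty array means the file has the structure described above, not that its URLs resolve.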
Common mistakes
- Listing every URL on the site. Defeats the purpose. ~10-50 curated URLs is the sweet spot; lists of 500+ get treated as sitemap noise.
- URLs blocked by robots.txt. Crawlers that honor robots.txt ignore llms.txt entries pointing at disallowed paths. Cross-check both files before deploying.
- Missing descriptions. Bare URL lists (no description text after the colon) provide much weaker citation signals than annotated entries.
- Wrong content-type. Must be text/plain. Serving as text/html or application/octet-stream causes some parsers to skip it.
- SPA returning index.html. Same trap as sitemap.xml: Vercel/Next.js catch-all routes can serve the application shell HTML for /llms.txt. Use static-file serving or an explicit route handler that returns the markdown.
- Stale URLs. Every URL in llms.txt should return 200. Dead links train AI engines to trust llms.txt less. Re-check quarterly.
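Two of the mistakes above (wrong content-type and an SPA shell served in place of the file) show up in the HTTP response itself. Here is a small sketch of a check over an already-fetched response; the FetchedFile shape and the checkLlmsTxtResponse name are hypothetical, and the HTML detection is a simple prefix heuristic.

```typescript
interface FetchedFile {
  status: number;
  contentType: string | null; // raw Content-Type header value, if any
  body: string;
}

export function checkLlmsTxtResponse(res: FetchedFile): string[] {
  const problems: string[] = [];

  if (res.status !== 200) problems.push(`unexpected status ${res.status}`);

  // Compare only the media type, ignoring parameters such as charset.
  const mediaType = (res.contentType ?? "").split(";")[0].trim().toLowerCase();
  if (mediaType !== "text/plain") {
    problems.push(`content-type is "${mediaType || "missing"}", expected text/plain`);
  }

  // An application shell from a catch-all route starts with HTML, not a markdown H1.
  const head = res.body.trim().slice(0, 200).toLowerCase();
  if (head.startsWith("<!doctype html") || head.startsWith("<html")) {
    problems.push("body looks like an HTML shell");
  } else if (!res.body.trim().startsWith("# ")) {
    problems.push("body does not start with an H1");
  }

  return problems;
}
```

To use it against a live site, fetch the file and pass `{ status: r.status, contentType: r.headers.get("content-type"), body: await r.text() }`, where `r` is the fetch response.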
FAQ
What is llms.txt?
llms.txt is a plain-markdown file served from the root of a website (e.g. https://example.com/llms.txt) that curates which pages LLMs should ingest as canonical content. Created by Jeremy Howard (Answer.AI / fast.ai) and published at llmstxt.org in September 2024, it's the AI-era equivalent of robots.txt — but instead of blocking, it recommends. The format is intentionally simple: H1 site name, blockquote positioning, optional intro paragraph, then ## sections with bulleted URL lists.
Which AI engines actually use llms.txt?
As of mid-2026: Perplexity uses it to prioritize indexing. Anthropic Claude reads it for content discovery in research mode. OpenAI fetches it but has not publicly committed to honoring the recommendations. Google and Microsoft have not endorsed the spec. The practical implication: llms.txt is a "small upside, no downside" decision — costs 30 minutes to write, helps the engines that respect it, ignored by the rest.
How is llms.txt different from sitemap.xml?
Sitemap.xml is a machine-readable list of EVERY URL on your site, optimized for search engine crawling. llms.txt is a CURATED list of your most important canonical pages, with human-readable descriptions, optimized for LLM understanding. Sitemap = exhaustive. llms.txt = selective. AI engines that respect llms.txt use it to identify canonical landing pages; they still consume sitemap.xml for the long tail.
Should llms.txt list every page on my site?
No — curate aggressively. Aim for canonical landing pages, the most important documentation, top-performing blog posts. ~10-50 URLs is typical for a small-to-medium site. Listing 500 URLs dilutes the signal and removes the curation value. The point is to tell AI engines "if you only fetched these 30 pages, you'd understand our site" — not to mirror sitemap.xml.
Where can I validate my llms.txt?
Use the Seoxpert llms.txt validator at /tools/llms-txt — it parses the markdown, checks for the required H1, validates URL formats, flags duplicate entries, and cross-references against your robots.txt to catch URLs that would be blocked at fetch time. Free, no signup.
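To illustrate the kind of checks listed here, the sketch below flags invalid URLs, duplicate entries, and URLs whose path falls under a robots.txt Disallow rule. It is not the Seoxpert validator's implementation: the function names are hypothetical, and the robots.txt handling is deliberately simplified (only the `User-agent: *` group, plain prefix matching, no Allow rules or wildcards).

```typescript
// Collect Disallow paths from the "User-agent: *" group only (a simplification).
export function disallowedPaths(robotsTxt: string): string[] {
  const paths: string[] = [];
  let inWildcardGroup = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") inWildcardGroup = value === "*";
    else if (field === "disallow" && inWildcardGroup && value !== "") paths.push(value);
  }
  return paths;
}

export interface CrossCheckResult {
  invalid: string[];
  duplicates: string[];
  blocked: string[];
}

export function crossCheck(urls: string[], robotsTxt: string): CrossCheckResult {
  const rules = disallowedPaths(robotsTxt);
  const seen = new Set<string>();
  const result: CrossCheckResult = { invalid: [], duplicates: [], blocked: [] };

  for (const url of urls) {
    if (seen.has(url)) { result.duplicates.push(url); continue; }
    seen.add(url);

    let path: string;
    try {
      path = new URL(url).pathname; // throws on malformed absolute URLs
    } catch {
      result.invalid.push(url);
      continue;
    }
    // Prefix match only; real robots.txt matching also handles Allow and wildcards.
    if (rules.some((rule) => path.startsWith(rule))) result.blocked.push(url);
  }
  return result;
}
```

The url list can come from any parser that extracts the link targets from llms.txt; every entry in blocked is one that a robots.txt-respecting crawler would likely skip.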