Format
Plain markdown. The spec is intentionally simple: an H1 with the site name, a blockquote with a one-sentence positioning statement, an optional intro paragraph, then ## sections containing bulleted URL lists.
# Site Name
> One-sentence positioning describing what the site does.
Optional 2-4 sentence intro paragraph for additional context.
## Core docs
- [Page title](https://example.com/path): One-sentence description of what the page is for.
- [Another page](https://example.com/other): Brief description.
## Guides
- [Guide title](https://example.com/guide): Description.
Each URL line takes the form - [Title](URL): one-sentence description. The description matters as much as the URL because it tells the LLM what the page is FOR, not just what it's called.
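As a rough sketch of how the pieces fit together, the Python below assembles a file in this layout from a list of curated pages. The Link dataclass and render_llms_txt function are hypothetical helpers written for illustration, not part of the spec or any library, and the blank lines between blocks are a readability choice rather than a requirement.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str  # one sentence: what the page is FOR


def render_llms_txt(
    name: str,
    positioning: str,
    sections: dict[str, list[Link]],
    intro: str = "",
) -> str:
    """Assemble an llms.txt body: H1, blockquote, optional intro, then ## sections."""
    lines = [f"# {name}", "", f"> {positioning}", ""]
    if intro:
        lines += [intro, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            # One bullet per URL: - [Title](URL): one-sentence description.
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    return "\n".join(lines)


doc = render_llms_txt(
    "Site Name",
    "One-sentence positioning describing what the site does.",
    {
        "Core docs": [
            Link("Page title", "https://example.com/path",
                 "One-sentence description of what the page is for."),
        ],
        "Guides": [
            Link("Guide title", "https://example.com/guide", "Description."),
        ],
    },
)
print(doc)
```

Keeping the page list as data makes it easy to regenerate the file whenever URLs change.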
llms.txt vs robots.txt
Both are plain-text files at the site root. They do completely different jobs.
- robots.txt controls crawler access (the Robots Exclusion Protocol, proposed in 1994 and standardized as RFC 9309 in 2022). It tells crawlers which paths they CAN and CAN'T fetch; compliance is up to the crawler.
- llms.txt recommends content (2024 spec). It tells LLMs which pages are the most valuable canonical content.
Read more: llms.txt vs robots.txt comparison.
Which engines honor it
- Perplexity uses llms.txt to prioritize indexing.
- Anthropic Claude reads it for content discovery in research mode.
- OpenAI fetches it (visible in server logs) but has not publicly committed to honoring it.
- Google has not endorsed the spec.
- Microsoft has not endorsed the spec.
Practical implication: llms.txt is a “small upside, no downside” decision. It takes about 30 minutes to write, can earn small ranking boosts in engines that respect it, and is simply ignored by engines that don't.
Common mistakes
- Listing every URL on the site. This defeats the purpose; aim for 10-50 curated canonical pages.
- URLs blocked by robots.txt. The crawler honors robots.txt and ignores the llms.txt entry, so cross-check both files.
- Wrong Content-Type. Serve the file as text/plain; some parsers skip files served as text/html.
- SPA returning index.html. This is the same trap as with sitemap.xml: Next.js / Vercel catch-all routes can serve the application shell for /llms.txt. Use static-file serving or an explicit route handler.
- Stale URLs. Every URL in llms.txt should return 200. Dead links can teach AI engines to trust llms.txt less.
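Several of these mistakes can be caught automatically once the file has been fetched. The sketch below assumes you already have the response's Content-Type header, its body, and the number of link lines (any HTTP client will do). check_served_llms_txt is a hypothetical name, and the HTML sniff is a simple heuristic rather than a complete test; the robots.txt and 200-status checks need live requests and are left out.

```python
def check_served_llms_txt(content_type: str, body: str, url_count: int) -> list[str]:
    """Return warnings for common llms.txt serving mistakes (hypothetical helper)."""
    warnings = []

    # The header may carry parameters, e.g. "text/plain; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        warnings.append(f"served as {media_type or 'no content type'}, expected text/plain")

    # An SPA catch-all route usually returns the HTML app shell instead of the file.
    head = body.lstrip()[:100].lower()
    if head.startswith(("<!doctype html", "<html")):
        warnings.append("body looks like an HTML document, not markdown")

    # Curated list: 10-50 canonical pages, not every URL on the site.
    if not 10 <= url_count <= 50:
        warnings.append(f"{url_count} URLs listed; aim for 10-50 curated pages")

    return warnings


# Example: a catch-all route served the app shell instead of the file.
problems = check_served_llms_txt("text/html; charset=utf-8", "<!DOCTYPE html><html></html>", 5)
for problem in problems:
    print(problem)
```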
Related terms
- GEO (Generative Engine Optimization): llms.txt is one of the four GEO fundamentals.
- AEO (Answer Engine Optimization): the umbrella category.
- robots.txt: the access-control counterpart.
- sitemap.xml: an exhaustive URL list, versus llms.txt's curated list.
Validate your llms.txt
Run the free llms.txt validator. It parses the markdown, checks for the required H1, validates URL formats, and cross-references each URL against robots.txt to catch any that would be blocked at fetch time. The full how-to guide is at /llms-txt-guide.
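Checks along those lines can be approximated with the Python standard library. This is a minimal sketch, not the hosted validator: validate_llms_txt is a hypothetical function, it treats every line starting with "- " as a link entry, and it uses urllib.robotparser on a robots.txt string to decide whether a URL would be blocked for a given user agent (defaulting to the * group).

```python
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# "- [Title](URL): description"; the ": description" part is optional in this regex.
LINK_LINE = re.compile(r"^-\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def validate_llms_txt(text: str, robots_txt: str, user_agent: str = "*") -> list[str]:
    """Return problems found in an llms.txt body (a sketch, not the hosted validator)."""
    problems = []
    lines = text.splitlines()

    # The H1 with the site name must be the first non-blank line.
    first = next((line for line in lines if line.strip()), "")
    if not first.startswith("# "):
        problems.append("missing required H1 heading")

    # Load robots.txt rules from a string instead of fetching them.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    for number, line in enumerate(lines, start=1):
        if not line.startswith("- "):
            continue  # only top-level bullet lines are treated as link entries
        match = LINK_LINE.match(line.rstrip())
        if not match:
            problems.append(f"line {number}: expected '- [Title](URL): description'")
            continue
        title, url, description = match.groups()
        if not description:
            problems.append(f"line {number}: {title!r} has no description")
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"line {number}: invalid URL {url}")
        elif not robots.can_fetch(user_agent, url):
            problems.append(f"line {number}: {url} is blocked by robots.txt")
    return problems


sample = """# Site Name

> One-sentence positioning describing what the site does.

## Core docs

- [Page title](https://example.com/path): One-sentence description of what the page is for.
- [Internal notes](https://example.com/private/notes): Team-only notes.
- [Broken link](example.com/no-scheme): Missing its scheme.
"""
problems = validate_llms_txt(sample, "User-agent: *\nDisallow: /private/\n")
for problem in problems:
    print(problem)
```

Passing a specific crawler name as user_agent checks that crawler's group in robots.txt instead of the wildcard rules.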