Seoxpert.io

What Is llms.txt?

llms.txt is a plain-markdown file served from the root of a website that curates which pages LLMs should ingest as canonical content. Proposed by Jeremy Howard in September 2024, it's the AI-era counterpart to robots.txt, but instead of restricting crawlers, it recommends content.

Format

Plain markdown. The spec is intentionally simple: an H1 with the site name, a blockquote with a one-sentence positioning statement, an optional intro paragraph, then ## sections with bulleted URL lists.

# Site Name

> One-sentence positioning describing what the site does.

Optional 2-4 sentence intro paragraph for additional context.

## Core docs

- [Page title](https://example.com/path): One-sentence description of what the page is for.
- [Another page](https://example.com/other): Brief description.

## Guides

- [Guide title](https://example.com/guide): Description.

Each URL line: - [Title](URL): one-sentence description. The description matters as much as the URL — it tells the LLM what the page is FOR, not just what it's called.
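The structure above is regular enough to parse with a few string checks. The following is a minimal sketch, not a reference parser: the names `LlmsTxt`, `LINK_RE`, and `parse_llms_txt` are hypothetical, and it assumes one link per bullet line and ignores the optional intro paragraph.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](URL): description". The description is treated as optional here.
LINK_RE = re.compile(
    r"^-\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""      # text of the H1
    summary: str = ""    # text of the first blockquote line
    sections: dict = field(default_factory=dict)  # section name -> [(title, url, desc)]


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote, and the bulleted links under each ## section."""
    doc = LlmsTxt()
    current = None  # name of the "## " section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                desc = (match["desc"] or "").strip()
                doc.sections[current].append((match["title"], match["url"], desc))
    return doc
```

Entries that come back with an empty description are worth reviewing, since the description is what tells the LLM what the page is for.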

llms.txt vs robots.txt

Both are plain-text files at the site root. They do completely different jobs.

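  • robots.txt restricts crawling (a 1994 convention, standardized as RFC 9309 in 2022). It tells crawlers what they can and can't fetch. Compliance is voluntary: the rules are instructions to well-behaved crawlers, not access authorization.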
  • llms.txt recommends content (2024 spec). Tells LLMs which pages are the most valuable canonical content.

Which engines honor it

  • Perplexity uses llms.txt to prioritize indexing.
  • Anthropic Claude reads it for content discovery in research mode.
  • OpenAI fetches it (visible in server logs) but has not publicly committed to honoring it.
  • Google has not endorsed the spec.
  • Microsoft has not endorsed the spec.

Practical implication: llms.txt is a “small upside, no downside” decision. It costs about 30 minutes to write, may earn a small visibility boost in engines that respect it, and is simply ignored by engines that don't.

Common mistakes

  • Listing every URL on the site. Defeats the purpose. Aim for 10-50 curated canonical pages.
  • URLs blocked by robots.txt. A compliant crawler honors robots.txt first and never fetches the page, so the llms.txt entry is wasted. Cross-check both files.
  • Wrong content-type. Must be text/plain. Some parsers skip files served as text/html.
  • SPA returning index.html. Same trap as sitemap.xml — Next.js / Vercel catch-all routes can serve the application shell for /llms.txt. Use static-file serving or an explicit route handler.
  • Stale URLs. Every URL in llms.txt should return 200. Dead links train AI engines to trust llms.txt less.
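Several of these mistakes can be caught mechanically. The sketch below is one possible approach using only the Python standard library; the function names `blocked_by_robots` and `serving_problems` are hypothetical, and the caller is assumed to have already fetched robots.txt and /llms.txt and to pass in the status code, Content-Type header, and body.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the llms.txt URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


def serving_problems(status: int, content_type: str, body: str) -> list[str]:
    """Flag status, content-type, and SPA-shell problems for a fetched /llms.txt."""
    problems = []
    if status != 200:
        problems.append(f"status {status}, expected 200")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"content-type {media_type or 'missing'}, expected text/plain")
    # A catch-all route typically answers with the HTML application shell.
    head = body.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        problems.append("body looks like an HTML application shell")
    return problems
```

The same status check can be applied to each listed URL to catch stale entries. Passing a specific crawler name as `user_agent` checks the rules that crawler would actually follow, rather than the wildcard group.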

Validate your llms.txt

Run the free llms.txt validator — parses the markdown, checks for required H1, validates URL formats, and cross-references against robots.txt to catch URLs that would be blocked at fetch time. Full how-to guide at /llms-txt-guide.

Frequently asked questions

Where does llms.txt live?

At the exact path /llms.txt on the root of each host. Served as Content-Type: text/plain. Like robots.txt, it applies only to the exact host it's served from — subdomains need their own file.
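Because the file is per-host, a checker has to derive the llms.txt location from each host separately. A minimal sketch, with the hypothetical helper name `llms_txt_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the /llms.txt location for the exact host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

A page on docs.example.com therefore maps to the file on docs.example.com, not to the one on example.com.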

Which AI engines honor llms.txt?

As of mid-2026: Perplexity uses it to prioritize indexing. Anthropic's Claude reads it for content discovery. OpenAI fetches it but has not publicly committed to honoring the recommendations. Google and Microsoft have not endorsed the spec.

Does llms.txt replace robots.txt?

No. They do different things. robots.txt is crawl control (tells crawlers which URLs they may not fetch). llms.txt is content curation (recommends which URLs are canonical). You need both.

How many URLs should llms.txt list?

Curate aggressively — typically 10-50 URLs for a small-to-medium site. The point is to tell AI engines which pages are most important, not to mirror sitemap.xml. Listing 500 URLs dilutes the signal.

Who created llms.txt?

Jeremy Howard (Answer.AI, fast.ai) published the spec at llmstxt.org in September 2024.
