Seoxpert.io
High · Best Practices

AI Search: robots.txt Blocks Generative-AI Crawlers

This site's robots.txt blocks 7 major AI crawlers, which can keep its content out of AI-powered search and answer engines.

By Seoxpert Editorial · Published

Why it matters

AI-powered search engines and answer platforms are a growing source of traffic. Blocking every AI crawler in robots.txt removes your site from these surfaces, reducing visibility and the chance of being cited. Blocking selectively lets you decide separately whether your content may be used for model training and whether it may be fetched to answer and cite user queries.

Impact

Your site will not appear or be cited in AI-driven search results and answer engines if all AI crawlers are blocked.

How it's detected

Automated analysis scans robots.txt for Disallow rules targeting known AI crawler user-agents.
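
A minimal sketch of such a check follows. It assumes Python's standard-library urllib.robotparser as the parser, plus a hypothetical, non-exhaustive watchlist of AI user-agent tokens. The helper name blocked_ai_agents is illustrative. The sketch reports which watchlisted agents may not fetch the site root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, non-exhaustive watchlist of AI crawler user-agent tokens.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "CCBot", "Bytespider",
    "meta-externalagent",
]


def blocked_ai_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the watchlist agents that robots_txt forbids from fetching `path`."""
    parser = RobotFileParser()
    # parse() also marks the rules as loaded, so can_fetch() can be used directly.
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, path)]


sample = "User-agent: GPTBot\nDisallow: /\n"
print(blocked_ai_agents(sample))  # ['GPTBot']
```

Run against the "Problem" rules below, the check flags exactly seven agents: GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot, Bytespider, and meta-externalagent. It leaves ChatGPT-User, OAI-SearchBot, and PerplexityBot unblocked. Note that urllib.robotparser applies the first matching rule in a group rather than RFC 9309's longest-match precedence, so treat it as an approximation for complex files.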

Common causes

  • Copying blanket AI crawler blocklists without review
  • Misunderstanding the difference between training and citation bots
  • Desire to prevent AI data scraping without considering citation impact
  • Lack of awareness of new AI crawler user-agents
  • Not updating robots.txt as new AI bots emerge

How to fix it

Review your robots.txt and decide which AI crawlers to block and which to allow. If you want your site cited in AI search results but not used for model training, allow the runtime citation bots (e.g., ChatGPT-User, OAI-SearchBot, PerplexityBot) and block the training bots (e.g., GPTBot, ClaudeBot, CCBot). Update robots.txt accordingly, and recheck it periodically as new AI crawler user-agents appear.
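
As a minimal sketch, the policy can be generated from two lists so that each crawler gets its own explicit group. The lists below are assumptions drawn from this article's examples. build_robots_txt is a hypothetical helper, not part of any library. Confirm each token against the vendor's own crawler documentation before deploying.

```python
# Assumed groupings, taken from the examples in this article; not authoritative.
CITATION_BOTS = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]


def build_robots_txt(allow: list[str], block: list[str]) -> str:
    """Emit one group per user-agent: Allow for `allow`, Disallow for `block`."""
    lines = ["# Allow citation bots"]
    for ua in allow:
        lines += [f"User-agent: {ua}", "Allow: /", ""]  # blank line ends the group
    lines.append("# Block training bots")
    for ua in block:
        lines += [f"User-agent: {ua}", "Disallow: /", ""]
    return "\n".join(lines)


print(build_robots_txt(CITATION_BOTS, TRAINING_BOTS))
```

Keeping the lists as data makes it easy to move a bot from one group to the other when a vendor changes what a crawler is used for.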

Code examples

Problem: Blanket Disallow for All AI Crawlers

User-agent: applebot-extended
Disallow: /
User-agent: bytespider
Disallow: /
User-agent: ccbot
Disallow: /
User-agent: claudebot
Disallow: /
User-agent: google-extended
Disallow: /
User-agent: gptbot
Disallow: /
User-agent: meta-externalagent
Disallow: /

Solution: Allow Citation Bots, Block Training Bots

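# Allow runtime citation/search bots
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /

# Google-Extended is a control token, not a separate crawler. It governs use
# of content for Gemini training and grounding; per Google, it does not affect
# inclusion in Google Search. Change to Disallow to opt out of Gemini training.
User-agent: Google-Extended
Allow: /

# Block training bots
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: meta-externalagent
Disallow: /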
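
The explicit Allow groups above only change anything when a broader rule would otherwise block those bots. Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys the group that names its user-agent and falls back to the "User-agent: *" group only when no group names it. The sketch below illustrates that fallback. It assumes Python's urllib.robotparser as a stand-in for a real crawler's parser and uses a hypothetical robots.txt. can_fetch_root is an illustrative helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the catch-all group blocks every crawler, and a
# named group re-allows one citation bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: PerplexityBot
Allow: /
"""


def can_fetch_root(robots_txt: str, user_agent: str) -> bool:
    """Ask the stdlib parser whether user_agent may fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "/")


for ua in ["PerplexityBot", "GPTBot"]:
    # PerplexityBot matches its own group (allowed); GPTBot falls back to "*" (blocked).
    print(ua, can_fetch_root(ROBOTS_TXT, ua))
```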

FAQ

What is the difference between AI training bots and citation bots?

Training bots crawl to collect data for AI model training, while citation bots fetch content live for answer generation and attribution.

Will blocking all AI crawlers prevent my site from being cited in ChatGPT or Perplexity?

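Largely, yes. If the user-agents those platforms use to fetch pages live (e.g., OAI-SearchBot and ChatGPT-User for ChatGPT, PerplexityBot for Perplexity) are disallowed, they cannot retrieve your pages to cite them. Blocking does not remove content already collected for training, and robots.txt relies on crawlers choosing to honor it rather than enforcing access.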

How do I allow my site to be cited but not used for AI training?

Allow citation bots (e.g., ChatGPT-User, OAI-SearchBot, PerplexityBot) and block known training bots in robots.txt.

Do I need to update robots.txt as new AI crawlers appear?

Yes, regularly update your robots.txt to reflect new AI crawler user-agents and your citation/training preferences.
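
One way to keep up is to compare robots.txt against a watchlist of AI user-agents and flag any that the file never names. Unnamed agents fall back to the "User-agent: *" group, or are allowed if there is none. This is a minimal sketch that assumes a hypothetical watchlist you maintain yourself from vendors' published crawler documentation. named_user_agents and unlisted_agents are illustrative helpers.

```python
# Hypothetical watchlist; extend it as vendors announce new crawlers.
WATCHLIST = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "CCBot", "Bytespider",
    "meta-externalagent",
]


def named_user_agents(robots_txt: str) -> set[str]:
    """Collect the lowercased product tokens from every User-agent line."""
    names = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            names.add(value.strip().lower())  # tokens match case-insensitively
    return names


def unlisted_agents(robots_txt: str, watchlist: list[str]) -> list[str]:
    """Return watchlist entries that no User-agent line names explicitly."""
    named = named_user_agents(robots_txt)
    return [ua for ua in watchlist if ua.lower() not in named]


sample = "User-agent: GPTBot\nDisallow: /\nUser-agent: ClaudeBot\nDisallow: /\n"
print(unlisted_agents(sample, WATCHLIST))
```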

Does Google-Extended control both AI training and citation for Google?

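No. Google-Extended is a robots.txt product token that controls whether content Google crawls may be used to train Gemini models and to ground Gemini responses. According to Google, it does not affect a site's inclusion or ranking in Google Search. AI Overviews are part of Search and follow Googlebot rules, not Google-Extended. Review Google's documentation for current behavior.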

Found this issue on your site?

Run a scan to see if AI Search: robots.txt Blocks Generative-AI Crawlers affects your pages.
