Free robots.txt Tester
Check any site's robots.txt and test whether a path is allowed for Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, or any user-agent. Returns the matched rule, the surrounding rule group, and any sitemap declarations found in the file.
Free. No signup. Rate-limited to 30 checks per IP per hour. Every fetch is server-side and SSRF-guarded — internal IPs, loopback, and bare hostnames are rejected before the request leaves our network.
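As a rough illustration of that kind of pre-flight guard, here is a minimal Python sketch. It is not the tool's actual code: the function name `is_fetchable_host` is hypothetical, and it only inspects the host string. A production guard would also need to resolve the name and apply the same check to every address it resolves to.

```python
import ipaddress


def is_fetchable_host(host: str) -> bool:
    """Return False for hosts a server-side fetcher should refuse up front."""
    host = host.strip().rstrip(".").lower()
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: require a dotted name so bare hostnames
        # such as "intranet" or "localhost" are rejected.
        return "." in host
    # IP literal: refuse loopback, private, link-local and other
    # non-public ranges.
    return ip.is_global
```

With this sketch, `example.com` passes, while `localhost`, `127.0.0.1`, and `10.0.0.5` are refused before any request is made.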
Three scenarios where robots.txt is the silent killer
After a deploy, when traffic suddenly drops. A copy-pasted Disallow: / from staging or a CMS upgrade that overwrites the production robots.txt is one of the most common causes of “Google stopped ranking us overnight”. Run the checker against / with User-agent set to Googlebot. If it says Disallowed, that's your bug.
When AI search engines stop citing you. Many sites added a “no AI training” robots.txt block in 2023–2024 without realising the same rule blocks the runtime citation crawlers. Test User-agent ChatGPT-User and OAI-SearchBot against /. If they're Disallowed, you're opted out of ChatGPT search citations.
During a domain migration. Subdomain robots.txt files don't inherit from the apex. After moving from www.example.com to example.com, both hosts need a valid robots.txt or one of them silently blocks crawling. Check both.
Three robots.txt examples
    User-agent: *
    Disallow: /admin/
    Disallow: /api/internal/
    Disallow: /*?utm_source=*
    Sitemap: https://example.com/sitemap.xml
Blocks the admin panel and internal API endpoints, keeps crawlers off UTM-tagged URL variants so they are not crawled as duplicates, and declares the sitemap. Nothing more: public pages are crawlable by all bots, simple and explicit.
    User-agent: *
    Disallow: /assets/
    Disallow: /static/
    Disallow: /*?
    Disallow: /tag/
    Disallow: /category/
Blocks /assets/ and /static/, so Google can't fetch the CSS and JavaScript it needs to render the pages. Disallow: /*? blocks every URL with a query string, which includes paginated and filtered URLs. Disallowing /tag/ and /category/ is a blog-template default that often hides legitimate ranking pages. No sitemap declared.
    User-agent: *
    Disallow: /
Blocks every page from every crawler. Real-world: this file shipped to production at multiple well-known companies because the staging environment's robots.txt was copied into the main bundle by mistake. Add a CI assertion that the production file never contains Disallow: / in the User-agent: * group.
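One way to write such a CI assertion is sketched below in Python. The function name `has_blanket_disallow` is hypothetical, and the sketch only looks for the literal rule Disallow: / inside a group that names User-agent: *. Equivalent patterns such as Disallow: /* are not detected here.

```python
def has_blanket_disallow(robots_txt: str) -> bool:
    """True if a group that names User-agent: * contains Disallow: /."""
    agents: list[str] = []  # user-agents of the group being read
    in_rules = False        # True once the current group has a rule line
    for raw in robots_txt.lstrip("\ufeff").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False
```

In a pipeline, the build step would read the robots.txt that is about to ship and fail when this function returns True.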
How the test works
The tool follows the rules defined in RFC 9309 and Google's published behaviour:
- Fetch /robots.txt from the exact host you entered.
- Parse all user-agent groups and their Allow/Disallow rules.
- Pick the group matching your chosen user-agent, or the * group as a fallback.
- Find the most specific (longest) rule that matches the path. If an Allow and Disallow are equal length, Allow wins.
- Return the verdict: allowed if no rule matches, or based on the winning rule.
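The parse, pick, and match steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the names `parse_groups`, `pattern_to_regex`, and `check` are hypothetical, user-agents are compared as exact case-insensitive tokens, and the fetch step is left out so the function takes the file contents as a string.

```python
import re
from typing import Optional

Group = tuple[list[str], list[tuple[str, str]]]  # (user-agents, rules)


def parse_groups(text: str) -> list[Group]:
    """Split robots.txt into (user-agents, [(directive, pattern), ...]) groups."""
    groups: list[Group] = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (p.strip() for p in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def pattern_to_regex(pattern: str) -> re.Pattern:
    """* matches any run of characters; a trailing $ anchors the URL end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def check(text: str, user_agent: str, path: str) -> tuple[bool, Optional[str]]:
    """Return (allowed, winning_rule) for the given user-agent and path."""
    groups = parse_groups(text)
    ua = user_agent.lower()
    # Prefer groups naming the crawler; otherwise fall back to the * groups.
    chosen = [g for g in groups if ua in g[0]] or [g for g in groups if "*" in g[0]]
    best = None  # (pattern length, is_allow, directive, pattern)
    for _, rules in chosen:
        for directive, pattern in rules:
            # An empty pattern (e.g. "Disallow:") matches nothing.
            if pattern and pattern_to_regex(pattern).match(path):
                candidate = (len(pattern), directive == "allow", directive, pattern)
                # Longer pattern wins; on a tie, Allow (True) beats Disallow.
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
    if best is None:
        return True, None  # no rule matches: allowed
    return best[1], f"{best[2].capitalize()}: {best[3]}"
```

Comparing `(length, is_allow)` tuples keeps the tie-break in one place: a longer pattern always wins, and at equal length Allow outranks Disallow.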
Six robots.txt bugs we see most often
- Staging robots.txt deployed to production. A pre-prod environment usually ships User-agent: * + Disallow: /. If that file makes it to production untouched, every page is deindexed within days. Add a CI step that asserts the production robots.txt does NOT contain a blanket disallow.
- Blocking CSS or JS. Google needs to render pages to evaluate them. Disallow: /assets/ or Disallow: /static/ can prevent Googlebot from rendering the page properly, which Search Console reports as a mobile-usability error.
- Using robots.txt to keep pages out of search. robots.txt only blocks crawling. A blocked URL with external links pointing to it still appears in search, without a description and looking suspicious. Use noindex meta tags for that, and do NOT also block the page in robots.txt or Googlebot will never see the noindex directive.
- Forgetting subdomain robots.txt. A correct robots.txt at www.example.com/robots.txt does NOT govern api.example.com or shop.example.com. Each subdomain needs its own.
- UTF-8 byte-order-mark at start of file. A BOM (3 invisible bytes) before the first line confuses strict crawlers. Bing in particular ignores the entire file when it encounters one. Save robots.txt as UTF-8 without BOM.
- Wildcard misuse. Disallow: /*? blocks any URL with a query string, including paginated category pages, filtered product lists, and tracking-parameter URLs from your own ad campaigns. Use $ to anchor patterns to URL end where appropriate.
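The byte-order-mark problem from the list above is easy to detect mechanically. A minimal Python sketch follows; the helper name `strip_utf8_bom` is hypothetical, and it works on the raw bytes of the file because the BOM is invisible once most editors have decoded the text.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> tuple[bytes, bool]:
    """Return (content without a leading UTF-8 BOM, whether one was present)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):], True
    return raw, False
```

A deploy check could fail when the second element is True, or write the stripped bytes back so the file is served without the BOM.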
Common questions about robots.txt
What is a robots.txt checker?
A robots.txt checker fetches the robots.txt file at the root of a domain, parses the directives, and tells you whether a specific path is allowed or disallowed for a specific crawler. It is the first thing to run when a page that should rank suddenly disappears from Google — a wrong Disallow rule is one of the most common causes.
How do I check if Googlebot is blocked from my site?
Enter your domain and the path you want to test (e.g. /), set User-agent to Googlebot, and click Test. The tool fetches the live robots.txt, finds the rule group that applies to Googlebot, and reports whether the path is allowed or disallowed, including which specific rule won the precedence match.
Does robots.txt block AI crawlers like GPTBot, ClaudeBot, and PerplexityBot?
It depends on your robots.txt. The checker has GPTBot and other AI bots in the User-agent dropdown. Pick one and run the test against / to see whether the entire site is disallowed. Many sites accidentally block AI crawlers because they copy-pasted a "no AI training" rule that also blocks the runtime citation crawlers (ChatGPT-User, OAI-SearchBot, PerplexityBot). Result: the site can drop out of citations in ChatGPT search, Perplexity, and other AI answer engines that honour those user-agents. Google AI Overviews draws on pages crawled by Googlebot, so test that user-agent separately.
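To run the same check offline for several AI crawlers at once, a short Python sketch using the standard library's `urllib.robotparser` is shown below. The list `AI_AGENTS` and the function name `ai_crawler_access` are illustrative choices, not part of the tool. The standard-library parser applies rules in file order rather than by longest match and does not expand wildcards, so treat it as a rough check for blanket blocks rather than a full precedence test.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list; extend it with whichever crawlers matter to you.
AI_AGENTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def ai_crawler_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI user-agent to True (allowed) or False (disallowed) for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}
```

A file that disallows only GPTBot leaves the citation crawlers allowed, while a blanket rule in the * group blocks all five.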
How does robots.txt rule precedence work?
Google picks the most specific (longest-matching) rule for the given path. If an Allow and a Disallow are equally specific, Allow wins. The checker applies the same rule and shows you which directive actually decides whether the URL is crawlable.
Does each subdomain need its own robots.txt?
Yes. robots.txt applies only to the exact host from which it is served. www.example.com/robots.txt does not govern shop.example.com — each subdomain needs its own file at /robots.txt.
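Because the file is scoped to the exact host, the robots.txt that governs a page can be derived from the page URL alone. A minimal Python sketch, with the hypothetical helper name `robots_url`:

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Scheme and host (plus port, if any) are kept; path and query are dropped.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"
```

Pages on www.example.com and shop.example.com therefore map to two different files, and each one has to be checked on its own.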
Does robots.txt prevent indexing?
No. robots.txt only blocks crawling. A URL blocked there can still appear in Google's search results if other sites link to it — just without a description. To prevent indexing, use a noindex meta tag on the page itself. Do not block a noindex page in robots.txt or Googlebot will never see the noindex directive.
Want the full audit, not just robots.txt?
A Seoxpert scan covers robots.txt plus 442 checks across SEO, security, performance, AI / GEO citability, and content gaps. Free first scan.