How the test works
The tool follows the rules defined in RFC 9309 and Google's published behaviour:
- Fetch /robots.txt from the exact host you entered.
- Parse all user-agent groups and their Allow/Disallow rules.
- Pick the group matching your chosen user-agent, or the * group as a fallback.
- Find the most specific (longest) rule that matches the path. If an Allow and a Disallow rule are of equal length, Allow wins.
- Return the verdict: the URL is allowed if no rule matches, otherwise the winning rule decides, as shown in the sketch below.
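The following Python sketch makes the longest-match and tie-breaking steps concrete. It is a simplified illustration, not the tool's actual code: the `is_allowed` helper and its plain prefix matching are assumptions, and a real parser also has to handle wildcards, comments, and percent-encoding.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` holds (directive, pattern) pairs from the user-agent group
    that applies, e.g. [("disallow", "/admin/"), ("allow", "/admin/public/")].
    Wildcards are ignored in this sketch; patterns act as plain prefixes.
    """
    best_len = -1
    best_allow = True  # no matching rule at all -> allowed
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue
        allow = directive.lower() == "allow"
        # Most specific (longest) pattern wins; on a length tie, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len = len(pattern)
            best_allow = allow
    return best_allow

rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed("/admin/public/page.html", rules))  # True: the longer Allow wins
print(is_allowed("/admin/settings", rules))          # False: only the Disallow matches
print(is_allowed("/blog/post", rules))               # True: no rule matches
```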
Important limits
- Allowed ≠ indexable. robots.txt only controls crawling. A URL blocked in robots.txt can still appear in Google's index if other sites link to it. Use a noindex meta tag to keep pages out of the index.
- Each host has its own robots.txt. www.example.com/robots.txt does not apply to shop.example.com. Test each subdomain separately.
- Path patterns are case-sensitive. /Admin/ and /admin/ are different.
- Wildcards are supported. * matches any sequence of characters, and $ anchors the pattern to the end of the URL. Not every crawler supports both.
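To illustrate the wildcard and case-sensitivity points above, here is a hedged sketch that turns a robots.txt path pattern into a case-sensitive regular expression. The `pattern_to_regex` name and its simplified handling (for example, it does not check that $ appears only at the end of a pattern) are assumptions for illustration, not the tool's implementation.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a robots.txt path pattern into an anchored, case-sensitive regex.

    '*' matches any run of characters, '$' anchors the match to the end of
    the URL, and every other character is matched literally.
    """
    regex = "^"  # patterns are matched from the start of the path
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)  # literal character, case preserved
    return re.compile(regex)

# /Admin/ and /admin/ really are different paths:
print(bool(pattern_to_regex("/admin/").match("/Admin/login")))           # False
print(bool(pattern_to_regex("/*.pdf$").match("/files/report.pdf")))      # True
print(bool(pattern_to_regex("/*.pdf$").match("/files/report.pdf?x=1")))  # False
```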
Related pages
Read the full primer in the robots.txt glossary entry. For specific crawl-related issues, such as a missing sitemap, blocked CSS, or blocking Googlebot entirely, see the crawl & links issue library.