This issue occurs when URLs included in your XML sitemap are also blocked by your site's robots.txt file. This creates conflicting instructions for search engines.
By Seoxpert Editorial
Sitemaps are intended to help search engines discover and prioritize important, indexable pages. If those same URLs are blocked in robots.txt, crawlers are told not to access them, which can prevent them from being indexed and waste crawl budget. This contradiction can result in poor visibility for key pages and inefficient crawling of your site.
Search engines may not crawl or index important pages, leading to reduced organic visibility. Crawl budget is wasted on URLs that cannot be accessed, and search engines may lose trust in your sitemap's accuracy, potentially ignoring it altogether.
This issue is typically detected by running a site audit with SEO tools (e.g., Google Search Console, Screaming Frog, Sitebulb) that compare your sitemap URLs against your robots.txt rules. Search Console will often flag 'Submitted URL blocked by robots.txt' errors under Coverage reports.
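As a rough illustration of the check these tools perform, the sketch below fetches a sitemap, parses robots.txt with Python's standard library, and lists any sitemap URL the default user-agent is disallowed from crawling. The example.com URLs are placeholders; swap in your own domain.
Cross-checking sitemap URLs against robots.txt (Python sketch)
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET

SITE = "https://example.com"  # placeholder domain

# Load and parse robots.txt
rp = urllib.robotparser.RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()

# Fetch the sitemap and collect every <loc> entry
with urllib.request.urlopen(f"{SITE}/sitemap.xml") as resp:
    tree = ET.parse(resp)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns)]
urls += [loc.text.strip() for loc in tree.findall(".//loc")]  # sitemaps written without the namespace

# Report any sitemap URL that robots.txt disallows for all crawlers
for url in urls:
    if not rp.can_fetch("*", url):
        print(f"Blocked by robots.txt but listed in sitemap: {url}")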
Problematic robots.txt and sitemap.xml
# robots.txt
User-agent: *
Disallow: /products/
# sitemap.xml
<urlset>
<url>
<loc>https://example.com/products/widget-1</loc>
</url>
<url>
<loc>https://example.com/products/widget-2</loc>
</url>
</urlset>
# The /products/ URLs are in the sitemap but blocked by robots.txt.
Fixed robots.txt (allowing sitemap URLs)
# robots.txt
User-agent: *
Disallow: /private/
# /products/ is no longer blocked, so sitemap URLs are crawlable.
Fixed sitemap.xml (removing blocked URLs)
# robots.txt
User-agent: *
Disallow: /products/
# sitemap.xml
<urlset>
<!-- Removed /products/ URLs since they are blocked -->
</urlset>
Listing blocked URLs in the sitemap sends conflicting signals to search engines: the sitemap says 'please crawl and index this URL,' but robots.txt says 'do not crawl.' This can prevent important pages from being indexed and wastes crawl budget.
Use tools like Google Search Console (Coverage report), Screaming Frog, or Sitebulb to cross-reference sitemap URLs with your robots.txt rules. Google Search Console will specifically flag 'Submitted URL blocked by robots.txt' errors.
Whether to fix robots.txt or the sitemap depends on your intent. If the URLs should be indexed, update robots.txt to allow them. If they should not be indexed, remove them from the sitemap. Only include indexable URLs in your sitemap.
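If your sitemap is generated programmatically, one way to avoid reintroducing the problem is to filter candidate URLs through robots.txt before writing the file. The sketch below does this with Python's standard library; candidate_urls and the example.com paths are illustrative placeholders.
Generating a sitemap that excludes blocked URLs (Python sketch)
import urllib.robotparser
import xml.etree.ElementTree as ET

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

candidate_urls = [
    "https://example.com/products/widget-1",  # disallowed in the problematic robots.txt above
    "https://example.com/about",
]

# Build a <urlset> containing only URLs the default user-agent may crawl
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for url in candidate_urls:
    if rp.can_fetch("*", url):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)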
'noindex' can prevent indexing, but the directive lives on the page itself (as a meta tag or X-Robots-Tag header), so if the URL is blocked by robots.txt the crawler never fetches the page and never sees it. For sitemap URLs, allow crawling and use 'noindex' if you don't want them indexed, or remove them from the sitemap entirely if they shouldn't be indexed at all.
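For example, to keep a page out of the index while still letting crawlers see the directive, the URL must remain crawlable. A minimal sketch (the paths and Allow rule are illustrative):
# robots.txt — do not Disallow the URL, or the noindex below is never seen
User-agent: *
Allow: /

<!-- On the page itself -->
<meta name="robots" content="noindex">

# Alternatively, send the directive as an HTTP response header
X-Robots-Tag: noindex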
If blocked URLs are left in the sitemap, search engines may ignore them, flag errors in Search Console, and potentially distrust your sitemap, reducing its effectiveness for other URLs.