Format
A minimal sitemap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2026-05-20</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://example.com/about</loc>
<lastmod>2026-04-15</lastmod>
</url>
</urlset>
Only <loc> is required. <lastmod> is widely used; <changefreq> and <priority> are mostly ignored by Google.
Sitemap index files
For sites with more than 50,000 URLs (or when a single file would exceed 50 MB uncompressed), split the list into multiple sitemap files and reference them from a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-pages.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-blog.xml</loc>
<lastmod>2026-05-20</lastmod>
</sitemap>
</sitemapindex>
Common mistakes
- Sitemap returns HTML instead of XML. Vercel / Next.js / SPA catch-all routes serve index.html for unmatched paths. Symptom: the file parses, but the root element is <html>, not <urlset>.
- Future-dated lastmod. Almost always a server-time bug. Google may stop trusting lastmod across the entire sitemap.
- Listing noindex / robots-blocked URLs. This sends conflicting signals: Google has to reconcile “you told me about this URL but also told me not to index it.”
- Listing 404 URLs. Dead links train Google to trust the sitemap less.
- Sitemap index pointing at dead children. After a refactor that renamed a route, the index file still references the old child sitemap path.
- Exceeding 50,000 URLs in one file. Google ignores the entire file. Split it into multiple files referenced from a sitemap index.
- Not referencing the sitemap from robots.txt. Add the line Sitemap: https://example.com/sitemap.xml so crawlers discover it automatically.
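The 50,000-URL limit in the list above lends itself to a small script. What follows is a minimal Python sketch, not a definitive implementation: it chunks a URL list, builds one <urlset> per chunk, and builds a <sitemapindex> that points at the child files. The function names and any child file naming (for example sitemap-1.xml) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit discussed above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return a <urlset> element with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return urlset


def build_index(child_locs):
    """Return a <sitemapindex> element pointing at each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in child_locs:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return index


def to_xml(element):
    """Serialize an element to a string, including the XML declaration."""
    data = ET.tostring(element, encoding="UTF-8", xml_declaration=True)
    return data.decode("utf-8")
```

With 120,000 URLs, list(chunk_urls(urls)) yields three chunks (50,000, 50,000, and 20,000 entries). Each chunk would be written out with to_xml(build_urlset(chunk)), and the index with to_xml(build_index(child_locs)), where child_locs holds the public URLs of the child files.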
sitemap.xml vs llms.txt
Both are URL-listing files for crawlers. sitemap.xml is exhaustive (machine-readable list of every URL for traditional search engines). llms.txt is curated (selective list of canonical pages for LLM ingestion). Most sites need both.
Related terms
- robots.txt: reference the sitemap via the Sitemap: directive.
- llms.txt: curated URL list for AI crawlers.
- Canonical tag: sitemap URLs should be the canonical version.
- noindex: don't list noindex URLs in the sitemap.
Validate your sitemap.xml
Use the free sitemap checker: it fetches the file, validates the XML structure, and flags future-dated lastmod values, dead URLs, and files that exceed Google's size limits.
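Some of these checks can also be approximated locally. The following is a hedged Python sketch, not the checker's actual logic: it tests for well-formed XML, the expected root element, the 50,000-entry limit, and future-dated lastmod values. The check_sitemap name is hypothetical, and dead-URL detection is left out because it requires network requests.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_ENTRIES = 50_000


def check_sitemap(xml_text, today=None):
    """Return a list of human-readable problems found in a sitemap document."""
    today = today or date.today()
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    # An SPA catch-all route typically produces an <html> root here.
    if root.tag == f"{NS}urlset":
        entries = root.findall(f"{NS}url")
    elif root.tag == f"{NS}sitemapindex":
        entries = root.findall(f"{NS}sitemap")
    else:
        return [f"unexpected root element: {root.tag}"]

    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceeds the {MAX_ENTRIES} limit")

    for entry in entries:
        loc = (entry.findtext(f"{NS}loc") or "").strip()
        if not loc:
            problems.append("entry without <loc>")
            continue
        lastmod = (entry.findtext(f"{NS}lastmod") or "").strip()
        try:
            # Simplification: only the leading YYYY-MM-DD part is compared;
            # missing or coarser values (e.g. year only) are skipped.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue
        if modified > today:
            problems.append(f"future-dated lastmod {lastmod} on {loc}")
    return problems
```

An empty list means none of these particular checks failed; it does not confirm that the listed URLs resolve or are indexable.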