SEO
XML sitemap best practices for India — sizing, lastmod, submission
XML sitemap rules that matter in 2026 — 50,000 URL ceiling, accurate lastmod, sitemap index grouping, robots.txt and Search Console submission. India focus.
24 April 2026 · 2 min read
Quick frame: a single XML sitemap is capped at 50,000 URLs or 50 MB uncompressed, whichever comes first. Beyond that, split the URLs across multiple sitemaps and reference them from a sitemap index. The one field that really matters is lastmod, and only when its dates track genuine content changes. Google ignores priority and changefreq.
Generating the sitemap
Two paths depending on tooling:
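- Framework-native: Hugo generates a sitemap by default; Next.js (via an app/sitemap file) and Astro (via the @astrojs/sitemap integration) produce one with minimal setup.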
- Manual / custom: paste URLs into the XML sitemap generator, download, and host the file at /sitemap.xml.
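If you go the custom route, the file itself is simple. Here is a minimal sketch of a standard-library renderer for the sitemaps.org `<urlset>` format. The function name and the example.in URLs are illustrative assumptions, not the generator tool's internals.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Render a <urlset> sitemap from (loc, lastmod) pairs.

    lastmod is a W3C date string such as "2026-04-24", or None to omit the tag.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for loc, lastmod in entries:
        lines.append("  <url>")
        # &, < and > must be entity-escaped inside <loc>
        lines.append(f"    <loc>{escape(loc)}</loc>")
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.example.in/", "2026-04-24"),
        ("https://www.example.in/blog/?page=2&sort=new", None),
    ]))
```

Write the output as UTF-8 and serve it at /sitemap.xml.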
For sites with thousands of URLs across several content types, group them into themed sitemaps (sitemap-blog.xml, sitemap-products.xml, sitemap-pages.xml) and combine them with the sitemap index generator.
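A sitemap index uses the same namespace as a regular sitemap, with `<sitemap>` entries in place of `<url>` entries. The sketch below assumes one convention: each child's lastmod is set to the newest lastmod among that child's URLs. The helper name and the URLs are hypothetical.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(children):
    """Render a <sitemapindex> from {child_sitemap_url: [lastmod dates of its URLs]}.

    Each child's <lastmod> is the newest date among its URLs; ISO "YYYY-MM-DD"
    strings sort chronologically, so max() works. Children with no dates get no <lastmod>.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for loc, dates in children.items():
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        if dates:
            lines.append(f"    <lastmod>{max(dates)}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sitemap_index({
        "https://www.example.in/sitemap-blog.xml": ["2026-03-02", "2026-04-20"],
        "https://www.example.in/sitemap-products.xml": ["2026-04-23"],
        "https://www.example.in/sitemap-pages.xml": ["2025-11-10"],
    }))
```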
lastmod is the only field that matters (for Google)
Google ignores both priority and changefreq. The signal that drives re-crawl prioritisation is lastmod, but only when it's honest: Google uses it when the dates are consistently and verifiably accurate.
If a sitemap bumps lastmod daily while the content stays the same, Google learns to ignore its lastmod values. If lastmod moves only when content genuinely changes (a republish, a body edit, a structured-data update), Google can use it to prioritise re-crawling those URLs.
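One way to keep lastmod honest is to hash each page's main content at build time. You then move the date only when the hash changes. This is a sketch of that technique, not a Google requirement. The `previous` store (a JSON file, a DB table, whatever you use) and the function name are hypothetical.

```python
import hashlib
from datetime import date


def refresh_lastmod(previous, pages, today=None):
    """Move lastmod only for pages whose content actually changed.

    previous: {url: {"hash": str, "lastmod": "YYYY-MM-DD"}} saved from the last build
              (a hypothetical store).
    pages:    {url: main content text rendered in this build}.
    Returns the new {url: {"hash", "lastmod"}} mapping to save for next time.
    """
    stamp = (today if today is not None else date.today()).isoformat()
    current = {}
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        old = previous.get(url)
        if old is not None and old["hash"] == digest:
            current[url] = old  # same content: keep the old date
        else:
            current[url] = {"hash": digest, "lastmod": stamp}  # new or edited page
    return current
```

Hash only the main content, not template chrome such as footers, dates or ad slots. Otherwise every deploy would bump every lastmod.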
Reference from robots.txt
Always add the sitemap URL to your robots.txt:
Sitemap: https://www.example.in/sitemap.xml
Google, Bing and Yandex all support this directive, and it's the easiest way to expose sitemaps to engines without dashboard access. Use an absolute URL. For multi-sitemap sites, list one Sitemap: line per sitemap, or point a single line at the sitemap index.
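To confirm what your robots.txt actually declares, Python's standard `urllib.robotparser` collects Sitemap: lines via `site_maps()` (Python 3.8+). The sketch below also flags values that aren't absolute URLs. The helper names are illustrative, and real crawlers use their own parsers.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the Sitemap: URLs in a robots.txt body, using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def relative_sitemaps(robots_txt):
    """Flag Sitemap: values that are not absolute http(s) URLs."""
    flagged = []
    for url in declared_sitemaps(robots_txt):
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            flagged.append(url)
    return flagged
```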
Submission and monitoring
- Google Search Console → Sitemaps → submit your sitemap URL.
- Bing Webmaster Tools → Sitemaps → submit the same URL.
- Re-check Search Console → Sitemaps weekly. Compare the discovered URL count with how many of those URLs the page indexing report shows as indexed. Large gaps usually point to crawl or content-quality issues.
Indian-context sizing examples
- Small blog (50–500 URLs): single sitemap.xml, hand-crafted is fine.
- Mid-size e-commerce (1,000–50,000 URLs): a single auto-generated sitemap works; consider splitting products from blog later.
- Large marketplace (50,000+ URLs): sitemap index referencing multiple themed sitemaps, regenerated nightly.
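The sizing arithmetic above is a ceiling division against the 50,000-URL cap, plus a byte check on each rendered file. Here is a sketch. The `sitemap-<theme>-<n>.xml` naming scheme is a hypothetical convention.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB = 52,428,800 bytes, measured uncompressed


def chunk_urls(urls, max_urls=MAX_URLS):
    """Split a URL list into consecutive chunks of at most max_urls entries."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def files_needed(n_urls, max_urls=MAX_URLS):
    """Minimum number of child sitemaps for n_urls (ceiling division)."""
    return -(-n_urls // max_urls)


def within_byte_limit(xml_text, max_bytes=MAX_BYTES):
    """Check a rendered sitemap against the uncompressed size cap."""
    return len(xml_text.encode("utf-8")) <= max_bytes


def chunk_names(theme, n_chunks):
    """Hypothetical naming scheme: sitemap-products-1.xml, sitemap-products-2.xml, ..."""
    return [f"sitemap-{theme}-{i}.xml" for i in range(1, n_chunks + 1)]
```

For example, `files_needed(180_000)` gives 4 product sitemaps. Still run `within_byte_limit` on each rendered file, because entries heavy with hreflang alternates can push a file past 50 MB before it reaches 50,000 URLs.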
Common pitfalls
- Sitemap contains non-200 URLs (404s, 301s): these waste crawl budget. Audit and prune, and list the final destination URL instead of the redirecting one.
- Sitemap contains noindex pages — strip them. The sitemap should list indexable URLs only.
- Sitemap contains the wrong protocol (http instead of https) — fix at generation time.
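All three pitfalls can be caught with one pass over the sitemap. In this sketch, `check_url` is a hypothetical callable you supply that returns `(status_code, noindex)` for a URL. In practice you would back it with an HTTP client configured not to follow redirects, reading the meta robots tag or X-Robots-Tag header. Injecting it keeps the audit logic testable offline.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(xml_bytes, check_url):
    """Return (url, problem) pairs for http URLs, non-200 statuses and noindex pages.

    check_url: hypothetical callable url -> (status_code, noindex_flag).
    """
    problems = []
    root = ET.fromstring(xml_bytes)
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if url.startswith("http://"):
            problems.append((url, "http instead of https"))
        status, noindex = check_url(url)
        if status != 200:
            problems.append((url, f"status {status}"))
        if noindex:
            problems.append((url, "noindex"))
    return problems
```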
For the companion piece on robots.txt, see robots.txt mistakes that hide your site.
FAQ
Q. Should I include hreflang in the sitemap or in HTML? A. Either works. For sites with many language variants, sitemap-level hreflang keeps the alternate links out of every page's HTML head, which reduces per-page bloat.
Q. Does submitting a sitemap get my pages indexed? A. No. Submission tells Google where to fetch the sitemap; it guides crawling but does not guarantee indexing.
Q. How often should I re-submit? A. You don't need to. Google re-fetches known sitemaps on its own schedule and uses the lastmod values inside to decide what to re-crawl, so manual re-submission is rarely useful.
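For the hreflang question, sitemap-level annotations add the `xhtml` namespace. Each `<url>` then lists every language variant as an `<xhtml:link rel="alternate">`, including itself. Here is a sketch with hypothetical en-IN / hi-IN pricing URLs. An x-default that points at an existing variant does not get its own `<url>` entry.

```python
from xml.sax.saxutils import escape, quoteattr


def build_hreflang_sitemap(groups):
    """groups: list of {hreflang_code: url} dicts, one dict per set of language variants."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for variants in groups:
        # dict.fromkeys de-duplicates while keeping order (x-default may reuse a variant URL)
        for url in dict.fromkeys(variants.values()):
            lines.append("  <url>")
            lines.append(f"    <loc>{escape(url)}</loc>")
            # every variant lists all alternates, itself included
            for lang, alt in variants.items():
                lines.append(
                    f'    <xhtml:link rel="alternate" hreflang={quoteattr(lang)} href={quoteattr(alt)}/>'
                )
            lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```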
Try the free tool
XML Sitemap Generator
Paste URLs → standards-compliant XML sitemap with lastmod and priority.