robots.txt mistakes that silently hide your site from Google
The robots.txt mistakes that quietly drop pages from search: leftover staging blocks, wildcard typos, blocked CSS/JS. Audit checklist plus a free tester inside.
25 April 2026 · 2 min read
Quick frame: The most common robots.txt mistakes that hide content: leftover staging block after launch, wildcard typos, blocked CSS/JS that breaks rendering, and confusing Disallow with noindex. Each silently drops pages from search.
Mistake 1: leftover staging block
Pre-launch you blocked everything:
User-agent: *
Disallow: /
On launch day, nobody removed it. Google honours the block and stops crawling, and over the following weeks pages lose their snippets and rankings, and many fall out of the index altogether. It is one of the most common causes of catastrophic post-launch traffic loss.
Detection: run a handful of paths from your live site through the robots.txt tester against your live robots.txt. If every path returns DISALLOW, you have a leftover staging block.
Fix: replace the file with an allow-all template (the robots.txt generator has one), and add a robots.txt check to your launch checklist.
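The detection step can also be scripted as a launch check. This is a minimal sketch, assuming Python's standard urllib.robotparser is a close enough stand-in for a rule as simple as Disallow: /; the function names and sample paths are hypothetical, and in practice you would feed it the text of your live /robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Sample inputs for illustration only.
STAGING = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nAllow: /\n"


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of paths the given agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


def looks_like_staging_block(robots_txt, paths, agent="Googlebot"):
    """True when every sampled path is disallowed for the agent."""
    paths = list(paths)
    return bool(paths) and blocked_paths(robots_txt, paths, agent) == paths


if __name__ == "__main__":
    sample = ["/", "/blog/", "/pricing"]
    print(looks_like_staging_block(STAGING, sample))
    print(looks_like_staging_block(ALLOW_ALL, sample))
```

The standard-library parser does not implement Google's wildcard or longest-match rules, so treat it as a smoke test for blanket blocks rather than a full emulation of Googlebot.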
Mistake 2: wildcard typo
A path with an unintended wildcard:
Disallow: /blog/?*
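The asterisk sits after the question mark, and it is redundant: robots.txt rules are prefix matches, and Google treats ? as a literal character rather than a wildcard. The rule blocks every URL that begins with /blog/?, which means the blog index with any query string, including legitimate paginated URLs such as /blog/?page=2.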
Fix: test each Disallow line against real URLs in the tester. Trust nothing without verification.
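To see why a pattern matches more (or less) than intended, it helps to translate it into a regular expression. The sketch below assumes Google-style matching, where * stands for any run of characters, a trailing $ anchors the end of the URL, and everything else is a literal prefix match; the helper names and test URLs are hypothetical.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and every other character (including '?') is literal.
    """
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def rule_matches(rule_path, url_path):
    """True when the rule applies to the URL path (prefix match from the start)."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    for url in ["/blog/?page=2", "/blog/post?utm=x", "/blog/"]:
        print(url, rule_matches("/blog/?*", url))
```

Running a rule against a few real URLs this way shows that /blog/?* and /blog/? behave identically, while /blog/*? is the pattern that would catch any query string anywhere under /blog/.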
Mistake 3: blocked CSS / JS
Some legacy sites block /wp-includes/ or /assets/ entirely. Googlebot needs CSS and JS to render the page; without them it sees a stripped layout and may downgrade your mobile-friendliness or rendering quality.
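Detection: Search Console → URL Inspection → Test live URL → View tested page, then check the screenshot and the list of page resources that could not be loaded.
Fix: explicitly Allow CSS and JS paths. Googlebot obeys only the most specific group that matches it, so a Googlebot group replaces the * group rather than adding to it; repeat any Disallow rules you still need inside it: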
User-agent: Googlebot
Allow: /assets/*.css
Allow: /assets/*.js
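These Allow lines override a broader Disallow: /assets/ because Google applies the most specific (longest) matching rule and resolves ties in favour of Allow. A minimal sketch of that precedence logic follows; the function names and the example rule set are assumptions for illustration, not Google's implementation.

```python
import re


def _matches(pattern, path):
    """Google-style match: '*' is any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def verdict(rules, path):
    """Pick the longest matching rule; on equal length, Allow wins.

    rules is a list of ("allow" | "disallow", pattern) pairs.
    With no matching rule the default is ALLOW.
    """
    best = None
    for kind, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue
        key = (len(pattern), kind == "allow")
        if best is None or key > best[0]:
            best = (key, kind)
    return "ALLOW" if best is None or best[1] == "allow" else "DISALLOW"


# Hypothetical group: a legacy block plus the Allow lines from the fix.
RULES = [
    ("disallow", "/assets/"),
    ("allow", "/assets/*.css"),
    ("allow", "/assets/*.js"),
]

if __name__ == "__main__":
    for p in ["/assets/site.css", "/assets/app.js", "/assets/logo.png", "/about"]:
        print(p, verdict(RULES, p))
```

Note that images under /assets/ stay blocked in this example, which may or may not be what you want.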
Mistake 4: Disallow ≠ noindex
A common misconception: putting Disallow: /private/ in robots.txt removes those pages from search. It doesn't — Disallow blocks crawling, not indexing. Disallowed URLs can still appear in results (with placeholder snippets) if external sites link to them.
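To remove a page from the index, add a noindex directive, either as a robots meta tag (the robots meta generator builds one) or as an X-Robots-Tag response header. Google must be able to crawl the page to see the noindex, so don't Disallow the URL until after it has dropped out of the index.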
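The interaction between noindex and Disallow can be checked mechanically. The sketch below assumes you already have a page's HTML and response headers in hand; the function names are hypothetical, and the header parsing is deliberately simple (it ignores user-agent-prefixed values such as "googlebot: noindex").

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [
                d.strip().lower() for d in attr.get("content", "").split(",")
            ]


def has_noindex(html, headers=None):
    """True if the page carries noindex in a robots meta tag or X-Robots-Tag header."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in finder.directives or "noindex" in header_directives


def noindex_is_visible(html, headers=None, blocked_by_robots=False):
    """A noindex only takes effect if crawlers are allowed to fetch the page."""
    return has_noindex(html, headers) and not blocked_by_robots
```

If noindex_is_visible returns False for a page that has a noindex tag, the Disallow rule is hiding the tag from crawlers and the URL can stay in the index.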
Audit checklist
- Read your live /robots.txt manually — anything you don't recognise is suspect.
- Test 5 important URLs through the tester for each major user-agent.
- Check Search Console → Indexing → Pages → "Why pages aren't indexed" and look for "Blocked by robots.txt".
- Confirm the Sitemap: directive points to your live sitemap with a full absolute URL (build the sitemap with the sitemap generator).
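Part of this checklist can be run as a script. The following is a rough sketch using Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later); the audit function, sample robots.txt, paths, and agent names are all hypothetical. Because the standard-library parser ignores wildcards and applies rules in file order rather than by longest match, it is a first pass only, not a substitute for testing against Google's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Sample file for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def audit(robots_txt, paths, agents):
    """Return ({(agent, path): allowed}, [sitemap URLs]) for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    results = {
        (agent, path): parser.can_fetch(agent, path)
        for agent in agents
        for path in paths
    }
    return results, parser.site_maps() or []


if __name__ == "__main__":
    results, sitemaps = audit(
        SAMPLE_ROBOTS,
        paths=["/", "/pricing", "/admin/login"],
        agents=["Googlebot", "Bingbot"],
    )
    for (agent, path), allowed in results.items():
        print(f"{agent:10} {path:15} {'ALLOW' if allowed else 'DISALLOW'}")
    print("Sitemaps:", sitemaps or "none declared")
```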
The companion piece on what NOT to put in robots.txt is in noindex vs Disallow vs canonical.
FAQ
Q. Should I disallow URL parameters? A. Rarely. Canonical tags handle parameter duplicates better. Use Disallow for parameters only when crawl budget is genuinely strained.
Q. Does Disallow speed up crawl? A. Indirectly — it focuses Googlebot on indexable URLs. For most sites this isn't a problem worth solving.
Q. Can I have an empty robots.txt? A. Yes — it means "allow everything". But an explicit User-agent: * / Allow: / with a Sitemap: line is clearer and harder to misread.
Try the free tool
robots.txt Tester
Paste robots.txt + URL — see allow / disallow result per user-agent.
Open robots.txt Tester →