robots.txt mistakes that silently hide your site from Google

The robots.txt mistakes that quietly drop pages from search: leftover staging blocks, wildcard typos, blocked CSS/JS. Audit checklist plus a free tester inside.

25 April 2026 · 2 min read

Quick frame: The most common robots.txt mistakes that hide content: leftover staging block after launch, wildcard typos, blocked CSS/JS that breaks rendering, and confusing Disallow with noindex. Each silently drops pages from search.

Mistake 1: leftover staging block

Pre-launch you blocked everything:

User-agent: *
Disallow: /

Launch day you forgot to remove it. Google honours the block and stops crawling, and over the following weeks pages lose their snippets and rankings and fall out of the results. This is one of the most common causes of catastrophic post-launch traffic loss.

Detection: run a few paths through the robots.txt tester against your live robots.txt. If every path returns DISALLOW, you have a leftover staging block.

Fix: replace with the robots.txt generator "allow-all" template. Add it to a launch checklist.
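The detection step can also be scripted as a launch-checklist item. Below is a minimal sketch using Python's standard-library urllib.robotparser. The function name blocks_everything and the sample paths are illustrative assumptions, not part of any tool mentioned here. Note that urllib.robotparser does not implement Google's wildcard or longest-match behaviour, so treat it as a check for a blanket block only.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if every sampled path is disallowed for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Hypothetical sample paths; swap in URLs that matter on your site.
    sample_paths = ["/", "/about", "/blog/post-1"]
    return not any(parser.can_fetch(user_agent, path) for path in sample_paths)


staging = "User-agent: *\nDisallow: /\n"
live = "User-agent: *\nDisallow: /admin/\n"

print(blocks_everything(staging))  # True: leftover staging block
print(blocks_everything(live))     # False: only /admin/ is blocked
```

In practice you would feed it the text fetched from your live /robots.txt rather than an inline string.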

Mistake 2: wildcard typo

A path with an unintended wildcard:

Disallow: /blog/?*

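robots.txt treats ? as a literal character, and the trailing * adds nothing because every rule is already a prefix match. The rule therefore blocks every URL that begins with /blog/? (the blog index with any query string), including legitimate paginated URLs such as /blog/?page=2.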

Fix: test each Disallow line against real URLs in the tester. Trust nothing without verification.
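To see what a pattern really matches, it helps to translate it into a regular expression. The sketch below assumes Google-style matching (prefix match, * as a wildcard, an optional trailing $ as an end anchor). The function name pattern_matches and the test paths are illustrative.

```python
import re


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix-match a robots.txt pattern against a URL path plus query string."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Every character is literal except '*', which matches any run of characters.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


rule = "/blog/?*"
for path in ["/blog/?page=2", "/blog/post-1", "/blog/post-1?utm_source=x", "/blog/"]:
    print(path, pattern_matches(rule, path))
# /blog/?page=2 True
# /blog/post-1 False
# /blog/post-1?utm_source=x False
# /blog/ False
```

Only the paginated index URL is caught, which is exactly the kind of surprise that testing against real URLs exposes.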

Mistake 3: blocked CSS / JS

Some legacy sites block /wp-includes/ or /assets/ entirely. Googlebot needs CSS and JS to render the page; without them it sees a stripped layout and may downgrade your mobile-friendliness or rendering quality.

Detection: Search Console → URL Inspection → Test live URL → View tested page shows how Googlebot rendered the page.

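Fix: explicitly Allow CSS and JS paths. Googlebot obeys only the most specific group that names it, so a Googlebot group must also repeat any Disallow rules from the * group that should still apply: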

User-agent: Googlebot
Allow: /assets/*.css
Allow: /assets/*.js
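These Allow lines work alongside a broader Disallow: /assets/ because Google resolves conflicts by picking the longest matching pattern, with Allow winning a tie. A minimal sketch of that precedence rule follows. The helper names and the sample paths are illustrative assumptions, and the rules list stands for a single user-agent group.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Prefix-match a robots.txt pattern; '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; on equal length, Allow beats Disallow."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and matches(pattern, path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the default: allowed.
    return True if best is None else best[1]


rules = [
    ("Disallow", "/assets/"),
    ("Allow", "/assets/*.css"),
    ("Allow", "/assets/*.js"),
]

print(is_allowed(rules, "/assets/site.css"))      # True
print(is_allowed(rules, "/assets/app.js"))        # True
print(is_allowed(rules, "/assets/fonts/a.woff"))  # False
print(is_allowed(rules, "/index.html"))           # True
```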

Mistake 4: Disallow ≠ noindex

A common misconception: putting Disallow: /private/ in robots.txt removes those pages from search. It doesn't — Disallow blocks crawling, not indexing. Disallowed URLs can still appear in results (with placeholder snippets) if external sites link to them.

To truly remove a page from the index, add noindex (the robots meta generator builds the tag). Google must be able to crawl the page to see the noindex, so don't Disallow the page until it has dropped out of the index.
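Before adding a Disallow, it is worth confirming that the page really serves noindex. The sketch below checks an HTML document and an optional X-Robots-Tag header value, using only the standard library. The names RobotsMetaParser and has_noindex are illustrative, and the header parsing is a simplification: it does not handle the form that prefixes a directive with a user-agent name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page signals noindex via a meta tag or a header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {token.strip().lower() for token in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                    # True
print(has_noindex('<meta name="robots" content="index">'))  # False
print(has_noindex("<html></html>", "noindex"))              # True
```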

Audit checklist

Read your live /robots.txt manually — anything you don't recognise is suspect.
Test 5 important URLs through the tester for each major user-agent.
Check Search Console → Pages (Page indexing) → look for "Blocked by robots.txt".
Confirm the Sitemap: directive points to your live sitemap (build with the sitemap generator).
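The first and last checklist items can be supported by a quick summary of what the file declares. This is a minimal sketch that lists the user-agent groups and Sitemap: URLs found in a robots.txt string. The function name summarize and the example.com URL are hypothetical.

```python
def summarize(robots_txt: str) -> dict:
    """List the user-agents and sitemap URLs declared in a robots.txt file."""
    agents, sitemaps = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "sitemap":
            sitemaps.append(value)
    return {"user_agents": agents, "sitemaps": sitemaps}


text = (
    "User-agent: *\n"
    "Allow: /\n"
    "# hypothetical sitemap location\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
print(summarize(text))
# {'user_agents': ['*'], 'sitemaps': ['https://example.com/sitemap.xml']}
```

An unexpected user-agent in the output, or a sitemap URL that still points at a staging host, is a cue to look closer.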

The companion piece on what NOT to put in robots.txt is in noindex vs Disallow vs canonical.

FAQ

Q. Should I disallow URL parameters? A. Rarely. Canonical tags handle parameter duplicates better. Use Disallow for parameters only when crawl budget is genuinely strained.

Q. Does Disallow speed up crawl? A. Indirectly — it focuses Googlebot on indexable URLs. For most sites this isn't a problem worth solving.

Q. Can I have an empty robots.txt? A. Yes — it means "allow everything". But an explicit User-agent: * / Allow: / with a Sitemap: line is clearer and harder to misread.
