robots.txt · meta robots · X-Robots-Tag
Which robots directive does Google actually obey?
Three places decide whether a URL is crawled and indexed — robots.txt, the
meta robots tag and the X-Robots-Tag header — and they often disagree.
xrobotscheck reads all three for the crawler you pick and shows a per-directive table of which one
wins, plus the notorious traps like “noindex behind a Disallow.”
What it checks
The conflicts that break indexing
noindex behind Disallow
The #1 trap: a noindex on a page that robots.txt also blocks. Google never crawls it, never sees the noindex, and can still index the URL.
Per-directive decision table
For crawl, noindex and nofollow, see the value from each source side by side and the effective result — not just a yes/no.
meta vs header
When the meta tag and the X-Robots-Tag header disagree, Google applies the most restrictive directive. We show which one wins.
Googlebot vs Bingbot
Toggle the crawler — robots.txt groups and bot-specific meta/header directives are evaluated for the one you choose.
Disallow ≠ removal
Blocking crawling alone does not de-index a page. We flag that and explain the correct removal path.
Shareable result
A real, linkable result page you can drop into a ticket — unlike a browser extension that only you can see.
Open methodology
Every rule, in the open
No mystery score. Here is exactly how each verdict is decided — so you can verify and trust the result.
- Fetch: two requests for the chosen crawler, one for the URL itself (for the meta robots tag, the X-Robots-Tag response header and the final status) and one for /robots.txt.
- Crawl (robots.txt): matched per Google’s rules: most-specific user-agent group, longest matching path, Allow wins ties, * and $ wildcards. robots.txt controls crawling only, never indexing.
- noindex behind Disallow (trap): FAIL. A noindex (meta or header) on a page that robots.txt Disallows is never seen by the crawler, so the URL can still be indexed. The decisive rule: a blocked page’s on-page directives are invisible.
- Most-restrictive wins: when the page is crawlable, meta robots and X-Robots-Tag are combined and the most restrictive directive applies (either saying noindex ⇒ noindex).
- Bot-specific overrides generic: a directive aimed at the chosen crawler (a googlebot meta tag, or a "googlebot:" prefix in X-Robots-Tag) overrides the generic robots directive for that crawler.
- Disallow ≠ removal: a robots.txt Disallow without a noindex is a WARN, not removal. URL-only results can still appear; the fix is to allow crawling and add noindex.
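The robots.txt matching rule in the list above (longest matching path, Allow wins ties, * and $ wildcards) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not xrobotscheck's actual code: the names pattern_to_regex and is_allowed are hypothetical, and the rules are assumed to be already parsed from the user-agent group selected for the crawler.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")  # a trailing $ pins the match to the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(rules: list, path: str) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs from the selected group (hypothetical shape)."""
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # The longest pattern wins; on equal length, Allow wins the tie.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed
```

For example, with the rules Disallow: /private/ and Allow: /private/public.html, the path /private/public.html is allowed because the Allow pattern is longer, while /private/secret.html stays blocked.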
Related: to check whether AI crawlers can reach a page, see aicrawlcheck; for soft 404s, see soft404scan.
Frequently asked questions
What does xrobotscheck do?
A page can be told what to do by three different mechanisms: robots.txt (controls crawling), the meta robots tag, and the X-Robots-Tag HTTP header (both control indexing). They frequently disagree. xrobotscheck fetches a URL and its robots.txt, reads all three for the crawler you pick (Googlebot or Bingbot), and shows a per-directive table of which one wins and why — plus it names the classic traps.
Why is my page noindex but still indexed?
The usual cause is the "noindex behind Disallow" trap: your page has a noindex, but robots.txt also blocks crawling of it. Because the crawler is not allowed to fetch the page, it never sees the noindex, so Google can still index the URL from external links (as a URL-only result). The fix is counter-intuitive: allow crawling so the noindex can be seen, and block the URL again only after it has dropped out of the index. xrobotscheck detects this exact case.
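As a rough sketch of how this trap can be told apart from the milder "Disallow ≠ removal" case, the decision comes down to two booleans. The function name indexing_verdict and the "OK" label are assumptions made for illustration; only FAIL and WARN are named on this page.

```python
def indexing_verdict(crawl_allowed: bool, has_noindex: bool) -> tuple:
    """Return a (level, reason) pair for one URL. Hypothetical helper, not the tool's API."""
    if not crawl_allowed and has_noindex:
        # The crawler cannot fetch the page, so it never sees the noindex.
        return ("FAIL", "noindex behind Disallow: the URL can still be indexed")
    if not crawl_allowed:
        # Blocking crawling alone does not remove a URL from the index.
        return ("WARN", "Disallow without noindex: URL-only results can still appear")
    if has_noindex:
        return ("OK", "crawlable and noindex: the page can drop out of the index")
    return ("OK", "crawlable and indexable")
```

Calling indexing_verdict(False, True) returns the FAIL case, which is the situation described in this answer.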
Which directive does Google obey when they conflict?
robots.txt decides whether the page is crawled at all — if it is blocked, the meta robots and X-Robots-Tag are never seen. If the page is crawlable, Google combines the meta tag and the header and applies the most restrictive directive (so if either says noindex, the result is noindex). A directive aimed at a specific crawler (e.g. a googlebot meta tag, or "googlebot:" in the header) overrides the generic one for that crawler.
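The "most restrictive wins" step can be illustrated by taking the union of both sources and checking for the restrictive values. This is a minimal sketch that assumes the directives have already been parsed into sets of strings; effective_directives is a hypothetical name.

```python
def effective_directives(meta: set, header: set) -> dict:
    """Combine meta robots and X-Robots-Tag values for a crawlable page."""
    combined = {d.lower() for d in meta | header}
    if "none" in combined:
        # "none" is shorthand for "noindex, nofollow".
        combined |= {"noindex", "nofollow"}
    # A restrictive value from either source decides the result.
    return {
        "noindex": "noindex" in combined,
        "nofollow": "nofollow" in combined,
    }
```

With a meta tag of "index, follow" and a header of "noindex", the result is noindex with links still followed.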
Does robots.txt remove a page from Google?
No. robots.txt only controls crawling. A Disallowed URL can still appear in search results (without a description) if other pages link to it. To actually remove a page, it must be crawlable and carry a noindex.
What is the difference between meta robots and X-Robots-Tag?
They carry the same indexing directives (noindex, nofollow, nosnippet, etc.). The meta robots tag lives in the HTML <head>, so it only works for HTML pages; the X-Robots-Tag is an HTTP response header, so it also works for PDFs, images and other non-HTML files. When both are present, the most restrictive wins.
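To make the header form concrete, here is a simplified sketch of parsing one X-Robots-Tag value into an optional bot prefix and a list of directives. The function name parse_x_robots_tag is hypothetical, and the parser is deliberately naive: it splits on commas, so a date containing a comma inside unavailable_after would need extra handling.

```python
from typing import Optional

# Directives that carry their own "name: value" form and are not bot prefixes.
VALUED = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}

def parse_x_robots_tag(value: str) -> tuple:
    """Split one header value into (bot or None, [directives])."""
    bot: Optional[str] = None
    head, sep, rest = value.partition(":")
    # Treat the text before the first colon as a bot name only if it is a
    # single token that is not itself a valued directive.
    if sep and "," not in head and head.strip().lower() not in VALUED:
        bot, value = head.strip().lower(), rest
    directives = [d.strip().lower() for d in value.split(",") if d.strip()]
    return bot, directives
```

For example, "googlebot: noindex" yields the bot googlebot with the single directive noindex, while "noindex, nofollow" yields no bot and two directives that apply to every crawler.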
Can I check Googlebot vs Bingbot separately?
Yes — toggle the crawler. The robots.txt group, the meta tag name (googlebot / bingbot), and the X-Robots-Tag bot prefix are all evaluated for the crawler you choose, because a site can give different instructions to each.
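On the meta tag side, per-crawler evaluation might look like the sketch below, which uses Python's standard html.parser. It follows the rule stated in the methodology on this page (a bot-specific tag overrides the generic robots tag); the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of robots-related meta tags, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot", "bingbot"):
            values = {d.strip().lower() for d in attr.get("content", "").split(",") if d.strip()}
            self.by_name.setdefault(name, set()).update(values)

def meta_directives_for(html: str, crawler: str) -> set:
    """Return the meta directives that apply to the chosen crawler."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # A tag named after the crawler takes precedence; otherwise fall back to "robots".
    return parser.by_name.get(crawler, parser.by_name.get("robots", set()))
```

Given a page with a generic "index, follow" tag and a googlebot tag set to "noindex", the googlebot view is noindex while the bingbot view falls back to the generic tag.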
Is my data safe?
The check runs on Cloudflare and only fetches the public URL and its robots.txt; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated, and responses are size- and time-capped. We keep no logs of the URLs you check.
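The address blocking described above can be illustrated with Python's ipaddress module. This is a sketch of the idea only, not the service's implementation: is_blocked_address is a hypothetical name, and a real check would also need to resolve hostnames and re-validate every redirect target.

```python
import ipaddress

def is_blocked_address(addr: str) -> bool:
    """Return True for addresses a public URL checker should refuse to fetch."""
    ip = ipaddress.ip_address(addr)
    # The common cloud-metadata address 169.254.169.254 is inside the
    # link-local range, so the link-local test covers it.
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

Under this sketch, 10.0.0.1, 127.0.0.1 and 169.254.169.254 are refused, while a public address such as 8.8.8.8 is allowed.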