robots.txt · meta robots · X-Robots-Tag
Which robots directive does Google actually obey?
Three places decide whether a URL is crawled and indexed — robots.txt, the
meta robots tag and the X-Robots-Tag header — and they often disagree. xrobotscheck reads all
three for the crawler you pick and shows a per-directive table of which one wins, plus the trap of
“noindex behind a Disallow.”
The conflicts that break indexing
A noindex on a page that robots.txt also blocks: Google never crawls the page, never sees the noindex, and can still index the URL.
For crawl, noindex and nofollow: each source side by side plus the effective result.
When the meta tag and X-Robots-Tag disagree, the most restrictive wins. We show which.
Toggle the crawler — robots.txt groups and bot-specific directives are evaluated per crawler.
Why it matters
“My page is noindex but still indexed” is one of the most common SEO confusions, and the usual cause is the noindex-behind-Disallow trap. robots.txt only controls crawling; a blocked page can still appear as a URL-only result. xrobotscheck resolves all three signals using Google's documented precedence rules, with an open methodology rather than a black-box score, and gives you a shareable result page.
Frequently asked questions
What does xrobotscheck do?
A page can be told what to do by three different mechanisms: robots.txt (controls crawling), the meta robots tag, and the X-Robots-Tag HTTP header (both control indexing). They frequently disagree. xrobotscheck fetches a URL and its robots.txt, reads all three for the crawler you pick (Googlebot or Bingbot), and shows a per-directive table of which one wins and why — plus it names the classic traps.
Why is my page noindex but still indexed?
The usual cause is the "noindex behind Disallow" trap: your page has a noindex, but robots.txt also blocks crawling of it. Because the crawler is not allowed to fetch the page, it never sees the noindex, so Google can still index the URL from external links (as a URL-only result). The fix is counter-intuitive: allow crawling so the noindex can be seen, and block again, if you still need to, only after the URL has dropped out of the index. xrobotscheck detects this exact case.
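The trap can be sketched in a few lines of Python. This is an illustration under stated assumptions, not xrobotscheck's actual implementation: the robots.txt content, the example URLs and the function names are hypothetical, and the standard library's urllib.robotparser uses simpler matching than Googlebot does (for example, no wildcard support).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def noindex_behind_disallow(robots_txt, user_agent, url, page_directives) -> bool:
    """True when the page carries noindex but the crawler may not fetch it,
    which means the noindex can never be seen."""
    blocked = not crawl_allowed(robots_txt, user_agent, url)
    return blocked and "noindex" in page_directives
```

Calling noindex_behind_disallow(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html", {"noindex"}) flags the trap, while the same directives on a URL outside /private/ do not.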
Which directive does Google obey when they conflict?
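robots.txt decides whether the page is crawled at all: if it is blocked, the meta robots tag and X-Robots-Tag are never seen. If the page is crawlable, Google combines the meta tag and the header and applies the most restrictive directive (so if either says noindex, the result is noindex). A directive aimed at a specific crawler (e.g. a googlebot meta tag, or "googlebot:" in the header) applies only to that crawler; Google's documentation describes it as combined with the generic robots directives, so it can tighten the generic rule but not loosen it. robots.txt works differently: a group that names the crawler replaces the generic * group for that crawler.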
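As a rough sketch of that order of evaluation, the function below gates on crawling first and then merges the meta tag and header directives so that the most restrictive one wins. The function names and the result shape are assumptions made for illustration; only the generic noindex, nofollow and none directives are handled.

```python
from typing import Iterable, Optional

def _expand(directives: Iterable[str]) -> set:
    """Lower-case directives and expand 'none' into noindex + nofollow."""
    out = set()
    for d in directives:
        d = d.strip().lower()
        if d == "none":
            out.update({"noindex", "nofollow"})
        elif d:
            out.add(d)
    return out

def resolve(crawl_allowed: bool, meta: Iterable[str], header: Iterable[str]) -> dict:
    """Effective result for one crawler.

    None means "never seen": the page is blocked, so its directives
    cannot take effect.
    """
    if not crawl_allowed:
        return {"crawl": False, "noindex": None, "nofollow": None}
    # Union of restrictions: if either source says noindex, the result is noindex.
    combined = _expand(meta) | _expand(header)
    noindex: Optional[bool] = "noindex" in combined
    nofollow: Optional[bool] = "nofollow" in combined
    return {"crawl": True, "noindex": noindex, "nofollow": nofollow}
```

With a meta tag of "index, follow" and a header of "noindex", resolve(True, ["index", "follow"], ["noindex"]) reports noindex as True, because a permissive directive does not cancel a restrictive one.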
Does robots.txt remove a page from Google?
No. robots.txt only controls crawling. A Disallowed URL can still appear in search results (without a description) if other pages link to it. To actually remove a page from the index, make it crawlable and have it serve a noindex.
What is the difference between meta robots and X-Robots-Tag?
They carry the same indexing directives (noindex, nofollow, nosnippet, etc.). The meta robots tag lives in the HTML <head>, so it only works for HTML pages; the X-Robots-Tag is an HTTP response header, so it also works for PDFs, images and other non-HTML files. When both are present, the most restrictive wins.
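To make the HTML side concrete, here is a minimal sketch that pulls robots directives out of meta tags using Python's standard html.parser. The class and function names are hypothetical, and for simplicity it scans the whole document rather than only the <head>. A PDF or image has no markup to scan, which is why only the X-Robots-Tag header can carry directives for those files.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self, names=("robots",)):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes; normalise to "".
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").strip().lower() in self.names:
            for d in attr.get("content", "").split(","):
                if d.strip():
                    self.directives.add(d.strip().lower())

def meta_robots(html: str, names=("robots",)) -> set:
    """Return the lower-cased directives found in matching meta tags."""
    parser = MetaRobotsParser(names)
    parser.feed(html)
    return parser.directives
```

Passing names=("robots", "googlebot") also collects directives from a crawler-specific tag such as <meta name="googlebot" content="nosnippet">.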
Can I check Googlebot vs Bingbot separately?
Yes — toggle the crawler. The robots.txt group, the meta tag name (googlebot / bingbot), and the X-Robots-Tag bot prefix are all evaluated for the crawler you choose, because a site can give different instructions to each.
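The header side of that per-crawler evaluation might look like the sketch below, which separates generic X-Robots-Tag directives from those prefixed with a bot name. The function name and the returned pair are assumptions for illustration; how the two sets are then combined is left to the caller. Directives that carry their own value after a colon (such as max-snippet) are treated as generic, and dates containing commas in unavailable_after are not handled.

```python
# Directives that take a value after a colon, so their name is not a bot prefix.
VALUE_DIRECTIVES = {
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
    "unavailable_after",
}

def header_directives_for(header_values, crawler):
    """Split X-Robots-Tag values into (generic, specific) directive sets.

    header_values: raw header values, one string per X-Robots-Tag line.
    crawler: the bot name to match, e.g. "googlebot" or "bingbot".
    """
    generic, specific = set(), set()
    for raw in header_values:
        prefix, sep, rest = raw.partition(":")
        name = prefix.strip().lower()
        if sep and name not in VALUE_DIRECTIVES and "," not in prefix:
            # Looks like "botname: directives"; keep it only for our crawler.
            if name != crawler.lower():
                continue
            target, body = specific, rest
        else:
            target, body = generic, raw
        target.update(d.strip().lower() for d in body.split(",") if d.strip())
    return generic, specific
```

For the header lines "googlebot: noindex", "nofollow" and "bingbot: noarchive", choosing Googlebot yields nofollow as generic and noindex as specific, while choosing Bingbot yields noarchive as specific instead.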
Is my data safe?
The check runs on Cloudflare and only fetches the public URL and its robots.txt; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated, and responses are size- and time-capped. We keep no logs of the URLs you check.
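For readers curious what such address blocking can look like, here is a minimal sketch using Python's standard ipaddress module. It is not the service's actual code; the function name is hypothetical, and a real fetcher would need to apply the check to every address a hostname resolves to, and again after each redirect.

```python
import ipaddress

def is_blocked_address(ip: str) -> bool:
    """Return True for addresses a public URL checker should refuse to fetch."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so it cannot bypass the check.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private        # e.g. 10.0.0.0/8, 192.168.0.0/16
        or addr.is_loopback    # 127.0.0.0/8, ::1
        or addr.is_link_local  # 169.254.0.0/16, includes 169.254.169.254 (cloud metadata)
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )
```

A public address such as 8.8.8.8 passes, while 127.0.0.1, 10.1.2.3 and the metadata address 169.254.169.254 are all rejected.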