robots.txt · meta robots · X-Robots-Tag

Which robots directive does Google actually obey?

Three places decide whether a URL is crawled and indexed (robots.txt, the meta robots tag and the X-Robots-Tag header), and they often disagree. xrobotscheck reads all three for the crawler you pick and shows, directive by directive, which source wins. It also flags notorious traps such as “noindex behind a Disallow.”

What it checks

The conflicts that break indexing

noindex behind Disallow

The #1 trap: a noindex on a page that robots.txt also blocks. Google never crawls it, never sees the noindex, and can still index the URL.

Per-directive decision table

For crawl, noindex and nofollow, see the value from each source side by side and the effective result — not just a yes/no.

meta vs header

When the meta tag and the X-Robots-Tag header disagree, Google applies the most restrictive. We show which one wins.

Googlebot vs Bingbot

Toggle the crawler — robots.txt groups and bot-specific meta/header directives are evaluated for the one you choose.

Disallow ≠ removal

Blocking crawling alone does not de-index a page. We flag that and explain the correct removal path.

Shareable result

A real, linkable result page you can drop into a ticket — unlike a browser extension that only you can see.

Open methodology

Every rule, in the open

No mystery score. Here is exactly how each verdict is decided — so you can verify and trust the result.

Related: to check whether AI crawlers can reach a page, see aicrawlcheck; for soft 404s, see soft404scan.

Frequently asked questions

What does xrobotscheck do?

A page can be told what to do by three different mechanisms: robots.txt (controls crawling), the meta robots tag, and the X-Robots-Tag HTTP header (both control indexing). They frequently disagree. xrobotscheck fetches a URL and its robots.txt, reads all three for the crawler you pick (Googlebot or Bingbot), and shows a per-directive table of which one wins and why — plus it names the classic traps.

Why is my page noindex but still indexed?

The usual cause is the "noindex behind Disallow" trap: your page has a noindex, but robots.txt also blocks crawling of it. Because the crawler is not allowed to fetch the page, it never sees the noindex, so Google can still index the URL from external links (as a URL-only result). The fix is counter-intuitive: allow crawling so the noindex can be seen, and block again only after the URL has dropped out of the index. xrobotscheck detects this exact case.
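
The trap can be sketched in a few lines of Python. This is a minimal illustration, not xrobotscheck's actual code: the function name `noindex_behind_disallow`, the sample robots.txt and the example.com URLs are all assumptions. It relies on the standard library's `urllib.robotparser`, whose rule matching is simpler than Google's longest-match logic, so treat it as an approximation.

```python
from urllib.robotparser import RobotFileParser


def noindex_behind_disallow(robots_txt: str, user_agent: str,
                            url: str, page_has_noindex: bool) -> bool:
    """Return True when a page carries noindex but the crawler may not fetch it.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawl_allowed = parser.can_fetch(user_agent, url)
    # The noindex is invisible to the crawler if the fetch itself is blocked.
    return page_has_noindex and not crawl_allowed


# Assumed sample robots.txt: one generic group that blocks /private/.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

trapped = noindex_behind_disallow(
    ROBOTS, "Googlebot", "https://example.com/private/page.html", True)
fine = noindex_behind_disallow(
    ROBOTS, "Googlebot", "https://example.com/public/page.html", True)
print(trapped, fine)
```

The first URL is blocked and carries a noindex, so it is flagged; the second is crawlable, so its noindex can be seen and honoured.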

Which directive does Google obey when they conflict?

robots.txt decides whether the page is crawled at all — if it is blocked, the meta robots and X-Robots-Tag are never seen. If the page is crawlable, Google combines the meta tag and the header and applies the most restrictive directive (so if either says noindex, the result is noindex). A directive aimed at a specific crawler (e.g. a googlebot meta tag, or "googlebot:" in the header) overrides the generic one for that crawler.
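
The order described above (robots.txt first, then the most restrictive of meta and header) might be expressed roughly like this. It is a sketch under assumptions: `effective_directives` is a hypothetical name, the inputs are lists of directives already selected for one crawler, and only noindex and nofollow are resolved.

```python
def effective_directives(crawl_allowed, meta, header):
    """Combine the three sources for one crawler (illustrative only).

    meta and header are lists of directive strings that already apply
    to the chosen crawler.
    """
    if not crawl_allowed:
        # A blocked page is never fetched, so on-page directives are unseen.
        return {"crawl": False, "noindex": None, "nofollow": None}

    combined = {d.lower() for d in meta} | {d.lower() for d in header}
    if "none" in combined:
        # "none" is shorthand for "noindex, nofollow".
        combined |= {"noindex", "nofollow"}

    # Most restrictive wins: a restriction from either source applies.
    return {
        "crawl": True,
        "noindex": "noindex" in combined,
        "nofollow": "nofollow" in combined,
    }


conflict = effective_directives(True, ["index", "follow"], ["noindex"])
blocked = effective_directives(False, ["noindex"], [])
print(conflict)
print(blocked)
```

In the blocked case the noindex value is reported as `None` rather than `True`, which reflects that the directive exists on the page but was never seen by the crawler.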

Does robots.txt remove a page from Google?

No. robots.txt only controls crawling. A Disallowed URL can still appear in search results (without a description) if other pages link to it. To actually remove a page, it must be crawlable and carry a noindex.

What is the difference between meta robots and X-Robots-Tag?

They carry the same indexing directives (noindex, nofollow, nosnippet, etc.). The meta robots tag lives in the HTML <head>, so it only works for HTML pages; the X-Robots-Tag is an HTTP response header, so it also works for PDFs, images and other non-HTML files. When both are present, the most restrictive wins.
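
As a rough illustration of how an X-Robots-Tag value can be read, the sketch below splits one header value into an optional crawler prefix and its directives. The function name `parse_x_robots_tag` is hypothetical, and the handling of directives that carry their own colon (such as `max-snippet: 50`) is a simplifying assumption; values containing commas, such as some date formats, are not handled.

```python
# Directives that take a value after a colon, so the colon is not a bot prefix.
VALUE_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def parse_x_robots_tag(value):
    """Split one header value into (bot or None, [directives]). Illustrative only."""
    bot = None
    head, sep, rest = value.partition(":")
    # Treat the text before the first colon as a crawler name only when it is
    # a single token that is not itself a value-carrying directive.
    if sep and "," not in head and head.strip().lower() not in VALUE_DIRECTIVES:
        bot = head.strip().lower()
        value = rest
    directives = [d.strip().lower() for d in value.split(",") if d.strip()]
    return bot, directives


print(parse_x_robots_tag("googlebot: noindex, nofollow"))
print(parse_x_robots_tag("noindex"))
print(parse_x_robots_tag("noindex, max-snippet: 50"))
```

A response may send several X-Robots-Tag headers, so a real checker would run this over each value and group the results by crawler.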

Can I check Googlebot vs Bingbot separately?

Yes — toggle the crawler. The robots.txt group, the meta tag name (googlebot / bingbot), and the X-Robots-Tag bot prefix are all evaluated for the crawler you choose, because a site can give different instructions to each.
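
Selecting the meta tag for the chosen crawler could look something like the following. This is a sketch, not the tool's implementation: `RobotsMetaParser` and `meta_directives_for` are hypothetical names, and the precedence (crawler-specific tag over the generic robots tag) simply follows the rule stated in this FAQ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots-related meta tags by name (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name in {"robots", "googlebot", "bingbot"}:
            self.by_name.setdefault(name, []).extend(
                d.strip().lower() for d in content.split(",") if d.strip())


def meta_directives_for(html, crawler):
    """Return the meta directives that apply to one crawler."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumed precedence: the crawler-specific tag, else the generic one.
    return parser.by_name.get(crawler.lower(), parser.by_name.get("robots", []))


PAGE = ('<html><head><meta name="robots" content="noindex">'
        '<meta name="googlebot" content="index, follow"></head></html>')

print(meta_directives_for(PAGE, "googlebot"))
print(meta_directives_for(PAGE, "bingbot"))
```

With this sample page the two crawlers get different answers, which is why the result has to be evaluated per crawler.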

Is my data safe?

The check runs on Cloudflare and only fetches the public URL and its robots.txt; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated, and responses are size- and time-capped. We keep no logs of the URLs you check.
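
For readers curious what blocking those address ranges involves, here is a minimal sketch using Python's standard `ipaddress` module. It is an assumption about the general technique, not a description of xrobotscheck's code, and `is_blocked_address` is a hypothetical name.

```python
import ipaddress


def is_blocked_address(ip: str) -> bool:
    """Return True for addresses a public URL checker should refuse to fetch."""
    addr = ipaddress.ip_address(ip)
    # Cloud metadata endpoints such as 169.254.169.254 fall in the
    # link-local range, so they are covered by is_link_local.
    return addr.is_private or addr.is_loopback or addr.is_link_local


for candidate in ["127.0.0.1", "10.0.0.5", "169.254.169.254", "::1", "8.8.8.8"]:
    print(candidate, is_blocked_address(candidate))
```

A check like this applies to the resolved IP address rather than the hostname, and it has to be repeated for every redirect hop, since a public URL can redirect to an internal one.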