What do I do if I don't want my website to be indexed by search engines?

Question

What tag must you place in your HTML to prevent your web pages from being indexed by search engines?

Answer 1

Add this to the HTML <head> element of each page you don't want indexed:

<meta name="robots" content="noindex, nofollow">

To cover the entire site, create a robots.txt file in the root folder containing the following lines:

User-agent: *
Disallow: /
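If you want to confirm what those two lines actually tell a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_crawl_allowed and the example.com URL are made up for illustration; only the robots.txt content comes from the answer above.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page.html") reports whether that crawler may fetch the page. Keep in mind this only describes crawling, not whether a URL can still show up in results.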

For those that support noindex (Google, Bing)...

For these you need to include the noindex tag in your HTML like this:

<meta name="robots" content="noindex, noodp, noarchive, noimageindex" />

Note that there are other "no-" things in there as well. I'll leave looking those up as an exercise to the reader.

In addition to this, you must not block Google and Bing in your robots.txt file, or else they'll never see your noindex meta tag and it will be useless. This is important because Google and Bing consider noindex to mean "do not show this result at all, ever," while a link blocked by robots.txt means "if somebody links here, you can show it, but don't ever crawl it." There's the rub: if Google or Bing knows about a page that's blocked by robots.txt, they'll show it in their results without knowing its content and without ever crawling it. That is why you must not block Google and Bing with robots.txt, and must instead block them with noindex.
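The interaction described above can be sketched in a few lines of Python: a noindex tag only helps if the crawler is allowed to fetch the page and read it. This is a rough illustration using the standard html.parser and urllib.robotparser modules; RobotsMetaParser and noindex_is_effective are hypothetical names, and real search engines apply more rules than this check does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def noindex_is_effective(html: str, robots_txt: str, user_agent: str, url: str) -> bool:
    """True only if the page carries noindex AND the crawler may fetch it.

    Hypothetical check: if robots.txt blocks the crawler, the tag is
    never read, so it cannot take effect.
    """
    meta = RobotsMetaParser()
    meta.feed(html)
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return "noindex" in meta.directives and robots.can_fetch(user_agent, url)
```

Feeding it a page with the noindex tag plus a robots.txt containing "Disallow: /" returns False, which is exactly the trap: the tag is there, but nobody is allowed to read it.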

For those that do not support noindex (Internet Archive, Alexa, Blekko, Baidu)...

You must simply block these in your robots.txt file. You can include the noindex tag as well, but it will have no effect since the page will never get crawled.
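One way to block only these crawlers while leaving Google and Bing free to read your noindex tags is one robots.txt group per crawler. A small Python sketch follows. The user-agent tokens in BLOCKED_AGENTS are my assumptions about what these services identify as (ia_archiver for Alexa and the Internet Archive, Baiduspider for Baidu, ScoutJet for Blekko), so check each crawler's own documentation before relying on them; build_robots_txt is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; verify against each crawler's documentation.
BLOCKED_AGENTS = ["ia_archiver", "Baiduspider", "ScoutJet"]


def build_robots_txt(blocked_agents):
    """Build robots.txt text that disallows everything for the listed
    crawlers and says nothing about any other crawler."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # A blank line between groups keeps each crawler's rules separate.
    return "\n".join(groups)


def can_fetch(robots_txt, user_agent, url):
    """Check the generated rules with the standard library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this output, crawlers that are not listed (Googlebot, bingbot) match no group and stay unblocked, so they can still crawl the page and see the noindex tag.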

Bonus section

If you want bonus points, you should set up sitemap.xml files for Google and Bing so they can discover your content as quickly as possible (and then block it!).
If you have binary content (like pictures, PDFs, etc.), you'll need to block it using the X-Robots-Tag HTTP header. See my blog post for more details!
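Since binary files have no <head> to put a meta tag in, the directive has to travel in the response headers instead. Here is a minimal sketch with Python's built-in http.server. The suffix list, the "noindex, noarchive" value, and the names x_robots_tag_for, NoIndexBinaryHandler, and serve are all assumptions for illustration; on a real site you would more likely set this header in your web server's configuration.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import PurePosixPath

# Assumed list of file types to keep out of search results.
BINARY_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".gif"}


def x_robots_tag_for(path: str):
    """Return the X-Robots-Tag value for a request path, or None."""
    # Drop any query string before looking at the file extension.
    suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    return "noindex, noarchive" if suffix in BINARY_SUFFIXES else None


class NoIndexBinaryHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to binary responses."""

    def end_headers(self):
        value = x_robots_tag_for(self.path)
        if value:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(host: str = "127.0.0.1", port: int = 8000):
    """Serve the current directory until interrupted."""
    HTTPServer((host, port), NoIndexBinaryHandler).serve_forever()
```

Calling serve() and requesting a PDF from the current directory should show the extra header in the response. As with the meta tag, the crawler has to be allowed to fetch the file for the header to be seen.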

Why I make a personal project of writing long answers like this...

I run a site with about 7M legal documents. Some have personal info in them and cannot be in search engines. I've studied this more than any person ever should and it's frustrating that the robots.txt myth is so strong.

3 answers

solution1
12 ACCEPTED 2010-08-10 00:46:12

solution2
7 2010-08-10 00:46:09

solution3
1 2014-06-13 16:53:44
