
What do I do if I don't want my website to be indexed by search engines?

What markup must I put in my HTML to prevent my web pages from being indexed by search engines?

Add this to the HTML <head> element of every page you don't want indexed:

<meta name="robots" content="noindex, nofollow">
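To check which robots directives a page actually carries, you can parse its HTML with Python's standard html.parser. This is a minimal sketch, not a crawler's real logic; the helper name robots_directives is hypothetical, and it only looks at <meta name="robots"> tags, ignoring crawler-specific variants such as name="googlebot".

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here too via handle_startendtag.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content="noindex, nofollow" -> {"noindex", "nofollow"}
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```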

To cover the entire site, create a robots.txt file in the site's root folder containing the following lines:

User-agent: *
Disallow: /
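You can sanity-check what a robots.txt like this means to a well-behaved crawler with Python's urllib.robotparser. This sketch parses the rules from a string instead of fetching them; the example.com URL and the extra agent name SomeOtherBot are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "User-agent: *" applies to every crawler, and "Disallow: /" matches every path.
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example.com/any/page.html")
    print(agent, "allowed" if allowed else "blocked")
```

Keep in mind that robots.txt only controls crawling; as the next answer explains, it does not by itself keep a URL out of every search engine's results.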

See also:

Restricting indexing with a robots.txt file: http://www.robotstxt.org/orig.html

The other answers here are subtly wrong. Unfortunately, the real answer is a good deal more complicated.

Some search engines support the HTML noindex tag, but not all of them do. In particular, Bing and Google do, but a number of others don't (here's my research on this). Depending on whether a search engine supports noindex, you have to take a different approach.

For those that support noindex (Google, Bing)...

For these you need to include the noindex tag in your HTML like this:

<meta name="robots" content="noindex, noodp, noarchive, noimageindex" />

Note that there are other "no-" directives in there as well. I'll leave looking those up as an exercise for the reader.

In addition, you must not block Google and Bing in your robots.txt file, or they will never see your noindex meta tag and it will be useless. This matters because Google and Bing treat noindex as "never show this result at all," while a page blocked by robots.txt means "if somebody links here, you may show it, but never crawl it." There's the rub: if Google or Bing knows about a page that is blocked by robots.txt, they'll show it in their results without knowing its content and without ever crawling it. That is why you must not block Google and Bing with robots.txt, and must instead block them with noindex.
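The conflict described above can be checked mechanically: if robots.txt stops a noindex-aware crawler from fetching a URL, that crawler never sees the noindex tag on it. This is a minimal sketch using urllib.robotparser; the helper name hidden_noindex is hypothetical, and the user-agent tokens "Googlebot" and "Bingbot" are assumptions based on the crawlers' published names.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Crawlers the answer says honour <meta name="robots" content="noindex">.
# The user-agent tokens are assumptions; check each engine's documentation.
NOINDEX_CRAWLERS = ("Googlebot", "Bingbot")


def hidden_noindex(robots_txt: str, url: str) -> List[str]:
    """Return the noindex-aware crawlers that robots.txt prevents from fetching url.

    For each crawler returned, a noindex tag on the page is never read, so the
    URL can still appear in its results if something links to it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in NOINDEX_CRAWLERS if not parser.can_fetch(agent, url)]


blocked_everything = "User-agent: *\nDisallow: /\n"
print(hidden_noindex(blocked_everything, "https://example.com/private.html"))
# ['Googlebot', 'Bingbot']

only_archiver = "User-agent: ia_archiver\nDisallow: /\n"
print(hidden_noindex(only_archiver, "https://example.com/private.html"))
# []
```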

For those that do not support noindex (Internet Archive, Alexa, Blekko, Baidu)...

These you simply have to block in your robots.txt file. You can include the noindex tag as well, but it will have no effect, since the page will never be crawled.
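Putting the two rules together, robots.txt should name only the crawlers that ignore noindex and leave Google and Bing free to fetch the page and read its noindex tag. The sketch below generates such a file and verifies it with urllib.robotparser. The user-agent tokens for the crawlers named above (archive.org_bot, ia_archiver, ScoutJet, Baiduspider) are assumptions; verify each against the operator's documentation before relying on them, and note that build_robots_txt is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the crawlers that don't support noindex.
BLOCK_AGENTS = {
    "Internet Archive": "archive.org_bot",
    "Alexa": "ia_archiver",
    "Blekko": "ScoutJet",
    "Baidu": "Baiduspider",
}


def build_robots_txt(agents) -> str:
    """One 'Disallow: /' group per agent; all other crawlers (incl. Google, Bing) stay unrestricted."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    # Blank lines separate the groups.
    return "\n".join(groups)


robots_txt = build_robots_txt(BLOCK_AGENTS.values())
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
url = "https://example.com/doc/123.html"
for agent in ("Baiduspider", "ia_archiver", "Googlebot", "Bingbot"):
    # Google and Bing must be allowed, otherwise they never see the noindex tag.
    print(agent, "allowed" if parser.can_fetch(agent, url) else "blocked")
```

The pages themselves still carry the noindex meta tag; robots.txt here only handles the crawlers that would ignore it.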

Bonus section

  1. If you want bonus points, you should set up sitemap.xml files for Google and Bing so they can discover your content as quickly as possible (and then block it!).
  2. If you have binary content (pictures, PDFs, etc.), you'll need to block it using the X-Robots-Tag HTTP header. See my blog post for more details!
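Because binary files can't carry a <meta> tag, the noindex directive has to travel in the X-Robots-Tag response header instead. Here is a minimal sketch using Python's built-in http.server; in production you would set the header in your real web server's configuration. The extension list and the helper x_robots_value are assumptions for illustration.

```python
from http.server import SimpleHTTPRequestHandler
from pathlib import PurePosixPath
from typing import Optional

# Assumed set of "binary" extensions to keep out of indexes; adjust for your site.
BINARY_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".doc", ".docx"}


def x_robots_value(url_path: str) -> Optional[str]:
    """Return the X-Robots-Tag value for url_path, or None if no header is needed."""
    # Ignore any query string when looking at the file extension.
    path = url_path.split("?", 1)[0]
    if PurePosixPath(path).suffix.lower() in BINARY_EXTENSIONS:
        return "noindex"
    return None


class NoIndexBinaryHandler(SimpleHTTPRequestHandler):
    """Static file server that adds X-Robots-Tag: noindex to binary files."""

    def end_headers(self):
        # end_headers() runs once per response, just before the headers are flushed.
        value = x_robots_value(self.path)
        if value:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()
```

To try it locally, serve the current directory with ThreadingHTTPServer(("", 8000), NoIndexBinaryHandler).serve_forever() (ThreadingHTTPServer also lives in http.server) and inspect the response headers of a PDF.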

Why I've made it my personal project to write long answers like this...

I run a site with about 7M legal documents. Some contain personal information and must not appear in search engines. I've studied this more than anyone ever should, and it's frustrating that the robots.txt myth is so strong.

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM