
How to protect a site from (Google) caching?

I would like to hide some content from the public (for example, from Google's cached pages). Is it possible?

Add the following HTML tag to the <head> section of your web pages to prevent Google from showing the Cached link for a page.

<META NAME="ROBOTS" CONTENT="noarchive">

Check out Google Webmaster Central | Meta tags to see what other meta tags Google understands.

Option 1: Disable 'Show Cached Site' Link In Google Search Results

If you want to prevent Google from archiving your site, add the following meta tag to your <head> section:

<meta name="robots" content="noarchive">

If your site is already cached by Google, you can request its removal using Google's URL removal tool. For more instructions on how to use this tool, see "Remove a page or site from Google's search results" at Google Webmaster Central.

Option 2: Remove Site From Google Index Completely

Warning. The following method will remove your site from the Google index completely. Use it only if you don't want your site to show up in Google results.

To keep ("protect") your site out of Google's cache, you can use robots.txt. For instructions on how to use this file, see "Block or remove pages using a robots.txt file".

In principle, you need to create a file named robots.txt and serve it from your site's root folder (/robots.txt). Sample file content:

User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/
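
As a rough way to see how a parser reads the sample file above, here is a minimal Python sketch using the standard library's urllib.robotparser. The crawler name SomeOtherBot and the page paths are made up for illustration, and Python's parser is not Google's, so treat the output as an approximation of how a real crawler would behave.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt content from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "SomeOtherBot" and the paths are hypothetical, chosen only for this demo.
results = {
    (agent, path): parser.can_fetch(agent, path)
    for agent in ("Googlebot", "SomeOtherBot")
    for path in ("/folder1/page.html", "/folder2/page.html")
}

for (agent, path), allowed in results.items():
    print(f"{agent:12} {path:20} {'allowed' if allowed else 'blocked'}")
```

With this parser, a crawler that has its own group follows only that group: Googlebot is blocked from /folder2/ but not from /folder1/, while other crawlers fall back to the * group. If you want Googlebot kept out of both folders, list both Disallow lines in its group.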

In addition, consider setting the robots meta tag in your HTML document to noindex (see "Using meta tags to block access to your site"):

  • To prevent all robots from indexing your site, set <meta name="robots" content="noindex">
  • To selectively block only Google, set <meta name="googlebot" content="noindex">

Finally, make sure that your settings really work, for instance with Google Webmaster Tools.
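
Besides Google's own tools, you could spot-check the meta tags locally. The following is a minimal sketch, assuming you already have the page's HTML as a string; the class and function names (RobotsMetaParser, robots_directives) are made up for this example, and it only looks at the robots and googlebot meta names mentioned above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name not in ("robots", "googlebot"):
            return
        values = {
            part.strip().lower()
            for part in attr.get("content", "").split(",")
            if part.strip()
        }
        self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict mapping meta name to the set of directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = (
    '<html><head><META NAME="ROBOTS" CONTENT="noarchive">'
    '<meta name="googlebot" content="noindex, nofollow"/></head>'
    "<body></body></html>"
)
print(robots_directives(page))
```

This only confirms that the tag is present in the markup you serve. Whether a given crawler honors it is still up to the crawler.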

You can use a robots.txt file to request that your page not be indexed. Google and other reputable services will adhere to this, but not all crawlers do.

The only way to make sure that your site content isn't indexed or cached by any search engine or similar service is to prevent access to the site unless the user has a password.

This is most easily achieved using HTTP Basic Auth. If you're using the Apache web server, there are lots of tutorials (example) on how to configure this. A good search term to use is htpasswd.
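
If you are not on Apache, the same idea can be sketched in a few lines of Python with the standard library. This is only an illustration of HTTP Basic Auth, not a production setup: the credentials (demo / change-me), the realm name, the port, and the names is_authorized, ProtectedHandler and serve are all made up for the example, and Basic Auth should be served over HTTPS in practice.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials for the demo; load real ones from configuration.
USERNAME = "demo"
PASSWORD = "change-me"


def is_authorized(header):
    """Return True if an Authorization header carries the expected Basic credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pass_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and pass_ok


class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # Ask the client (browser or crawler) for credentials.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
            return
        body = b"private content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the demo server; call this explicitly, it blocks until interrupted."""
    HTTPServer((host, port), ProtectedHandler).serve_forever()
```

Calling serve() starts the demo server. A crawler that does not know the password only ever receives the 401 response, so there is nothing for it to index or cache.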

A simple way to do this would be with a <meta name="robots" content="noarchive"/> tag.

You can also achieve a similar effect with the robots.txt file.

For a good explanation, see the official Google blog post on the Robots Exclusion Protocol.

You can also add this HTTP header to your response, instead of updating the HTML files:

X-Robots-Tag: noarchive

For example, for Apache:

Header set X-Robots-Tag "noarchive"

See also: https://developers.google.com/search/reference/robots_meta_tag?csw=1
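
If your site is served by a Python WSGI application rather than Apache, the same header could be added with a small middleware. This is a minimal sketch, not a definitive implementation: add_x_robots_tag and hello_app are hypothetical names, and only the header name and the noarchive value come from the answer above.

```python
def add_x_robots_tag(app, value="noarchive"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the header is not duplicated.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def hello_app(environ, start_response):
    """Tiny stand-in application used only for this example."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


# The wrapped app can be handed to any WSGI server.
application = add_x_robots_tag(hello_app)
```

The header approach is handy for non-HTML responses such as PDFs or images, where a meta tag cannot be embedded.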

I would like to hide some content from public....

Use a login system to control who can view the content.

...(like google cached pages).

Configure robots.txt to deny Googlebot.

If you want to limit who can see content, secure it behind some form of authentication mechanism (e.g. password protection, even if it is just HTTP Basic Auth).

The specifics of how to implement that depend on the options provided by your server.

