
Google Crawl Causing API Rate Limit

Currently Googlebot has been crawling my site around 10 times a second, which is causing some of the Instagram APIs I'm using to reach their hourly rate limits pretty quickly.

Is there a way to prevent Google's crawler from running a piece of PHP code? I still want them to crawl the pages, but not trigger the API requests.

Since you want the page to still be crawled, robots.txt is not an option for you.

Generally, you should ask whether your use of the API is right. You should use an API to obtain some data or perform some operation.

What you should not do is ask the API for the same information on every page view. Cache it instead.

Sometimes it is fine to simply cache the result in a text file; sometimes you want to pull the data into your own database.
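For example, a minimal file-based cache might look like the sketch below. fetch_instagram_feed() is a hypothetical stand-in for whatever call currently hits the Instagram API on each page view.

// Minimal file-based cache sketch. fetch_instagram_feed() is a hypothetical
// placeholder for the call that currently hits the Instagram API.
function get_feed_cached($cacheFile = 'instagram_feed.json', $ttl = 600)
{
    // Reuse the cached result while it is younger than $ttl seconds
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return json_decode(file_get_contents($cacheFile), true);
    }

    // Cache is missing or stale: call the API once and store the result
    $feed = fetch_instagram_feed();
    file_put_contents($cacheFile, json_encode($feed));

    return $feed;
}

With a 10-minute TTL like this, the API is called at most a few times per hour no matter how often the page is requested.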

If caching alone is not an option for you, you can detect the Googlebot this way:

// Case-insensitive user-agent check for Googlebot
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($userAgent, 'googlebot') !== false)
{
    // serve the cached data instead of calling the Instagram API
}

Give at least the Googlebot a cached version.
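Combining the two ideas, the user-agent check can simply choose a much longer cache lifetime for crawler traffic, so bots almost never trigger a live API call. A rough sketch, reusing the hypothetical get_feed_cached() helper from above:

$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = stripos($userAgent, 'googlebot') !== false;

// Bots get a very long TTL, so they are served the existing cache rather than
// triggering a fresh Instagram API request; normal users refresh every 10 minutes
$feed = $isBot
    ? get_feed_cached('instagram_feed.json', PHP_INT_MAX)
    : get_feed_cached('instagram_feed.json', 600);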


Also note that this is not a Googlebot-only problem. There are many bots out there, and there are also bad bots which pose as a normal user. If you have heavy load, that could be a problem too.
