
Google Crawl Causing API rate limit

Currently Googlebot is crawling my site around 10 times a second, which is causing some of the Instagram APIs I'm using to reach their hourly rate limits pretty quickly.

Is there a way to prevent Googlebot from running a piece of PHP code? I still want it to crawl the pages, just not trigger the API requests.

Since you still want the page to be crawled, robots.txt is probably not an option for you.

Generally, you should ask yourself whether you are using the API correctly. An API is there to obtain some data or perform some operation.

What you should not do is ask the API for the same information on every page view. Cache it instead.

Sometimes it is enough to simply cache the result in a text file; sometimes you want to pull the data into your own database.
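
For example, a minimal sketch of a file-based cache could look like the following. CACHE_FILE, CACHE_TTL and fetch_from_instagram_api() are placeholders for your own cache path, cache lifetime, and existing Instagram API call:

// Minimal file-cache sketch. CACHE_FILE, CACHE_TTL and
// fetch_from_instagram_api() are placeholders for your own values.
define('CACHE_FILE', __DIR__ . '/cache/instagram.json');
define('CACHE_TTL', 600); // seconds a cached result stays valid

function fetch_from_instagram_api()
{
    // ... your existing Instagram API request goes here ...
    return array('example' => 'data');
}

function get_instagram_data()
{
    // Serve from the cache file while it is still fresh
    if (file_exists(CACHE_FILE) && (time() - filemtime(CACHE_FILE)) < CACHE_TTL) {
        return json_decode(file_get_contents(CACHE_FILE), true);
    }

    // Otherwise hit the API once and refresh the cache
    $data = fetch_from_instagram_api();
    file_put_contents(CACHE_FILE, json_encode($data));
    return $data;
}

This way the API is called at most once per cache lifetime, no matter how often the page is requested.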

If that is not an option for you, you can detect Googlebot like this:

// Case-insensitive check of the User-Agent header for "googlebot"
if (isset($_SERVER['HTTP_USER_AGENT'])
    && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false)
{
    // what to do, e.g. serve cached data instead of calling the API
}

At the very least, serve Googlebot a cached version.
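
Putting both ideas together, one possible sketch (reusing the hypothetical CACHE_FILE and get_instagram_data() from above) is to never let a crawler trigger a live API call:

$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot     = stripos($userAgent, 'googlebot') !== false
          || stripos($userAgent, 'bingbot') !== false;

if ($isBot) {
    // Crawlers only ever see the cached copy (possibly stale or empty)
    $data = file_exists(CACHE_FILE)
        ? json_decode(file_get_contents(CACHE_FILE), true)
        : array();
} else {
    // Normal visitors go through the cached fetch, which may refresh the cache
    $data = get_instagram_data();
}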


Also note that this is not a Googlebot-only problem. There are many bots out there, including bad bots that pose as normal users. And if your site gets heavy traffic, the same problem can occur anyway.
