
Batch searching on Google: 403 error

I am trying to do a batch search: go over a list of strings and, for each one, print the first address that a Google search returns:

#!/usr/bin/python
import json
import urllib
import time
import pandas as pd

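# Load the search strings from the "Name" column of the CSV file
df = pd.read_csv("test.csv")
saved_column = df.Name  # you can also use df['column_name']

for name in saved_column:
  # Build the request URL for the Google AJAX web search endpoint
  query = urllib.urlencode({'q': name})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query

  # Fetch the response and decode the JSON body
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)
  data = results['responseData']

  # Take the URL of the first search result
  address = data[u'results'][0][u'url']

  print address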

I get a 403 error from the server: 'Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors', u'responseStatus': 403

Is what I'm doing not allowed under Google's terms of service?

I also tried putting time.sleep(5) in the loop, but I get the same error.

Thank you in advance.

Not allowed by the Google TOS. You really can't scrape Google without them getting angry. Their blocker is also pretty sophisticated, so you can get around it for a little while with random delays, but that fails pretty quickly.

Sorry, you're out of luck on this one.

https://developers.google.com/errors/?csw=1

The Google Search and Language APIs shown to the right have been officially deprecated.

Also

We received automated requests, such as scraping and prefetching. Automated requests are prohibited; all requests must be made as a result of an end-user action.
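This does not lift the block, but the loop in the question can at least detect the refusal instead of failing when it indexes into the response. A minimal sketch, assuming the response is a JSON object with the 'responseStatus' and 'responseData' keys seen in the question, that a successful call reports status 200, and that 'responseData' is empty when the request is refused. The helper name first_result_url and both sample bodies are hypothetical, and the sketch uses Python 3 print syntax:

```python
import json


def first_result_url(raw_body):
    """Return the first result URL, or None if the request was refused or empty."""
    results = json.loads(raw_body)

    # Assumption: 200 means success; anything else (such as 403) is a refusal.
    if results.get('responseStatus') != 200:
        return None

    # Assumption: responseData may be null or missing even on odd responses.
    data = results.get('responseData') or {}
    hits = data.get('results') or []
    return hits[0]['url'] if hits else None


# Hypothetical response bodies, for illustration only.
blocked = json.dumps({
    'responseData': None,
    'responseDetails': 'Suspected Terms of Service Abuse.',
    'responseStatus': 403,
})
ok = json.dumps({
    'responseData': {'results': [{'url': 'http://example.com/'}]},
    'responseDetails': None,
    'responseStatus': 200,
})

print(first_result_url(blocked))
print(first_result_url(ok))
```

With a check like this, the script can stop and report the status as soon as the server starts refusing requests, rather than raising an exception on the first blocked query.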
