Crawl images from Google search with Python
I am trying to write a Python script to crawl images from Google search. I want to collect the image URLs and then save those images to my computer. I found some code that does this, but it only retrieves 60 URLs; after that, a timeout message appears. Is it possible to retrieve more than 60 images? My code:
    # Python 2 (print statements, urllib.URLopener)
    import json
    import os
    import time
    import urllib

    import requests


    def crawl_images(query, path):
        BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images?'\
                   'v=1.0&q=' + query + '&start=%d'

        # Directory the downloaded images are written to.
        BASE_PATH = os.path.join(path, query)
        if not os.path.exists(BASE_PATH):
            os.makedirs(BASE_PATH)

        counter = 1
        urls = []
        start = 0  # Google's start query string parameter for pagination.
        while start < 60:  # Google will only return a max of 56 results.
            r = requests.get(BASE_URL % start)
            for image_info in json.loads(r.text)['responseData']['results']:
                url = image_info['unescapedUrl']
                print url
                urls.append(url)
                image = urllib.URLopener()
                try:
                    # Save into BASE_PATH instead of a hard-coded folder name.
                    image.retrieve(url, os.path.join(BASE_PATH, "image_" + str(counter) + ".jpg"))
                    counter += 1
                except IOError, e:
                    # Throw away some gifs...blegh.
                    print 'could not save %s' % url
                    continue
            print start
            start += 4  # 4 images per page.
            time.sleep(1.5)


    crawl_images('model runway', '')
Have a look at the documentation: https://developers.google.com/image-search/v1/jsondevguide
According to it, you should be able to get up to 64 results:
Note: The Image Searcher supports a maximum of 8 result pages. When combined with subsequent requests, a maximum total of 64 results are available. It is not possible to request more than 64 results.
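The limit quoted above works out to 8 pages of at most 8 results each. As a rough sketch of that arithmetic, the Python 3 snippet below builds the eight request URLs without sending any request. The helper name page_urls and the constants are made up for illustration, and the rsz (results per page) parameter is recalled from the same developer guide, so check it against the documentation before relying on it.

```python
from urllib.parse import urlencode

# Endpoint taken from the question's code.
API_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/images"
MAX_PAGES = 8         # documented maximum number of result pages
RESULTS_PER_PAGE = 8  # assumed maximum page size ("rsz"); verify in the guide


def page_urls(query, per_page=RESULTS_PER_PAGE, max_pages=MAX_PAGES):
    """Return one request URL per result page (hypothetical helper)."""
    urls = []
    for page in range(max_pages):
        params = {
            "v": "1.0",
            "q": query,
            "rsz": per_page,            # assumed parameter name
            "start": page * per_page,   # offset of the first result on the page
        }
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls
```

With these values the start offsets run 0, 8, ..., 56, so 8 x 8 = 64 results is the most such a loop could ever return.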
Another note: you can restrict the file type, so you don't need to skip GIFs and similar files yourself.
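One possible way to apply that restriction is to add a file-type argument to the query string. The Python 3 sketch below only builds the URL and makes no request. The function name build_search_url is hypothetical, and the as_filetype parameter name is recalled from the developer guide linked above, so treat it as an assumption to verify there.

```python
from urllib.parse import urlencode

# Endpoint taken from the question's code.
API_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/images"


def build_search_url(query, start=0, filetype=None):
    """Build a search URL, optionally limited to one file type (hypothetical helper)."""
    params = {"v": "1.0", "q": query, "start": start}
    if filetype is not None:
        # Assumed parameter name; confirm it in the developer guide.
        params["as_filetype"] = filetype
    return API_ENDPOINT + "?" + urlencode(params)


# Example: ask only for JPEG results, starting from the first page.
jpg_url = build_search_url("model runway", start=0, filetype="jpg")
```

If the restriction works as described, every returned URL should point at a JPEG, and the except branch that discards GIFs would rarely be hit.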
As an additional note, please keep in mind that this API should only be used for user-initiated searches, not for automated ones:
Note: The Google Image Search API must be used for user-generated searches. Automated or batched queries of any kind are strictly prohibited.