
Crawl images from Google search with Python

I am trying to write a script in Python to crawl images from Google search. I want to collect the URLs of the images and then save the images to my computer. I found some code that does this; however, it only collects 60 URLs, after which a timeout message appears. Is it possible to get more than 60 images? My code:

import os
import time
from urllib.request import urlretrieve

import requests


def crawl_images(query, path):
    # Google AJAX Image Search endpoint (deprecated); `start` is the page offset.
    BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images'

    BASE_PATH = os.path.join(path, query)
    os.makedirs(BASE_PATH, exist_ok=True)

    counter = 1
    urls = []
    start = 0  # Google's start query string parameter for pagination.
    while start < 60:  # Google will only return a max of 56 results.
        # Let requests URL-encode the query instead of concatenating it.
        r = requests.get(BASE_URL, params={'v': '1.0', 'q': query, 'start': start})
        data = r.json().get('responseData')
        if not data:  # error or no more results: stop instead of crashing
            print('no results for start=%d: %s' % (start, r.text))
            break
        for image_info in data['results']:
            url = image_info['unescapedUrl']
            print(url)
            urls.append(url)

            try:
                # Save into BASE_PATH instead of a hard-coded folder.
                urlretrieve(url, os.path.join(BASE_PATH, 'image_%d.jpg' % counter))
                counter += 1
            except OSError:
                # Throw away some gifs...blegh.
                print('could not save %s' % url)
                continue

        print(start)
        start += 4  # 4 images per page.
        time.sleep(1.5)

    return urls


crawl_images('model runway', '')

Have a look at the documentation: https://developers.google.com/image-search/v1/jsondevguide

You should get up to 64 results:

Note: The Image Searcher supports a maximum of 8 result pages. When combined with subsequent requests, a maximum total of 64 results are available. It is not possible to request more than 64 results.
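To reach that 64-result cap, one option is to request 8 results per page for 8 pages instead of paging 4 at a time as the question's loop does. This is a minimal sketch, assuming the deprecated endpoint accepts an `rsz` (results per page) parameter whose maximum is 8; it only builds the request URLs, and `page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint from the question; the API is deprecated, so this only builds URLs.
BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images'
RESULTS_PER_PAGE = 8  # assumed maximum value of the `rsz` parameter
MAX_RESULTS = 64      # hard cap quoted from the documentation


def page_urls(query):
    """Build one request URL per page: 8 pages x 8 results = 64 results."""
    return [
        BASE_URL + '?' + urlencode({'v': '1.0', 'q': query,
                                    'rsz': RESULTS_PER_PAGE, 'start': start})
        for start in range(0, MAX_RESULTS, RESULTS_PER_PAGE)
    ]


if __name__ == '__main__':
    for url in page_urls('model runway'):
        print(url)
```

The offsets run 0, 8, ..., 56, so the last page covers results 57 to 64; asking for any `start` beyond that cannot return more results.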

Another note: you can restrict the file type, so you don't need to throw away GIFs and other unwanted formats.
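A minimal sketch of both sides of that idea: ask the API for JPEGs only, and, as a fallback that does not depend on the API, skip URLs whose path does not end in .jpg/.jpeg before downloading. The `as_filetype` parameter name is an assumption here (check the linked guide before relying on it), and `search_url` and `has_wanted_extension` are hypothetical helper names.

```python
import os
from urllib.parse import urlencode, urlparse

BASE_URL = 'https://ajax.googleapis.com/ajax/services/search/images'
WANTED_EXTENSIONS = {'.jpg', '.jpeg'}


def search_url(query, start, filetype='jpg'):
    # `as_filetype` is an assumed parameter name for server-side filtering.
    params = {'v': '1.0', 'q': query, 'start': start, 'as_filetype': filetype}
    return BASE_URL + '?' + urlencode(params)


def has_wanted_extension(url):
    """Client-side fallback: keep only URLs whose path ends in .jpg/.jpeg."""
    # Look at the path only, so query strings like '?size=large' are ignored.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext in WANTED_EXTENSIONS
```

In the question's loop, calling `has_wanted_extension(url)` before `urlretrieve` would skip GIFs up front instead of relying on the `except` branch.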


As an additional note, keep in mind that this API should only be used for user-initiated searches, not for automated ones!

Note: The Google Image Search API must be used for user-generated searches. Automated or batched queries of any kind are strictly prohibited.

You can try the icrawler package. It is extremely easy to use, and I've never had problems with the number of images it downloads.
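A minimal sketch using icrawler's built-in Google crawler, following the `GoogleImageCrawler(storage={'root_dir': ...})` and `crawl(keyword=..., max_num=...)` usage shown in the icrawler README. `download_images` is a hypothetical wrapper name, and because the built-in Google crawler depends on Google's page layout, how many images it actually fetches may vary between icrawler versions.

```python
import os


def download_images(keyword, root_dir, max_num=200):
    """Download up to `max_num` Google Images results for `keyword` into `root_dir`."""
    # Imported lazily so this module loads even where icrawler is not
    # installed (pip install icrawler).
    from icrawler.builtin import GoogleImageCrawler

    os.makedirs(root_dir, exist_ok=True)
    crawler = GoogleImageCrawler(storage={'root_dir': root_dir})
    crawler.crawl(keyword=keyword, max_num=max_num)
```

For example, `download_images('model runway', 'images/model runway')` would try to save up to 200 images into that folder, with no 64-result cap from the old AJAX API.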
