
Why couldn't I download images from Google with Python?

The code below helped me download a bunch of images from Google. It used to work a few days back, and now all of a sudden it breaks.

Code:

# importing google_images_download module 
from google_images_download import google_images_download  

# creating object 
response = google_images_download.googleimagesdownload()  

search_queries = ['Apple', 'Orange', 'Grapes', 'water melon'] 


def downloadimages(query): 
    # keywords is the search query 
    # format is the image file format 
    # limit is the number of images to be downloaded 
    # print_urls is to print the image file URL 
    # size is the image size which can 
    # be specified manually ("large, medium, icon") 
    # aspect ratio denotes the height width ratio 
    # of images to download. ("tall, square, wide, panoramic") 
    arguments = {"keywords": query, 
                 "format": "jpg", 
                 "limit":4, 
                 "print_urls":True, 
                 "size": "medium", 
                 "aspect_ratio": "panoramic"} 
    try: 
        response.download(arguments) 

    # Handling File NotFound Error     
    except FileNotFoundError:  
        arguments = {"keywords": query, 
                     "format": "jpg", 
                     "limit":4, 
                     "print_urls":True,  
                     "size": "medium"} 

        # Providing arguments for the searched query 
        try: 
            # Downloading the photos based 
            # on the given arguments 
            response.download(arguments)  
        except: 
            pass

# Driver Code 
for query in search_queries: 
    downloadimages(query)  
    print()

Output log:

Item no.: 1 --> Item name = Apple Evaluating... Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = Orange Evaluating... Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = Grapes Evaluating... Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = water melon Evaluating... Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

This actually creates a folder, but there are no images in it.

The google_images_download project no longer seems to be compatible with the Google APIs.

As an alternative, you can try simple_image_download.
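For reference, here is a rough sketch of how simple_image_download is typically wired in for the same four queries; the import path and the download(keywords, limit) call follow the project's README at the time of writing and are assumptions to verify against the version you actually install.

# Rough sketch only: the import path and the download(keywords, limit)
# signature are taken from simple_image_download's README and may differ
# in newer releases of the package.
from simple_image_download import simple_image_download as simp

response = simp.simple_image_download()

for query in ['Apple', 'Orange', 'Grapes', 'water melon']:
    # Download up to 4 images per query into the package's default folder
    response.download(query, 4)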

It looks like there is an issue with the package itself. See these open PRs: PR1 and PR2.

I think Google has changed the DOM. The element class="rg_meta notranslate" no longer exists; it has been changed to class="rg_i ...".


import sys
import urllib.request

import wget
from bs4 import BeautifulSoup


def get_soup(url, header):
    # Fetch the results page and parse it with BeautifulSoup
    request = urllib.request.Request(url, headers=header)
    return BeautifulSoup(urllib.request.urlopen(request), 'html.parser')


def main(args):
    query = "typical face"
    query = '+'.join(query.split())
    url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
    headers = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"}
    soup = get_soup(url, headers)
    # Thumbnails now carry class "rg_i"; their URL and id live in the
    # data-iurl / data-iid attributes
    for a in soup.find_all("img", {"class": "rg_i"}):
        wget.download(a.attrs["data-iurl"], a.attrs["data-iid"])


if __name__ == '__main__':
    try:
        main(sys.argv)
    except KeyboardInterrupt:
        pass
    sys.exit()

Indeed, this issue appeared not long ago, and there are already a number of similar GitHub issues about it.

Unfortunately, there is no official solution for now; you could use the temporary workaround that was provided in those discussions.

The reason this doesn't work is that Google changed the way they do everything, so you now need an api_key included in the search string. As a result, packages such as google-images-download no longer work even if you use version 2.8.0, because they have no placeholder to insert the api_key string, which you must register with Google to get your 2,500 free downloads per day.

If you are willing to pay $50 per month or more for a service from serpapi.com, one way to do this is to use the pip package google-search-results and provide your api_key as part of the query params.

params = {
           "engine" : "google",
           ...
           "api_key" : "secret_api_key" 
}

where you provide your own API key, and then call:

from serpapi import GoogleSearchResults

client = GoogleSearchResults(params)
results = client.get_dict()

This returns a JSON response containing the links to all the image URLs, and you can then download them directly.
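As an illustration of that last step, here is a minimal sketch; the "images_results" list and its "original" field are assumptions based on the response format SerpApi documents for Google Images results, so check the actual JSON you get back.

import os
import urllib.request

# Hypothetical helper: save every original-size image URL found in the
# response to a local folder. Assumes `results` is the dict returned by
# client.get_dict() above and that it holds an "images_results" list
# whose entries carry an "original" URL (verify against your payload).
def save_images(results, out_dir="downloads"):
    os.makedirs(out_dir, exist_ok=True)
    for i, item in enumerate(results.get("images_results", [])):
        url = item.get("original")
        if not url:
            continue
        path = os.path.join(out_dir, f"image_{i}.jpg")
        try:
            urllib.request.urlretrieve(url, path)
        except Exception as exc:
            print(f"Skipping {url}: {exc}")

save_images(results)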
