
How to download an image from Google without losing information, and how to read it with the Pillow module in Python

I am having trouble downloading an image from the Google search results page and saving it to disk. I also have trouble reading the saved image afterwards.

Issue 1 (downloading the image and saving it to disk): I used the "requests" module to download the image. Once the image is downloaded, opening it shows the error below instead of the actual image contents (I tried several extensions, such as jpg and png): "It looks like we don't support this file format"

Note: I also tried the urllib.request module to download the image and ran into the same issue.

Below is the code used:

import shutil

import requests

# temp_file_path is the destination path on disk (defined elsewhere in the script)
image_url = "https://www.google.com/imgres?imgurl=https%3A%2F%2Fi.etsystatic.com%2F16576605%2Fr%2Fil%2Fab973a%2F1811762786%2Fil_570xN.1811762786_ni8d.jpg&imgrefurl=https%3A%2F%2Fwww.etsy.com%2Flisting%2F676777770%2F8-styles-wood-acrylic-leather-endless&docid=Knls-viNHmqhZM&tbnid=WF4mlYC28VcOKM%3A&vet=10ahUKEwiB8v3NnezmAhWmzjgGHaWDCtIQMwgrKAAwAA..i&w=570&h=571&itg=1&bih=710&biw=1536&q=676777770&ved=0ahUKEwiB8v3NnezmAhWmzjgGHaWDCtIQMwgrKAAwAA&iact=mrc&uact=8"

with open(temp_file_path, "wb") as fil:
    response = requests.get(image_url, stream=True)
    # Let urllib3 undo any gzip/deflate content encoding while streaming
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, fil)
# No explicit fil.close() is needed: the with block closes the file

Issue 2 (opening the downloaded image with the PIL module): The next step is to read the downloaded image, and I used the "PIL" (Pillow) module for this. It fails with the error below: "PIL.UnidentifiedImageError: cannot identify image file <_io.BufferedReader name=' path \\1.jpg'>"

Note: If I use manually downloaded or captured images, I can read them properly.

Below is the code I used:

from PIL import Image
img = Image.open(open(temp_file_path, "rb"))

I think this is caused by a bytes vs. string conversion issue, but I have not been able to figure it out.

For reference, I am attaching the image that the script downloaded using the requests module.

It would be great if someone could help me.

The problem I see is that this URL does not point directly to an image. I tried your code with image_url="https://i.etsystatic.com/16576605/r/il/ab973a/1811762786/il_794xN.1811762786_ni8d.jpg" and everything worked perfectly.
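In the URL from the question, the direct image address appears to be carried percent-encoded in the imgurl query parameter. As a minimal sketch, assuming that parameter is present, it can be recovered with the standard library. The helper name extract_direct_image_url is made up for illustration, and the URL below is an abbreviated copy of the one in the question.

```python
from urllib.parse import urlparse, parse_qs


def extract_direct_image_url(google_url):
    """Return the decoded `imgurl` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(google_url).query)
    values = query.get("imgurl")
    return values[0] if values else None


# Abbreviated version of the imgres URL from the question (assumption:
# only the parameters relevant to this example are kept).
google_url = (
    "https://www.google.com/imgres"
    "?imgurl=https%3A%2F%2Fi.etsystatic.com%2F16576605%2Fr%2Fil%2Fab973a"
    "%2F1811762786%2Fil_570xN.1811762786_ni8d.jpg"
    "&imgrefurl=https%3A%2F%2Fwww.etsy.com%2Flisting%2F676777770"
    "&w=570&h=571"
)

direct_url = extract_direct_image_url(google_url)
print(direct_url)
```

The resulting direct_url can then be passed to the original requests code in place of the Google URL.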

You can download images using urllib.request.urlretrieve(URL, 'your_filename.jpg'); pick a file extension that matches the format of the file being downloaded (jpg, png, mp3, and so on).

import urllib.request

URL = "https://i.etsystatic.com/16576605/r/il/ab973a/1811762786/il_570xN.1811762786_ni8d.jpg"
# The URL points to a JPEG, so save it with a matching extension
urllib.request.urlretrieve(URL, "perfect_filename.jpg")

Sometimes nothing is downloaded because the request was sent by a script (bot). If you want to scrape images from Google Images or other search engines, pass a user-agent in the request headers first and then download the image; otherwise the request may be blocked and an error is raised.

Pass a user-agent and download the image:

# Build an opener that sends a browser-like User-Agent header
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
# Make it the default opener used by urlretrieve
urllib.request.install_opener(opener)

urllib.request.urlretrieve(URL, 'image_name.jpg')
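When a URL does not return an image, urlretrieve still writes whatever the server sent (often an HTML page) to disk, which is one way to end up with a file that Pillow cannot identify. The following is a minimal sketch, not a definitive implementation, that sends a User-Agent and refuses to save a non-image response. The names sniff_image_format and download_image, the short "Mozilla/5.0" user-agent, and the 30-second timeout are assumptions made for illustration.

```python
import urllib.request

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data):
    """Return a format name if data starts with a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def download_image(url, path, user_agent="Mozilla/5.0"):
    """Download url to path, raising ValueError if the response is not an image."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        content_type = response.headers.get_content_type()
        data = response.read()
    if not content_type.startswith("image/"):
        raise ValueError(f"{url} returned {content_type}, not an image")
    with open(path, "wb") as f:
        f.write(data)
    return content_type
```

sniff_image_format can also be used to diagnose a file that was already saved: read its first few bytes in "rb" mode and pass them in. A result of None for a file named like 1.jpg suggests the content is not one of these image formats, for example an HTML page.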

Code and example in the online IDE that scrapes and downloads images.


Alternatively, you can achieve this by using the Google Images API from SerpApi. It's a paid API with a free plan.

The difference is that you don't have to deal with scraping data from the <script> tags or figuring out how to bypass blocks from Google or other search engines, since that is already done for the end user.

Code to integrate:

import os
import urllib.request

from serpapi import GoogleSearch

params = {
  "api_key": os.getenv("API_KEY"),
  "engine": "google",
  "q": "pexels cat",
  "tbm": "isch"
}

search = GoogleSearch(params)
results = search.get_dict()

# Install a browser-like User-Agent once, before the download loop
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
urllib.request.install_opener(opener)

# urlretrieve does not create missing directories
os.makedirs('SerpApi_Images', exist_ok=True)

for index, image in enumerate(results['images_results']):
    print(f'Downloading image {index}...')
    urllib.request.urlretrieve(image['original'], f'SerpApi_Images/original_size_img_{index}.jpg')

Disclaimer: I work for SerpApi.
