
Python webscraping: Image incomplete when using urllib

I am trying to retrieve an image using Python and BeautifulSoup. I managed to get the full URL of the image, but when I use urllib.urlretrieve(imagelink, filename), the file it saves is incomplete: only 3.2 kB.

The real images (I'm getting a lot of images) average around 800 kB. The script iterates through and downloads all the images, but none of them are viewable and they are all the same file size. The full image URLs work fine when opened in the browser, though.

Any idea what could cause such an issue? I don't think showing my code would help, but here is the section where I am getting the URL:

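import time
import urllib  # Python 2; on Python 3, urlretrieve lives in urllib.request

print imagelink  # full image URL, extracted earlier with BeautifulSoup
filename = imagelink.split('/')[-1]  # last path segment becomes the file name
time.sleep(5)  # pause between requests
urllib.urlretrieve(imagelink, filename)
time.sleep(5)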
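
Since every saved file has the same small size, one thing worth checking is whether the bytes on disk are image data at all. The following is a minimal sketch in Python 3, not part of the original post: the helper names `sniff_image_type` and `describe_file` are hypothetical, and only the JPEG, PNG and GIF signatures are checked.

```python
# Hypothetical helpers (not from the original post): guess whether downloaded
# bytes are a real image by comparing them against well-known file signatures.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' if data starts with that signature, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def describe_file(path):
    """Read the first bytes of a saved file and report what it appears to be."""
    with open(path, "rb") as f:
        head = f.read(64)
    kind = sniff_image_type(head)
    if kind is None:
        return "not a recognised image; starts with %r" % head[:20]
    return "looks like a %s image" % kind
```

Calling `describe_file(filename)` on one of the 3.2 kB files would show whether it starts with an image signature or with something else, such as text.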

宏杰李, requests is itself a wrapper (around urllib3), just as that in turn is ultimately a wrapper around sockets -))

With urllib2 the same result can be achieved like this:

>>> import urllib2
>>> r = urllib2.urlopen('https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1')
>>> with open("/home/ziya/Pictures/so_image.jpg", "wb") as img:
...     img.write(r.read())
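
urllib2 exists only in Python 2; on Python 3 the equivalent calls live in `urllib.request`. The following is a rough Python 3 sketch of the same idea. The function name `fetch_to_file` and the `User-Agent` value are assumptions for illustration only, since the original post does not say which headers, if any, the server expects.

```python
import os
import shutil
import urllib.request


def fetch_to_file(url, dest, user_agent="Mozilla/5.0"):
    """Download url to dest and return the number of bytes written.

    The User-Agent default is an illustrative assumption, not something
    confirmed by the original post.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Open in binary mode ("wb") so the image bytes are written unchanged.
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk in chunks
    return os.path.getsize(dest)
```

For example, `fetch_to_file('https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1', 'so_image.jpg')` would save the image next to the script and return its size, which can then be compared with the size the browser reports.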


You should try requests:

import requests
url = 'https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1'
r = requests.get(url)
with open('tkGEv.jpg', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
