Python web scraping: image incomplete when using urllib

Question

I am trying to retrieve an image using Python and BeautifulSoup. I managed to get the full URL of the image, but when I call urllib.urlretrieve(imagelink, filename), the file it saves is incomplete: only 3.2 KB.

The real images (I'm fetching a lot of them) average around 800 KB. The script iterates through and downloads every image, but none of them are viewable and they all have the same file size. The full image URLs work fine when opened in a browser, though.

Any idea what could cause such an issue? I don't think showing my code would help, but here is the section where I am getting the URL:

print imagelink
filename = imagelink.split('/')[-1]
time.sleep(5)
urllib.urlretrieve(imagelink, filename)
time.sleep(5)

Answer 1

宏杰李, requests is a wrapper around urllib, and ultimately a wrapper around sockets too -))

With urllib2, the same result can be achieved like this:

>>> import urllib2
>>> r = urllib2.urlopen('https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1')
>>> with open("/home/ziya/Pictures/so_image.jpg", "wb") as img:
...     img.write(r.read())
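
The urllib2 module used above exists only in Python 2. As a rough sketch of the same read-and-write approach on Python 3, where that functionality lives in urllib.request, something like the following should work. The function name download and its parameters are illustrative and not part of the original answer.

```python
import os
import shutil
import urllib.request


def download(url, dest):
    """Save the response body for `url` to the file `dest`.

    Returns the size of the saved file in bytes, so the caller can
    compare it with the size they expect.
    """
    # Open the destination in binary mode so the bytes are written unchanged.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy the body in chunks instead of reading it all into memory.
        shutil.copyfileobj(response, out)
    return os.path.getsize(dest)
```

For example, download('https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1', 'so_image.jpg') would save the image from the snippet above into the current directory and return its size in bytes.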

Answer 2

You should try requests:

import requests
url = 'https://i.stack.imgur.com/tkGEv.jpg?s=328&g=1'
r = requests.get(url)
with open('tkGEv.jpg', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
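
The question mentions that every downloaded file has the same small size and will not open, which would be consistent with the server sending something other than the image (this is an assumption; the thread does not confirm the cause). A minimal sketch for checking what actually landed on disk is to compare the first bytes of the file with well-known image signatures. The helper name sniff_image_type and the signature table are hypothetical, and only a few formats are covered.

```python
# Leading "magic" bytes of a few common image formats (not exhaustive).
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(path):
    """Return the image format suggested by the file's first bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for signature, name in IMAGE_SIGNATURES.items():
        if head.startswith(signature):
            return name
    # Not a recognised image; the file may be an HTML page or an error body.
    return None
```

If sniff_image_type('tkGEv.jpg') returns None for a file that should be a JPEG, opening the file in a text editor may show what the server really sent.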

2 answers

Answer 1
2 2017-01-27 08:45:34

Answer 2
0 Accepted 2017-01-27 08:31:46
