
Download file using urllib in Python with the wget -c feature

I am writing a Python program that downloads PDF files over HTTP from a database. Sometimes the download stops with this message:

retrieval incomplete: got only 3617232 out of 10689634 bytes

How can I make the download resume from where it stopped, using the HTTP 206 Partial Content feature?

I can do it using wget -c and it works pretty well, but I would like to implement it directly in my Python software.

Any ideas?

Thank you.

You can request a partial download by sending a GET with the Range header:

import urllib2
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000--19000 be downloaded.
# The range is inclusive, and starts at 0.
#
req.headers['Range'] = 'bytes=%s-%s' % (18000, 19000)
f = urllib2.urlopen(req)
# This shows you the *actual* bytes that have been downloaded.
# (Named content_range so the built-in range() is not shadowed.)
content_range = f.headers.get('Content-Range')
print(content_range)
# bytes 18000-18030/18031
print(repr(f.read()))
# '  </div>\n</body>\n</html>\n\n\n\n\n\n\n'

Be careful to check the Content-Range header to learn which bytes were actually downloaded, since your requested range may be out of bounds, and not all servers respect the Range header.
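
To get closer to what wget -c does, the same idea can be wrapped in a function that looks at how much of the file is already on disk and asks only for the rest. The following is a minimal sketch, not a definitive implementation: it assumes Python 3 (urllib.request rather than the urllib2 used above), and the names parse_content_range and resume_download are hypothetical helpers made up for this illustration. It appends only when the server answers 206 with a Content-Range that starts at the expected offset, and starts over if the server ignores the Range header and sends the whole body.

```python
import os
import re
import urllib.error
import urllib.request


def parse_content_range(value):
    """Parse 'bytes 18000-18030/18031' into (start, end, total).

    total is None when the server reports it as '*'.
    Returns None if the header is missing or not in this form.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value or "")
    if match is None:
        return None
    start, end, total = match.groups()
    return int(start), int(end), (None if total == "*" else int(total))


def resume_download(url, path, chunk_size=64 * 1024):
    """Fetch the missing tail of `url` into `path`; return the file size."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url)
    if offset:
        # Open-ended range: everything from `offset` to the end.
        req.add_header("Range", "bytes=%d-" % offset)

    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        # 416 = Range Not Satisfiable. Assumption: the local file is
        # already complete, so there is nothing left to fetch.
        if err.code == 416 and offset:
            return offset
        raise

    with resp:
        if resp.status == 206:
            # Server honoured the range: verify it starts where we asked.
            parsed = parse_content_range(resp.headers.get("Content-Range"))
            if parsed is None or parsed[0] != offset:
                raise ValueError("unexpected Content-Range: %r" % (parsed,))
            mode = "ab"
        else:
            # Server ignored the Range header and sent the whole body.
            mode = "wb"
        with open(path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(path)
```

A call such as resume_download('http://example.com/file.pdf', 'file.pdf') (placeholder URL) can simply be repeated after a failed attempt, since each call picks up from the current size of the local file. The sketch does not check that the remote file is unchanged between attempts; a fuller version might compare the total reported in Content-Range, or send an If-Range header.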
