
What's the best way to download a file using urllib3?

I would like to download a file over HTTP using urllib3. I have managed to do this with the following code:

import urllib3

url = 'http://url_to_a_file'
filename = 'downloaded_file'  # placeholder output path
connection_pool = urllib3.PoolManager()
resp = connection_pool.request('GET', url)
f = open(filename, 'wb')
f.write(resp.data)
f.close()
resp.release_conn()

But I was wondering what the proper way of doing this is. For example, will it work well for big files? If not, what can I do to make this code more fault-tolerant and scalable?

Note: it is important to me to use the urllib3 library rather than, for example, urllib2, because I want my code to be thread-safe.

Your code snippet is close. Two things worth noting:

  1. If you're using resp.data, it will consume the entire response and return the connection to the pool (you don't need to call resp.release_conn() manually). This is fine if you're OK with holding the whole file in memory.

  2. You could use resp.read(amt) instead, which will stream the response (the request must be made with preload_content=False), but then the connection needs to be returned via resp.release_conn().

This would look something like this:

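import urllib3

url = 'http://url_to_a_file'   # placeholder URL
path = 'downloaded_file'       # placeholder output path
chunk_size = 64 * 1024         # bytes per read; an arbitrary choice

http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)  # don't buffer the whole body

with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)  # returns b'' once the body is exhausted
        if not data:
            break
        out.write(data)

r.release_conn()  # return the connection to the pool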
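To make the streaming loop more fault-tolerant, it can be wrapped so that the connection is always returned to the pool and an error status is not silently written to disk. This is a sketch, not a urllib3 helper: the function name download_file and the 64 KiB default chunk size are hypothetical, it assumes anything other than 200 should be an error, and it assumes urllib3 1.26 or newer for resp.drain_conn(). It uses resp.stream(), which yields the body piece by piece much like the read(chunk_size) loop.

```python
import urllib3


def download_file(http, url, path, chunk_size=64 * 1024):
    """Hypothetical helper: stream `url` into `path`, return bytes written.

    Assumption: any status other than 200 is treated as an error.
    """
    resp = http.request('GET', url, preload_content=False)  # do not buffer the body
    try:
        if resp.status != 200:
            resp.drain_conn()  # discard the unread body so the connection can be reused
            raise RuntimeError(f'unexpected HTTP status {resp.status} for {url}')
        written = 0
        with open(path, 'wb') as out:
            # stream() reads about chunk_size bytes at a time until the body is exhausted
            for chunk in resp.stream(chunk_size):
                out.write(chunk)
                written += len(chunk)
        return written
    finally:
        resp.release_conn()  # always hand the connection back to the pool


pool = urllib3.PoolManager()  # one pool can be shared by several threads
# download_file(pool, 'http://url_to_a_file', 'downloaded_file')
```

Memory use stays bounded by the chunk size regardless of file size, and since urllib3's PoolManager handles thread safety itself, several threads can call download_file with the same pool.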

The documentation might be a bit lacking for this scenario. If anyone is interested in making a pull request to improve the urllib3 documentation, that would be greatly appreciated. :)

The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj, as below:

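import shutil
import urllib3

url = 'http://url_to_a_file'
filename = 'downloaded_file'  # placeholder output path
c = urllib3.PoolManager()

# The response is file-like, so shutil.copyfileobj can copy it in chunks
with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()     # not 100% sure this is required though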
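For the fault-tolerance part of the question, the same copyfileobj approach can run on a PoolManager configured with timeouts and retries, so a stalled server or a transient 5xx response does not hang or immediately fail the download. The numbers below (3 retries, 0.5 backoff factor, 5 s connect and 30 s read timeouts, the retried status codes) and the helper name copy_url_to_file are illustrative assumptions. Retry only covers failures that happen before the response is handed back; a transfer that breaks halfway through the body is not resumed.

```python
import shutil

import urllib3

# Illustrative settings (assumptions): up to 3 retries with exponential backoff,
# retrying common transient 5xx statuses, plus separate connect/read timeouts.
http = urllib3.PoolManager(
    retries=urllib3.Retry(total=3, backoff_factor=0.5,
                          status_forcelist=[500, 502, 503, 504]),
    timeout=urllib3.Timeout(connect=5.0, read=30.0),
)


def copy_url_to_file(url, filename):
    """Hypothetical helper: copy the body of `url` into `filename`, return the status."""
    # Retries and timeouts apply here because they were configured on the pool
    with http.request('GET', url, preload_content=False) as resp, \
            open(filename, 'wb') as out_file:
        shutil.copyfileobj(resp, out_file)  # reads the response in fixed-size chunks
    return resp.status
```

Putting the settings on the pool rather than on each call keeps every request made through it consistent; request() also accepts retries and timeout arguments if a single call needs different values.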

The easiest way with urllib3 is to let shutil handle the chunked copying for you:

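import urllib3
import shutil

url = 'http://url_to_a_file'   # placeholder URL
filename = 'downloaded_file'   # placeholder output path

http = urllib3.PoolManager()
with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)

r.release_conn()  # return the connection to the pool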
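With any of these snippets, a connection that drops midway leaves a truncated file at filename. One common technique, not specific to urllib3, is to write into a temporary file in the same directory and move it into place with os.replace only after the copy succeeds (on POSIX this rename is atomic). The helper name download_atomically and the .part suffix are hypothetical, and status-code checking is left out to keep the sketch short.

```python
import os
import shutil
import tempfile

import urllib3

http = urllib3.PoolManager()


def download_atomically(url, filename):
    """Hypothetical helper: never leave a half-written file at `filename`."""
    resp = http.request('GET', url, preload_content=False)
    try:
        # Temp file next to the target so os.replace() is a same-filesystem rename
        dest_dir = os.path.dirname(os.path.abspath(filename))
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix='.part')
        try:
            with os.fdopen(fd, 'wb') as out:
                shutil.copyfileobj(resp, out)
            os.replace(tmp_path, filename)  # swap the finished file into place
        except BaseException:
            os.unlink(tmp_path)  # drop the partial download, then re-raise
            raise
    finally:
        resp.release_conn()  # return the connection to the pool
```

If anything fails during the copy, an existing file at filename is left untouched and the partial .part file is deleted.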

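Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.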
