How to limit download speed in python3 requests?

I am using requests to download a large (~50MiB) file on a small embedded device running Linux.

The file is to be written to an attached MMC.

Unfortunately the MMC write speed is lower than the network speed; I see memory consumption rise and, in a few cases, I even got a kernel "unable to handle page..." error.

The device has only 128MiB of RAM.

The code I'm using is:

with requests.get(URL, stream=True) as r:
    if r.status_code != 200:
        log.error(f'??? download returned {r.status_code}')
        return -(offset + r.status_code)
    siz = 0
    with open(sfn, 'wb') as fo:
        for chunk in r.iter_content(chunk_size=4096):
            fo.write(chunk)
            siz += len(chunk)
    return siz

How can I temporarily pause the transfer from the server while I write to the MMC?


You can rewrite the download loop above as a pair of coroutines:

import logging
import requests

log = logging.getLogger(__name__)

def producer(URL, temp_data, n):
    # Fetch chunks and hand them over one at a time.
    with requests.get(URL, stream=True) as r:
        if r.status_code != 200:
            log.error(f'??? download returned {r.status_code}')
            return  # the generator ends; the caller sees StopIteration
        for chunk in r.iter_content(chunk_size=n):
            temp_data.append(chunk)
            yield  # wait until the consumer has drained the buffer


def consumer(temp_data, fname):
    # Drain buffered chunks to disk, then wait for more data.
    with open(fname, 'wb') as fo:
        while True:
            while temp_data:
                fo.write(temp_data.pop(0))  # pop; never remove while iterating
                # You can add a sleep here to throttle writes further
            yield  # wait for more data


def coordinator(URL, fname, n=4096):
    temp_data = []
    c = consumer(temp_data, fname)
    p = producer(URL, temp_data, n)
    while True:
        try:
            next(p)   # fetch one chunk
        except StopIteration:
            break
        finally:
            next(c)   # flush it to the MMC before fetching the next one
    c.close()         # exits the consumer so the with-block closes the file

These are all the functions you need. To call it:

URL = "URL"
fname = 'filename'
coordinator(URL,fname)
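
If you also want to cap the transfer rate itself (as the question title asks), a minimal sketch of the sleep-based throttling hinted at in the consumer above; the download_throttled name and the max_bps target rate are assumptions for illustration, not part of the answer:

import time
import requests

def download_throttled(URL, fname, chunk_size=4096, max_bps=1_000_000):
    # Hypothetical helper: caps average throughput at max_bps by sleeping
    # whenever the download runs ahead of the target rate.
    start = time.monotonic()
    written = 0
    with requests.get(URL, stream=True) as r, open(fname, 'wb') as fo:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fo.write(chunk)
            written += len(chunk)
            expected = written / max_bps        # seconds this many bytes should take
            elapsed = time.monotonic() - start
            if expected > elapsed:
                time.sleep(expected - elapsed)  # let the MMC catch up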

If the web server supports the HTTP Range header, you can request a download of only part of the large file and then step through the entire file part by part.

Take a look at this question, where James Mills gives the following example code:

from requests import get

url = "http://download.thinkbroadband.com/5MB.zip"
headers = {"Range": "bytes=0-100"}  # first 100 bytes

r = get(url, headers=headers)
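
A minimal sketch of stepping through the whole file this way; the download_in_parts name, the part_size value, and the 206 status check are assumptions, and it relies on the server actually honouring Range (a server that ignores it will return the whole file in one response):

import requests

def download_in_parts(url, fname, part_size=64 * 1024):
    # Fetch the file part by part so only one part is ever held in memory.
    with open(fname, 'wb') as fo:
        offset = 0
        while True:
            headers = {"Range": f"bytes={offset}-{offset + part_size - 1}"}
            r = requests.get(url, headers=headers)
            if r.status_code not in (200, 206):
                break  # e.g. 416: requested range past end of file
            fo.write(r.content)
            offset += len(r.content)
            if len(r.content) < part_size:
                break  # short read: reached end of file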

As your problem is memory, you will want to stop the server from sending you the whole file at once, as it will otherwise be buffered by some code on your device. Unless you can make requests drop part of the data it receives, this will always be a problem; additional buffers downstream of requests will not help.

You can try decreasing the size of the TCP receive buffer with this bash command:

echo 'net.core.rmem_max=1000000' >> /etc/sysctl.conf

(1 MB; you can tune this. Run sysctl -p afterwards for the setting to take effect.)

This stops a huge buffer building up at this stage of the process.
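
If you would rather not change a system-wide setting, a sketch of shrinking the receive buffer for just this connection, using requests' HTTPAdapter and urllib3's per-socket options; the SmallBufferAdapter name and the 64 KiB size are assumptions:

import socket
import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

class SmallBufferAdapter(HTTPAdapter):
    # Ask the kernel for a small per-socket receive buffer instead of
    # lowering net.core.rmem_max globally.
    def init_poolmanager(self, *args, **kwargs):
        kwargs['socket_options'] = HTTPConnection.default_socket_options + [
            (socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024),  # 64 KiB
        ]
        super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount('http://', SmallBufferAdapter())
session.mount('https://', SmallBufferAdapter())
# then stream the download through the session as before:
# session.get(URL, stream=True)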

Then write code that only reads from the TCP stack and writes to the MMC at specified intervals, to prevent buffers from building up elsewhere in the system, such as the MMC write buffer; see for example @e3n's answer.

Hopefully this should cause packets to be dropped and then re-sent by the server once the buffer opens up again.
