
Ridiculously low download speed with Python requests module

Problem:

I have been trying to make a simple anime downloader using Python's requests module, tracking progress with the progressbar2 module. While downloading, I'm getting speeds of 0.x B/s. Based on this question, I assumed the problem was my choice of chunk_size, but I get the same negligible speeds irrespective of chunk size.

Specs and info:

  1. I am using Windows 10, Python 3.5, the latest requests module (2.18.4), and a decent 40 Mbps internet connection.
  2. I can download the file from the link through a browser (Chrome) and Free Download Manager in about 1 minute.
  3. The link works perfectly and I have no firewall conflicts.

Code:

import os
import requests
from progressbar import ProgressBar, Percentage, Bar, ETA, FileTransferSpeed

os.chdir('D:\\anime\\ongoing')

widgets = ['Downloading: ', Percentage(), ' ', Bar(marker='#', left='[', right=']'),
           ' ', ETA(), FileTransferSpeed()]

url = 'https://lh3.googleusercontent.com/AtkUe87GbrINzTJS_Fj4W08CGqlOg9anwEF7n5-eKXcyS1RsaB8LdzRVaXloiJwiaX2IX1xqUiA=m22?title=(720P%20-%20mp4)Net-juu%20no%20Susume%20Episode%207'
r = requests.get(url, stream=True)
remotesize = int(r.headers['content-length'])

print("Downloading {}.mp4!\n\n".format(url.split('title=')[1]))
pbar = ProgressBar(max_value=remotesize, widgets=widgets).start()
downloaded = 0
with open('./tempy/tempy_file.mp4', 'wb') as f:
    for chunk in r.iter_content(chunk_size=5 * 1024 * 1024):
        if chunk:
            downloaded += len(chunk)
            f.write(chunk)
            # max_value is the total byte count, so update with bytes, not a percentage
            pbar.update(downloaded)
pbar.finish()
print("Successfully downloaded!\n\n")

Screenshot:

(Screenshot: the speed is simply ridiculous.)

Expected Solution:

Not sure if this GitHub issue was fixed.

  1. A solution within the requests module would be preferable, but I am open to any answer within the scope of Python that gets me a good speed.
  2. I want the download to be chunk-wise because I want to see progress via the progress bar, so shutil.copyfileobj(r.raw) isn't what I'm looking for.
  3. I did try using multiple threads, but it only complicated things and didn't help. I think the bottleneck is writing the chunk to the buffer itself, and splitting that task between threads doesn't help.
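Regarding point 2: one thing that may be worth trying (a sketch, not a confirmed fix) is reading fixed-size chunks straight from the undecoded `r.raw` stream instead of using `iter_content`, which keeps the download chunk-wise for the progress bar while skipping per-chunk decoding overhead. The helper name `copy_chunked` is my own; the demo substitutes an in-memory stream for a live response.

```python
import io

def copy_chunked(src, dst, chunk_size=64 * 1024, on_progress=None):
    """Copy src to dst in fixed-size chunks, reporting cumulative bytes copied."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if on_progress:
            on_progress(copied)  # e.g. pbar.update(copied)
    return copied

# With requests this would be called as:
#   r = requests.get(url, stream=True)
#   with open(path, 'wb') as f:
#       copy_chunked(r.raw, f, on_progress=pbar.update)
# Demo with an in-memory 150 KiB stream:
data = b'x' * (150 * 1024)
progress = []
n = copy_chunked(io.BytesIO(data), io.BytesIO(), on_progress=progress.append)
print(n)         # 153600
print(progress)  # [65536, 131072, 153600]
```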

Edit:

As per a suggestion, I tried including random user agents as shown:

desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
                 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']

from random import choice

def random_headers():
    return {'User-Agent': choice(desktop_agents),
            'Accept': 'text/html,video/mp4,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}

and sending the request with the header as r = requests.get(url, stream=True, headers=random_headers())

However, it made no difference. :(

Edit no. 2:

Tried it with a sample video from "http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_5mb.mp4". The same problem persists. :/

So, as others suggested, Google was throttling the speed. To overcome this, I used Selenium WebDriver to download the links:

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': dir_name}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get(li)

Well, at least I'm able to fully automate the download at the speed Google Chrome's downloader can manage.
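One gap with the Selenium approach is knowing when Chrome has finished writing a file. A common workaround (a sketch, assuming the download directory holds no unrelated partial files) is to poll for Chrome's `.crdownload` temporary files; the function name is my own:

```python
import os
import time

def wait_for_downloads(dir_name, timeout=600, poll=1.0):
    """Block until no Chrome .crdownload partial files remain in dir_name.

    Returns True once the directory is free of partials, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        partials = [f for f in os.listdir(dir_name) if f.endswith('.crdownload')]
        if not partials:
            return True
        time.sleep(poll)
    return False
```

This would be called after `driver.get(li)`, before moving on to the next link.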

So if anyone can help me figure this one out, please reply in the comments and I'll upvote helpful answers:

  1. Figure out a way in Python to use multiple connections per file, the way Free Download Manager does.
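For point 1, here is a minimal sketch of what a download manager does: split the file into byte ranges and fetch them in parallel with HTTP `Range` requests. This only helps when the server honors range requests (check for `Accept-Ranges: bytes` in the response headers); the function names and chunking scheme are my own, not a known library API:

```python
import concurrent.futures
import requests

def split_ranges(size, connections):
    """Inclusive (start, end) byte ranges covering [0, size)."""
    part = size // connections
    return [(i * part, size - 1 if i == connections - 1 else (i + 1) * part - 1)
            for i in range(connections)]

def download_ranged(url, path, connections=4):
    """Fetch url in parallel byte-range parts and assemble them into path."""
    head = requests.head(url, allow_redirects=True)
    size = int(head.headers['content-length'])

    def fetch(rng):
        start, end = rng
        r = requests.get(url, headers={'Range': 'bytes={}-{}'.format(start, end)})
        return start, r.content

    with open(path, 'wb') as f:
        f.truncate(size)  # pre-allocate so each part can be written at its offset
        with concurrent.futures.ThreadPoolExecutor(max_workers=connections) as ex:
            for start, data in ex.map(fetch, split_ranges(size, connections)):
                f.seek(start)
                f.write(data)
```

Whether this actually beats a single connection depends on how the server throttles: per-connection throttling (common with CDNs) is defeated by this, per-client throttling is not.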

Here's the link to the complete script.

Have you tried populating the request headers with a user agent and the other headers Google might expect, so it doesn't flag you as a bot and throttle your download speed?

