Downloading multiple urls with threadpool

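I'm having trouble downloading multiple URLs. My code still downloads only one URL at a time: the first download has to finish before the next one starts.

I want to download 3 URLs at the same time.

Here is my code: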

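from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

import requests
from tqdm import tqdm

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0'
}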

def download(path, video_url, bar: tqdm):
    
    res = requests.get(video_url, headers=headers, stream=True)

    with open(path, 'wb') as f:
        for b in res.iter_content(1024):
            f.write(b)
            bar.update(len(b))

def get_length(video_url):
    res = requests.get(video_url, headers=headers, stream=True)
    le = int(res.headers['Content-Length'])
    return le

def download_all(urls: list, thread: int = cpu_count()):

    total = len(urls)
    count = 0

    pool = ThreadPool(thread)  # https://stackoverflow.com/a/56528204/14951175

    for url in urls:
        output_file = get_url_path(url)
        count += 1
        content_length = get_length(video_url=url)
        with tqdm(total=content_length, unit='B', ncols=(150-1), desc=f'Downloading {count} of {total}', unit_divisor=1024, ascii=True, unit_scale=True) as bar:
            pool.apply_async(download(output_file, url, bar))
    pool.close()
    pool.join()


urls = read_lines('urls.txt')
download_all(urls)

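This line

pool.apply_async(download(output_file, url, bar))

must be

pool.apply_async(download, (output_file, url, bar))

Otherwise you call the download function yourself, in the calling thread, instead of passing it (and its arguments) to the ThreadPool.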
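To make the difference concrete, here is a minimal sketch that needs no network access. `which_thread` is a hypothetical stand-in for `download`; it only reports the thread it runs on. The sketch assumes nothing beyond the standard library.

```python
import threading
from multiprocessing.pool import ThreadPool


def which_thread(label):
    """Hypothetical stand-in for download(): report the thread it runs on."""
    return label, threading.current_thread().name


main_name = threading.current_thread().name

with ThreadPool(3) as pool:
    # Wrong: which_thread("eager") is evaluated right here, in the calling
    # thread. Only its return value (a tuple) reaches apply_async.
    eager = which_thread("eager")
    bad = pool.apply_async(eager)
    try:
        bad.get(timeout=5)
        bad_error = None
    except TypeError as exc:
        # The worker tried to call the tuple; the error only shows up on .get().
        bad_error = exc

    # Right: pass the callable and its arguments separately,
    # so a worker thread makes the call.
    deferred = pool.apply_async(which_thread, ("deferred",)).get(timeout=5)

print("eager ran on:   ", eager[1])
print("deferred ran on:", deferred[1])
print("error from the wrong form:", bad_error)
```

In the question's code the same thing happens with `download`: each file is fetched in full by the main thread, and the pool is then handed `None`. The resulting error is silent because `.get()` is never called on the result.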


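Edit

Use starmap to map the URLs to a func that performs the download (by the way, this also saves you the duplicate GET request), and add the position parameter so that each progress bar gets its own row.
To be honest, the bars don't update very smoothly, but I don't really have experience with combining tqdm and ThreadPool. Overall, though, the downloads seem to work.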

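import sys
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

import requests
from tqdm import tqdm

# headers and get_url_path() are the ones from the question.

def download_all(urls: list, thread: int = cpu_count()):
    total = len(urls)

    pool = ThreadPool(thread)

    def func(count, url):
        output_file = get_url_path(url)
        # One streamed request provides both the size and the body.
        req = requests.get(url, headers=headers, stream=True)
        content_length = int(req.headers['Content-Length'])
        # position=count gives each download its own progress-bar row.
        with tqdm(total=content_length, unit='B', desc=f'Downloading {count + 1} of {total}',
                  unit_divisor=1024, ascii=True, unit_scale=True, position=count, file=sys.stdout) as bar:
            with open(output_file, 'wb') as f:
                for b in req.iter_content(1024):
                    f.write(b)
                    bar.update(len(b))

    # starmap unpacks each (index, url) pair into func(count, url)
    # and blocks until every download has finished.
    pool.starmap(func, enumerate(urls))

    pool.close()
    pool.join()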
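
If you want to check the starmap pattern without hitting the network, a rough sketch like the following may help. `fake_download` and the example.com URLs are placeholders invented for illustration; the half-second sleep stands in for the network wait, and the pool size of 3 matches the "3 URLs at the same time" from the question.

```python
import time
from multiprocessing.pool import ThreadPool

# Placeholder URLs, purely illustrative.
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]


def fake_download(count, url):
    """Stand-in for func(): sleep instead of fetching, then report the slot."""
    time.sleep(0.5)  # simulates waiting on the network
    return count, url


start = time.perf_counter()
with ThreadPool(3) as pool:
    # enumerate() yields (index, url) tuples; starmap unpacks each tuple
    # into positional arguments and blocks until every task has finished.
    results = pool.starmap(fake_download, enumerate(urls))
elapsed = time.perf_counter() - start

print(results)
print(f"3 tasks of 0.5 s each took {elapsed:.2f} s")
```

Run one after another, the three tasks would need about 1.5 seconds. With three worker threads the total should stay close to the duration of a single task, which is the behaviour the question was after.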

