Downloading multiple urls with threadpool
I'm having trouble downloading multiple URLs. My code still downloads only one URL per session: the first download has to finish before the next one starts. I want to download 3 URLs simultaneously.
Here is my code:
import requests
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool
from tqdm import tqdm

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0'
}

def download(path, video_url, bar: tqdm):
    res = requests.get(video_url, headers, stream=True)
    with open(path, 'wb') as f:
        for b in res.iter_content(1024):
            f.write(b)
            bar.update(len(b))

def get_length(video_url):
    res = requests.get(video_url, headers, stream=True)
    le = int(res.headers['Content-Length'])
    return le

def download_all(urls: list, thread: int = cpu_count()):
    total = len(urls)
    count = 0
    pool = ThreadPool(thread)  # https://stackoverflow.com/a/56528204/14951175
    for url in urls:
        output_file = get_url_path(url)
        count += 1
        content_length = get_length(video_url=url)
        with tqdm(total=content_length, unit='B', ncols=(150-1), desc=f'Downloading {count} of {total}',
                  unit_divisor=1024, ascii=True, unit_scale=True) as bar:
            pool.apply_async(download(output_file, url, bar))
    pool.close()
    pool.join()

urls = read_lines('urls.txt')
download_all(urls)
This line:

pool.apply_async(download(output_file, url, bar))

must be:

pool.apply_async(download, (output_file, url, bar))

Otherwise you call download yourself (in the main thread) instead of passing the function and its arguments to the ThreadPool.
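A minimal, self-contained sketch (the names here are illustrative, not from the post) of why the argument form matters:

```python
from multiprocessing.pool import ThreadPool
import threading

def work(name):
    # Report which thread actually ran the function.
    return f"{name} ran on {threading.current_thread().name}"

pool = ThreadPool(2)
# Passing the function plus an argument tuple lets the pool call it
# on a worker thread; writing work('a') instead would execute it here,
# in the calling thread, before apply_async even sees it.
result = pool.apply_async(work, ('a',))
print(result.get())
pool.close()
pool.join()
```

With the correct form, the printed thread name is one of the pool's worker threads rather than MainThread.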
Edit

Used starmap to map the URLs onto a func that performs the download (by the way, this also lets you avoid the duplicate GET request), and added the position parameter. Honestly, the bars don't render very smoothly, but I don't have much experience with tqdm or ThreadPool. Overall, though, the downloads appear to work.
def download_all(urls: list, thread: int = cpu_count()):
    total = len(urls)
    pool = ThreadPool(thread)

    def func(count, url):
        output_file = get_url_path(url)
        req = requests.get(url, headers=headers, stream=True)
        content_length = int(req.headers['Content-Length'])
        with tqdm(total=content_length, unit='B', desc=f'Downloading {count + 1} of {total}',
                  unit_divisor=1024, ascii=True, unit_scale=True, position=count, file=sys.stdout) as bar:
            with open(output_file, 'wb') as f:
                for b in req.iter_content(1024):
                    f.write(b)
                    bar.update(len(b))

    pool.starmap(func, enumerate(urls))
    pool.close()
    pool.join()
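As a side note, the same fan-out can also be written with concurrent.futures.ThreadPoolExecutor from the standard library. This is only a hedged sketch; download_one is a placeholder (not from the original post) for the request/write logic shown above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_one(url):
    # Placeholder: a real body would stream the response to disk,
    # as in download_all above.
    return url

urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
results = []
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately, so up to 3 downloads run at once.
    futures = [executor.submit(download_one, u) for u in urls]
    for fut in as_completed(futures):
        results.append(fut.result())
```

as_completed yields futures as they finish, so results arrive in completion order rather than submission order.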