
Multiprocessing: use tqdm to display a progress bar

To make my code more "pythonic" and faster, I use multiprocessing and a map function to send it a) the function and b) the range of iterations.

The obvious solution (i.e., calling tqdm directly on the range: tqdm.tqdm(range(0, 30))) does not work with multiprocessing (as formulated in the code below).

The progress bar is displayed from 0 to 100% (when Python reads the code?) but it does not indicate the actual progress of the map function.

How can I display a progress bar that indicates which step the map function is at?

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   p = Pool(2)
   r = p.map(_foo, tqdm.tqdm(range(0, 30)))
   p.close()
   p.join()

Any help or suggestions are welcome...

Use imap instead of map; it returns an iterator over the processed values.

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   with Pool(2) as p:
      r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))
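
Note that total=30 has to be passed explicitly: the iterator returned by imap has no length, so without it tqdm cannot show a percentage or an ETA.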

Sorry for being late, but if all you need is a concurrent map, I added this functionality in tqdm>=4.42.0:

from tqdm.contrib.concurrent import process_map  # or thread_map
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = process_map(_foo, range(0, 30), max_workers=2)

References: https://tqdm.github.io/docs/contrib.concurrent/ and https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py

It supports max_workers and chunksize, and you can also easily switch from process_map to thread_map, as in the sketch below.
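
For example, a minimal sketch of the same call with an explicit chunksize (the value 4 here is just illustrative) and the one-line switch to threads:

from tqdm.contrib.concurrent import process_map, thread_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    # chunksize batches the inputs sent to each worker process
    r = process_map(_foo, range(0, 30), max_workers=2, chunksize=4)
    # same call shape, but runs in threads instead of processes
    r = thread_map(_foo, range(0, 30), max_workers=2)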

Solution found: be careful! Due to multiprocessing, the estimated time (iterations per loop, total time, etc.) can be unstable, but the progress bar works perfectly.

Note: a context manager for Pool is only available since Python 3.3.

from multiprocessing import Pool
import time
from tqdm import tqdm

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for i, _ in enumerate(p.imap_unordered(_foo, range(0, max_))):
                pbar.update()

You can use p_tqdm instead.

https://github.com/swansonk14/p_tqdm

from p_tqdm import p_map
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = p_map(_foo, list(range(0, 30)))
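
Per the project README, p_tqdm also ships ordered/unordered and eager/lazy variants (p_imap, p_umap, p_uimap) with the same call shape.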

Based on the answer of Xavi Martínez, I wrote the function imap_unordered_bar. It can be used in the same way as imap_unordered, the only difference being that a progress bar is shown.

from multiprocessing import Pool
import time
from tqdm import tqdm

def imap_unordered_bar(func, args, n_processes=2):
    p = Pool(n_processes)
    res_list = []
    with tqdm(total=len(args)) as pbar:
        # wrapping imap_unordered in a second tqdm would draw a duplicate bar
        for i, res in enumerate(p.imap_unordered(func, args)):
            pbar.update()
            res_list.append(res)
    p.close()
    p.join()
    return res_list

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square 

if __name__ == '__main__':
    result = imap_unordered_bar(_foo, range(5))
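
Keep in mind that imap_unordered yields results in completion order, so res_list is not guaranteed to match the order of args; use imap instead if the order of the results matters.
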
import multiprocessing as mp
import tqdm


some_iterable = ...

def some_func(item):
    # your logic
    ...


if __name__ == '__main__':
    # leave a couple of cores free for the rest of the system
    with mp.Pool(mp.cpu_count() - 2) as p:
        list(tqdm.tqdm(p.imap(some_func, some_iterable), total=len(some_iterable)))

Here is my take for when you need to get results back from your parallel executing functions. This function does a few things (there is another post of mine that explains it further), but the key point is that there is a pending-tasks queue and a completed-tasks queue. As workers finish each task from the pending queue, they add the result to the completed queue. You can wrap the check on the completed queue with the tqdm progress bar. I am not putting the implementation of the do_work() function here; it is not relevant, as the point is to monitor the completed-tasks queue and update the progress bar every time a result comes in.

import multiprocessing as mp
import pickle
import psutil
from tqdm import tqdm

SENTINEL = None  # value signalling the workers that no more work is coming


def par_proc(job_list, num_cpus=None, verbose=False):

    # Get the number of cores
    if not num_cpus:
        num_cpus = psutil.cpu_count(logical=False)

    print('* Parallel processing')
    print('* Running on {} cores'.format(num_cpus))

    # Set up the queues for sending and receiving data to/from the workers
    tasks_pending = mp.Queue()
    tasks_completed = mp.Queue()

    # Gather processes and results here
    processes = []
    results = []

    # Count tasks
    num_tasks = 0

    # Add the tasks to the queue
    for job in job_list:
        for task in job['tasks']:
            expanded_job = {}
            num_tasks = num_tasks + 1
            expanded_job.update({'func': pickle.dumps(job['func'])})
            expanded_job.update({'task': task})
            tasks_pending.put(expanded_job)

    # Set the number of workers here
    num_workers = min(num_cpus, num_tasks)

    # We need as many sentinels as there are worker processes so that ALL
    # processes exit when there is no more work left to be done.
    for c in range(num_workers):
        tasks_pending.put(SENTINEL)

    print('* Number of tasks: {}'.format(num_tasks))

    # Set up and start the workers
    for c in range(num_workers):
        p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed, verbose))
        p.name = 'worker' + str(c)
        processes.append(p)
        p.start()

    # Gather the results
    completed_tasks_counter = 0

    with tqdm(total=num_tasks) as bar:
        while completed_tasks_counter < num_tasks:
            results.append(tasks_completed.get())
            completed_tasks_counter = completed_tasks_counter + 1
            # update() advances the bar by one step, not to an absolute value
            bar.update()

    for p in processes:
        p.join()

    return results
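
The answer deliberately omits do_work; a minimal sketch of a compatible worker (hypothetical, only to make the example self-contained) would pull tasks until it hits the sentinel, unpickle the function, and push each result onto the completed queue:

def do_work(tasks_pending, tasks_completed, verbose):
    # Hypothetical worker loop; the original answer does not include one.
    while True:
        job = tasks_pending.get()
        if job == SENTINEL:
            break  # no more work: let this process exit
        func = pickle.loads(job['func'])
        result = func(job['task'])
        if verbose:
            print('Finished task: {}'.format(job['task']))
        tasks_completed.put(result)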

For a progress bar with apply_async, we can use the following code, as suggested in:

https://github.com/tqdm/tqdm/issues/484

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

pool = Pool(2)
pbar = tqdm(total=100)

def update(*a):
    pbar.update()

for i in range(pbar.total):
    pool.apply_async(myfunc, args=(i,), callback=update)
pool.close()
pool.join()
pbar.close()
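
If the return values are also needed, a variation of the submission loop (a sketch under the same setup) keeps the AsyncResult handles and collects them once the pool has finished:

jobs = [pool.apply_async(myfunc, args=(i,), callback=update) for i in range(pbar.total)]
pool.close()
pool.join()
results = [job.get() for job in jobs]  # .get() re-raises if a task failed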

Based on "user17242583" answer, I created the following function.基于“user17242583”的回答,我创建了以下 function。 It should be as fast as Pool.map and the results are always ordered.它应该和 Pool.map 一样快,并且结果总是有序的。 Plus, you can pass as many parameters to your function as you want and not just a single iterable.另外,您可以根据需要向 function 传递任意数量的参数,而不仅仅是一个可迭代的参数。

from multiprocessing import Pool
from functools import partial
from tqdm import tqdm


def imap_tqdm(function, iterable, processes, chunksize=1, desc=None, disable=False, **kwargs):
    """
    Run a function in parallel with a tqdm progress bar and an arbitrary number of arguments.
    Results are always ordered, and the performance should be the same as that of Pool.map.
    :param function: The function that should be parallelized.
    :param iterable: The iterable passed to the function.
    :param processes: The number of processes used for the parallelization.
    :param chunksize: The iterable is chopped into chunks of this size, which are submitted to the process pool as separate tasks.
    :param desc: The description displayed by tqdm in the progress bar.
    :param disable: Disables the tqdm progress bar.
    :param kwargs: Any additional arguments that should be passed to the function.
    """
    if kwargs:
        function_wrapper = partial(_wrapper, function=function, **kwargs)
    else:
        function_wrapper = partial(_wrapper, function=function)

    results = [None] * len(iterable)
    with Pool(processes=processes) as p:
        with tqdm(desc=desc, total=len(iterable), disable=disable) as pbar:
            for i, result in p.imap_unordered(function_wrapper, enumerate(iterable), chunksize=chunksize):
                results[i] = result
                pbar.update()
    return results


def _wrapper(enum_iterable, function, **kwargs):
    i = enum_iterable[0]
    result = function(enum_iterable[1], **kwargs)
    return i, result
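
A usage sketch (the _power function and its exponent keyword are made up for illustration):

def _power(my_number, exponent=2):
    return my_number ** exponent

if __name__ == '__main__':
    # the extra keyword argument is forwarded to _power via **kwargs
    results = imap_tqdm(_power, list(range(30)), processes=2, exponent=3)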

This approach is simple and it works.

from multiprocessing.pool import ThreadPool
import time
from tqdm import tqdm

def job():
    time.sleep(1)
    pbar.update()

pool = ThreadPool(5)
with tqdm(total=100) as pbar:
    for i in range(100):
        pool.apply_async(job)
    pool.close()
    pool.join()
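
This works because a ThreadPool shares memory with the main thread: every worker updates the same pbar object. With a multiprocessing.Pool the workers live in separate processes, so they could not update a bar owned by the parent this way.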
