
python 多处理池


When using a Python multiprocessing pool, how many jobs are submitted?

How is this decided? Can we control it somehow? For example, keep at most 10 jobs in the queue to reduce memory usage.

Assume I have the backbone code written below. For each chrom and simulation, I read the data as a pandas dataframe.

(I thought that reading the data before submitting the job would be better, to reduce the I/O load in the worker processes.)

Then I send the pandas dataframe to each worker to process.

But it seems that many more jobs are submitted than are finalized, and this results in a memory error.

import multiprocessing

numofProcesses = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=numofProcesses)
jobs = []

all_result1 = {}
all_result2 = {}

def accumulate(result):
    result1, result2 = result
    # merge this worker's partial results into the global accumulators
    all_result1.update(result1)
    all_result2.update(result2)
    print('ACCUMULATE')

for chrom in chroms:
    for sim in sims:
        chrBased_simBased_df = readData(chrom, sim)
        jobs.append(pool.apply_async(func, args=(chrBased_simBased_df, too, many),
                                     callback=accumulate))
        print('Submitted job:%d' % len(jobs))

pool.close()
pool.join()

Is there a way to get rid of this?

Neither multiprocessing.Pool nor concurrent.futures.ProcessPoolExecutor allows you to limit the number of tasks you submit to the workers.

Nevertheless, this is a very trivial extension you can build yourself by using a Semaphore.

You can check an example in this gist. It uses the concurrent.futures module, but it should be trivial to port it to multiprocessing.Pool as well.

from threading import BoundedSemaphore
from concurrent.futures import ProcessPoolExecutor


class MaxQueuePool:
    """This Class wraps a concurrent.futures.Executor
    limiting the size of its task queue.
    If `max_queue_size` tasks are submitted, the next call to submit will block
    until a previously submitted one is completed.
    """
    def __init__(self, executor, max_queue_size, max_workers=None):
        self.pool = executor(max_workers=max_workers)
        self.pool_queue = BoundedSemaphore(max_queue_size)

    def submit(self, function, *args, **kwargs):
        """Submits a new task to the pool, blocks if Pool queue is full."""
        self.pool_queue.acquire()

        future = self.pool.submit(function, *args, **kwargs)
        future.add_done_callback(self.pool_queue_callback)

        return future

    def pool_queue_callback(self, _):
        """Called once task is done, releases one queue slot."""
        self.pool_queue.release()


if __name__ == '__main__':
    pool = MaxQueuePool(ProcessPoolExecutor, 8)
    f = pool.submit(print, "Hello World!")
    f.result()
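As a sketch of that port to multiprocessing.Pool (the class and method names here are illustrative, not from the gist), the same semaphore pattern works by acquiring a slot before each `apply_async` and releasing it in the callback and error_callback:

```python
from threading import BoundedSemaphore
import multiprocessing


class MaxQueuePoolMP:
    """Wraps multiprocessing.Pool, limiting the number of in-flight tasks.
    apply_async blocks once `max_queue_size` tasks are queued or running.
    """
    def __init__(self, max_queue_size, processes=None):
        self.pool = multiprocessing.Pool(processes=processes)
        self.pool_queue = BoundedSemaphore(max_queue_size)

    def apply_async(self, function, args=(), kwds=None, callback=None):
        self.pool_queue.acquire()  # blocks while the queue is full

        def release_and_forward(result):
            # release the slot first, then run the user's callback
            self.pool_queue.release()
            if callback is not None:
                callback(result)

        return self.pool.apply_async(
            function, args, kwds or {},
            callback=release_and_forward,
            error_callback=lambda exc: self.pool_queue.release(),
        )

    def close(self):
        self.pool.close()

    def join(self):
        self.pool.join()


if __name__ == '__main__':
    results = []
    pool = MaxQueuePoolMP(max_queue_size=8)
    for i in range(20):
        pool.apply_async(pow, args=(i, 2), callback=results.append)
    pool.close()
    pool.join()
    print(sorted(results))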

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM