
Python multiprocessing.Pool.map behavior when list is longer than number of processes

When submitting a list of tasks that is longer than the number of processes, how are the processes assigned to these tasks?

from multiprocessing import Pool

def f(i):
    print(i)
    return i

# The __main__ guard is needed on platforms that use the
# "spawn" start method (Windows, and macOS on Python 3.8+).
if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, [1, 2, 3, 4, 5]))

I'm running a more complex function and the execution doesn't seem to be in order (FIFO).

Here's some sample code:

from multiprocessing import Pool
from time import sleep


def f(x):
    print(x)
    sleep(0.1)
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, range(100)))

Which prints out:

0
13
1
14
2
15
3
16
4
...

If we look into the relevant source code in multiprocessing:

    def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
            error_callback=None):
        '''
        Helper function to implement map, starmap and their async counterparts.
        '''
        self._check_running()
        if not hasattr(iterable, '__len__'):
            iterable = list(iterable)

        if chunksize is None:
            chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
            if extra:
                chunksize += 1
        if len(iterable) == 0:
            chunksize = 0

        task_batches = Pool._get_tasks(func, iterable, chunksize)

Here we have len(iterable) == 100 and len(self._pool) * 4 == 8, so chunksize, extra = divmod(100, 8) == (12, 4). Since extra is nonzero, chunksize is rounded up to 13, hence the output shows the tasks being split into batches of 13.
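The same default-chunksize arithmetic can be reproduced directly. A minimal sketch using the numbers from the example above; the variable names are illustrative:

# Reproduce Pool._map_async's default chunksize computation
# for len(iterable) == 100 and a pool of 2 workers.
n_tasks = 100
n_workers = 2

chunksize, extra = divmod(n_tasks, n_workers * 4)  # divmod(100, 8) == (12, 4)
if extra:                                          # extra == 4, so round up
    chunksize += 1

print(chunksize)  # 13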

The Pool class represents a pool of worker processes. An idle worker picks up the next pending task as soon as it finishes its current one. To see this more clearly, set chunksize=1 and consider the code:

from multiprocessing import Pool
from time import sleep


def f(x):
    print(f"Task {x} enter")
    sleep(5)
    print(f"Task {x} exit")
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, range(10), chunksize=1))

So the order of execution will be:

Task 0 enter
Task 1 enter
Task 0 exit
Task 2 enter
Task 1 exit
Task 3 enter
Task 2 exit
Task 4 enter
Task 3 exit
Task 5 enter
Task 4 exit
Task 6 enter
Task 5 exit
Task 7 enter
Task 6 exit
Task 8 enter
Task 7 exit
Task 9 enter
Task 8 exit
Task 9 exit
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
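To see which worker runs which task, one can tag each print with the worker's process id. A minimal sketch, not part of the original answer; os.getpid() identifies the current worker process. With the default chunksize of 13 computed earlier, one worker runs tasks 0-12 while the other runs 13-25, which matches the interleaved 0, 13, 1, 14, ... output shown at the top:

from multiprocessing import Pool
from os import getpid
from time import sleep


def f(x):
    # Each worker prints its own pid, so the consecutive task ids
    # of a chunk show up grouped under the same pid.
    print(f"Task {x} on worker {getpid()}")
    sleep(0.1)
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        pool.map(f, range(100))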
