
Python multiprocessing.Pool.map behavior when list is longer than number of processes

When submitting a list of tasks that is longer than the number of processes, how are the processes assigned to these tasks?

from multiprocessing import Pool

def f(i):
    print(i)
    return i

# The __main__ guard is needed on platforms that use the
# "spawn" start method (Windows, and macOS on Python 3.8+).
if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, [1, 2, 3, 4, 5]))

I'm running a more complex function and the execution doesn't seem to be in order (FIFO).

Here's some sample code:

from multiprocessing import Pool
from time import sleep


def f(x):
    print(x)
    sleep(0.1)
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, range(100)))

Which prints out:

0
13
1
14
2
15
3
16
4
...

If we look into the relevant source code in multiprocessing:

    def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
            error_callback=None):
        '''
        Helper function to implement map, starmap and their async counterparts.
        '''
        self._check_running()
        if not hasattr(iterable, '__len__'):
            iterable = list(iterable)

        if chunksize is None:
            chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
            if extra:
                chunksize += 1
        if len(iterable) == 0:
            chunksize = 0

        task_batches = Pool._get_tasks(func, iterable, chunksize)

Here we have len(iterable) == 100 and len(self._pool) * 4 == 8, so chunksize, extra = divmod(100, 8) == (12, 4). Since extra is nonzero, chunksize is rounded up to 13, hence the output shows the tasks being split into batches of 13.
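The same default-chunksize arithmetic can be reproduced directly. A minimal sketch using the numbers from the example above; the variable names are illustrative:

# Reproduce Pool._map_async's default chunksize computation
# for len(iterable) == 100 and a pool of 2 workers.
n_tasks = 100
n_workers = 2

chunksize, extra = divmod(n_tasks, n_workers * 4)  # divmod(100, 8) == (12, 4)
if extra:                                          # extra == 4, so round up
    chunksize += 1

print(chunksize)  # 13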

The Pool class represents a pool of worker processes. An idle worker picks up the next pending task as soon as it finishes its current one. To see this more clearly, set chunksize=1 and consider the code:

from multiprocessing import Pool
from time import sleep


def f(x):
    print(f"Task {x} enter")
    sleep(5)
    print(f"Task {x} exit")
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(f, range(10), chunksize=1))

So the order of execution will be:

Task 0 enter
Task 1 enter
Task 0 exit
Task 2 enter
Task 1 exit
Task 3 enter
Task 2 exit
Task 4 enter
Task 3 exit
Task 5 enter
Task 4 exit
Task 6 enter
Task 5 exit
Task 7 enter
Task 6 exit
Task 8 enter
Task 7 exit
Task 9 enter
Task 8 exit
Task 9 exit
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
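To see which worker runs which task, one can tag each print with the worker's process id. A minimal sketch, not part of the original answer; os.getpid() identifies the current worker process. With the default chunksize of 13 computed earlier, one worker runs tasks 0-12 while the other runs 13-25, which matches the interleaved 0, 13, 1, 14, ... output shown at the top:

from multiprocessing import Pool
from os import getpid
from time import sleep


def f(x):
    # Each worker prints its own pid, so the consecutive task ids
    # of a chunk show up grouped under the same pid.
    print(f"Task {x} on worker {getpid()}")
    sleep(0.1)
    return x * x


if __name__ == '__main__':
    with Pool(2) as pool:
        pool.map(f, range(100))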
