
Python multiprocessing.Pool and argument pickling

Consider the following example:

import multiprocessing as mp

def job(l):
    l.append(1)
    return l

if __name__ == "__main__":
    pool = mp.Pool(1)
    my_list = []
    out = pool.map(job, [my_list for i in range(5)])
    pool.close()
    pool.join()
    print(out)

When calling pool.map, I would expect the arguments to be pickled and then unpickled each time job is called (and thus recreated for every call). However, the observed output is:

[[1, 1], [1, 1], [1, 1], [1, 1], [1]]

Could someone please explain what is going on? I expected the output to be a list of five [1], or [[1], [1, 1], ..., [1, 1, 1, 1, 1]], neither of which is the case.

The chunksize parameter of pool.map is the cause of your confusion. You did not pass it, so the pool picks a value itself; with five items and a single worker process, CPython's default works out to chunksize=2, and you get exactly the output you observed when you set chunksize=2 explicitly.

With chunksize=1 you would get [[1], [1], [1], [1], [1]], and with chunksize=3 you would get [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1], [1, 1]].
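Where the 2 comes from: as far as I can tell from the CPython source, when chunksize is left as None, Pool.map aims for roughly four chunks per worker and rounds up. The following is a paraphrase of that heuristic, not an API; the helper name default_chunksize is made up for illustration, and the rule is an implementation detail that may change between versions.

```python
def default_chunksize(n_items, n_workers):
    """Approximate CPython's default: about four chunks per worker, rounded up."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


print(default_chunksize(5, 1))  # 2 -> the setup from the question
print(default_chunksize(5, 2))  # 1 -> a two-worker pool would hide the effect
```

This would also explain why the output changes when only the pool size changes: the pool size feeds into the chunk size.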

If you expand your code with prints, you can watch what happens:

import multiprocessing as mp

def job(l):
    print(f'before append {l}')
    l.append(1)
    print(f'after append {l}')
    return l

if __name__ == "__main__":
    pool = mp.Pool(1)
    my_list = []
    out = pool.map(job, [my_list for _ in range(5)], chunksize=2)
    pool.close()
    pool.join()
    print(out)

This will give you this output:

before append []
after append [1]
before append [1]
after append [1, 1]
before append []
after append [1]
before append [1]
after append [1, 1]
before append []
after append [1]
[[1, 1], [1, 1], [1, 1], [1, 1], [1]]


You can see that "before append" starts with the empty list only three times, not five times as you might expect. That's because with chunksize=2 and five items in the iterable, the work is split into ceil(5 / 2) = 3 tasks: two tasks with two-item chunks and one task with a one-item chunk.

In each of the first two tasks, the first call to job receives the unpickled empty list and appends 1. The second call then receives the very same list the first call just modified, because a chunk is pickled and unpickled as a whole and both of its items are references to one list. The second append therefore also changes the result of the first call, since both results are the same underlying object. Once the second call has finished, the task is complete and its result, [[1, 1], [1, 1]], is sent back to the parent. As noted, this happens for the first two tasks.
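The key point is that pickle preserves shared references inside a single pickled object, but not across separate pickles. A minimal sketch with plain pickle and no pool (the variable names are only for illustration):

```python
import pickle

shared = []
chunk = (shared, shared)  # two references to one list, like a chunk of size 2

# Pickled together: the copy still holds ONE list that is referenced twice.
a, b = pickle.loads(pickle.dumps(chunk))
print(a is b)  # True
a.append(1)
print(b)       # [1], because a and b are the same object

# Pickled separately (what chunksize=1 amounts to): two independent copies.
c = pickle.loads(pickle.dumps(shared))
d = pickle.loads(pickle.dumps(shared))
print(c is d)  # False
```

Note that shared itself stays empty throughout: the worker only ever sees copies, which is why my_list in the parent is never modified.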

The third task calls job only once, and its result isn't modified by a second call, so the result is just [1].

If you add for obj in out: print(id(obj)) at the end of your code, you will see three different ids for three separate lists in the result, one for each task that was built to process your iterable (output from CPython; the actual numbers will differ from run to run):

140584841382600
140584841382600
140584841383432
140584841383432
140584841383368
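To check this reasoning without starting any processes, the two pickle round trips per task can be imitated by hand. This is only a simplified model of what the pool does, under the assumption that each chunk travels to the worker as one pickled object and its results come back as one pickled list; simulate_map is a hypothetical helper, not part of multiprocessing.

```python
import pickle


def job(l):
    l.append(1)
    return l


def simulate_map(func, items, chunksize):
    """Mimic the per-chunk pickle round trips of pool.map in one process."""
    out = []
    for start in range(0, len(items), chunksize):
        chunk = tuple(items[start:start + chunksize])
        # parent -> worker: the whole chunk is serialized as one object
        received = pickle.loads(pickle.dumps(chunk))
        results = [func(x) for x in received]
        # worker -> parent: the results of the chunk are serialized together
        out.extend(pickle.loads(pickle.dumps(results)))
    return out


my_list = []
for size in (1, 2, 3):
    out = simulate_map(job, [my_list] * 5, size)
    # third column: number of distinct list objects in the result
    print(size, out, len({id(o) for o in out}))
```

For chunk sizes 1, 2 and 3 this gives 5, 3 and 2 distinct lists respectively, matching the number of tasks in each case.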

The output changes with the number of processes, which means you're doing something that's not process-safe; in this case, mutating a plain Python list in (potentially) multiple processes.

I'm not exactly clear on what you're trying to achieve, but this at least behaves consistently:

from multiprocessing import Pool, Manager


def job(l):
    l.append(1)
    return l


if __name__ == "__main__":
    manager = Manager()

    for proc_count in range(1, 6):
        print(proc_count)
        pool = Pool(proc_count)
        my_list = manager.list()
        out = pool.map(job, [my_list for i in range(5)])
        pool.close()
        pool.join()
        print(list(list(o) for o in out))

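If that's not what you're going for, you can instead forget the manager, drop my_list and pass [list() for i in range(5)]. That also gives consistent, though different, behavior: every job then gets its own list, so you get five independent [1] lists, whereas with the manager all five results refer to the same shared list.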
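A further option, assuming the goal is simply "each result is the input plus one element", is to stop mutating the argument and return a new list instead. This is a sketch of that idea rather than something the question asked for; the loop over chunksize values is only there to show that the result no longer depends on it.

```python
import multiprocessing as mp


def job(l):
    # Build and return a new list instead of mutating the argument,
    # so items that share one list within a chunk cannot affect each other.
    return l + [1]


if __name__ == "__main__":
    my_list = []
    for chunksize in (1, 2, 3):
        with mp.Pool(1) as pool:
            out = pool.map(job, [my_list for _ in range(5)], chunksize=chunksize)
        print(chunksize, out)  # [[1], [1], [1], [1], [1]] each time
```

Because job no longer changes its input, it does not matter whether two calls in one task receive the same object.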
