
multiprocessing.pool with manager and async methods

I am trying to make use of Manager() to share a dictionary between processes and tried out the following code:

from multiprocessing import Manager, Pool

def f(d):
    d['x'] += 2

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    d['x'] = 2
    p = Pool(4)

    for _ in range(2000):
        p.map_async(f, (d,))  # also tried apply_async and map

    p.close()
    p.join()

    print(d)  # expected result --> {'x': 4002}

Using map_async and apply_async, the printed result is always different (e.g. {'x': 3838}, {'x': 3770}). However, using map gives the expected result. I have also tried using Process instead of Pool, and the results differ as well.

Any insights? Is it that the non-blocking calls introduce race conditions that the manager does not handle?

When you call map (rather than map_async), it blocks until the workers have finished all the requests you pass, which in your case is just one call to function f. So even though you have a pool size of 4, you are in essence doing the 2000 calls one at a time. To actually parallelize execution, you should have done a single p.map(f, [d]*2000) instead of the loop (see the sketch below).
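A minimal sketch of that single-call version (not from the original answer): note that once the 2000 increments actually run in parallel, the unsynchronized d['x'] += 2 is subject to the same race described below, so this version by itself can still print less than 4002.

from multiprocessing import Manager, Pool

def f(d):
    d['x'] += 2

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['x'] = 2
        with Pool(4) as p:
            # One blocking call that hands the pool all 2000 tasks at once;
            # map chunks the iterable and spreads it across the 4 workers.
            p.map(f, [d] * 2000)
        print(d)  # may print less than 4002 -- the increments now race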

But when you call map_async, you do not block and are instead returned a result object. A call to get on that result object will block until the task finishes and will return the result of the function call. So now you are running up to 4 processes at a time. But the update to the dictionary is not serialized across the processes. I have modified the code to force serialization of d['x'] += 2 by using a multiprocessing lock. You will see that the result is now {'x': 4002}.

from multiprocessing import Manager, Pool, Lock


def f(d):
    lock.acquire()  # serialize the read-modify-write on the shared dict
    d['x'] += 2
    lock.release()

def init(l):
    # pool initializer: runs once in each worker and stores the shared lock in a global
    global lock
    lock = l

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['x'] = 2
        lock = Lock()  # a single lock shared by every pool worker
        p = Pool(4, initializer=init, initargs=(lock,))  # each worker receives the lock through init()

        results = []  # collect the AsyncResult objects in case the function returned a result we wanted
        for _ in range(2000):
            results.append(p.map_async(f, (d,)))  # apply_async behaves the same way here
        """
        for i in range(2000): # if the function returned a result we wanted
            results[i].get() # wait for everything to finish
        """
        p.close()
        p.join()
        print(d)
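As an aside (not part of the original answer), a lock created with manager.Lock() is a picklable proxy, so the same serialization can be achieved without a pool initializer by shipping the lock to the workers together with the dictionary. A minimal sketch:

from multiprocessing import Manager, Pool

def f(args):
    d, lock = args
    with lock:  # the lock proxy supports the context-manager protocol
        d['x'] += 2

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['x'] = 2
        lock = manager.Lock()  # proxy lock: picklable, so it can travel in the task arguments
        with Pool(4) as p:
            p.map(f, [(d, lock)] * 2000)
        print(d)  # {'x': 4002}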
