Sharing a counter with multiprocessing.Pool

I'd like to use multiprocessing.Value + multiprocessing.Lock to share a counter between separate processes. For example:

import itertools as it
import multiprocessing

def func(x, val, lock):
    for i in range(x):
        i ** 2
    with lock:
        val.value += 1
        print('counter incremented to:', val.value)

if __name__ == '__main__':
    v = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()

    with multiprocessing.Pool() as pool:
        pool.starmap(func, ((i, v, lock) for i in range(25)))
    print(v.value)

This will throw the following exception:

RuntimeError: Synchronized objects should only be shared between processes through inheritance

What I am most confused by is that a related (albeit not completely analogous) pattern works with multiprocessing.Process():

if __name__ == '__main__':
    v = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()

    procs = [multiprocessing.Process(target=func, args=(i, v, lock))
             for i in range(25)]
    for p in procs: p.start()
    for p in procs: p.join()

Now, I recognize that these are two markedly different things:

  • the first example uses a number of worker processes equal to cpu_count(), and splits the iterable range(25) between them
  • the second example creates 25 worker processes, each tasked with a single input

That said: how can I share an instance with pool.starmap() (or pool.map()) in this manner?

I've seen similar questions here, here, and here, but those approaches don't seem to be suited to .map()/.starmap(), regardless of whether the Value uses ctypes.c_int.


I realize that this approach technically works:

def func(x):
    for i in range(x):
        i ** 2
    with lock:
        v.value += 1
        print('counter incremented to:', v.value)

v = None
lock = None

def set_global_counter_and_lock():
    """Egh ... """
    global v, lock
    if not any((v, lock)):
        v = multiprocessing.Value('i', 0)
        lock = multiprocessing.Lock()

if __name__ == '__main__':
    # Each worker process will call `initializer()` when it starts.
    with multiprocessing.Pool(initializer=set_global_counter_and_lock) as pool:
        pool.map(func, range(25))

Is this really the best-practice way of going about this?

The RuntimeError you get when using Pool occurs because arguments for pool methods are pickled before being sent over a (pool-internal) queue to the worker processes. Which pool method you are trying to use is irrelevant here. This doesn't happen when you just use Process because no queue is involved. You can reproduce the error with nothing more than pickle.dumps(multiprocessing.Value('i', 0)).
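For illustration, a minimal sketch (standard library only) that triggers the same error without involving a Pool at all:

import pickle
import multiprocessing

if __name__ == '__main__':
    v = multiprocessing.Value('i', 0)
    try:
        # essentially the pickling step Pool performs on every task argument
        pickle.dumps(v)
    except RuntimeError as e:
        print(e)  # Synchronized objects should only be shared between processes through inheritance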

Your last code snippet doesn't work the way you think it does. You are not sharing a Value; you are recreating independent counters for every child process.
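A minimal sketch that makes this visible (the init_counter/report helpers are illustrative, not from the original post): each worker process ends up with its own counter, so the reported counts restart from 1 for every pid instead of forming one shared sequence up to 8.

import os
import multiprocessing

v = None

def init_counter():
    global v
    v = multiprocessing.Value('i', 0)  # runs once per worker -> one independent Value per process

def report(_):
    with v.get_lock():
        v.value += 1
        return os.getpid(), v.value

if __name__ == '__main__':
    with multiprocessing.Pool(4, initializer=init_counter) as pool:
        print(pool.map(report, range(8)))  # several pids, each with its own small count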

If you were on Unix and used the default start method "fork", you would be done by simply not passing the shared objects as arguments into the pool methods; your child processes would inherit the globals through forking. With the start methods "spawn" (the default on Windows, and on macOS with Python 3.8+) or "forkserver", you'll have to use the initializer during Pool instantiation to let the child processes inherit the shared objects.
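A minimal sketch of the fork-based variant, assuming a Unix platform where the "fork" start method is available; the children simply inherit the module-level v from the parent:

import multiprocessing

def func(_):
    with v.get_lock():
        v.value += 1

if __name__ == '__main__':
    multiprocessing.set_start_method('fork')  # explicit here; the default on most Unix systems
    v = multiprocessing.Value('i', 0)

    with multiprocessing.Pool() as pool:
        pool.map(func, range(25))

    print(v.value)  # 25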

Note that you don't need an extra multiprocessing.Lock here, because multiprocessing.Value comes with an internal lock you can use.

import os
from multiprocessing import Pool, Value #, set_start_method


def func(x):
    for i in range(x):
        assert i == i
        with cnt.get_lock():
            cnt.value += 1
            print(f'{os.getpid()} | counter incremented to: {cnt.value}\n')


def init_globals(counter):
    global cnt
    cnt = counter


if __name__ == '__main__':

    # set_start_method('spawn')

    cnt = Value('i', 0)
    iterable = [10000 for _ in range(10)]

    with Pool(initializer=init_globals, initargs=(cnt,)) as pool:
        pool.map(func, iterable)

    assert cnt.value == 100000

It's probably also worth noting that you don't need the counter to be shared in all cases. If you just need to keep track of how often something has happened in total, an option is to keep separate worker-local counters during the computation and sum them up at the end. This can yield a significant performance improvement for frequently updated counters that don't need synchronization during the parallel computation itself.
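A minimal sketch of that idea; count_events is a hypothetical stand-in for whatever work you are counting:

from multiprocessing import Pool

def count_events(n):
    local = 0                       # purely worker-local, no synchronization needed
    for i in range(n):
        i ** 2
        local += 1
    return local

if __name__ == '__main__':
    with Pool() as pool:
        total = sum(pool.map(count_events, [10000] * 10))  # combine once at the end
    print(total)  # 100000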
