Sharing a counter with multiprocessing.Pool
I'd like to use multiprocessing.Value + multiprocessing.Lock to share a counter between separate processes. For example:
import itertools as it
import multiprocessing

def func(x, val, lock):
    for i in range(x):
        i ** 2
    with lock:
        val.value += 1
        print('counter incremented to:', val.value)

if __name__ == '__main__':
    v = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()

    with multiprocessing.Pool() as pool:
        pool.starmap(func, ((i, v, lock) for i in range(25)))
    print(v.value)
This will throw the following exception:

RuntimeError: Synchronized objects should only be shared between processes through inheritance
What I am most confused by is that a related (albeit not completely analogous) pattern works with multiprocessing.Process():
if __name__ == '__main__':
    v = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=func, args=(i, v, lock))
             for i in range(25)]
    for p in procs: p.start()
    for p in procs: p.join()
Now, I recognize that these are two markedly different things: the Pool attempts to use cpu_count() worker processes and splits the iterable range(25) between them, while the Process version launches 25 separate processes, one per input.
That said: how can I share an instance with pool.starmap() (or pool.map()) in this manner?
I've seen similar questions here, here, and here, but those approaches don't seem to be suited to .map()/.starmap(), regardless of whether Value uses ctypes.c_int.
I realize that this approach technically works:
import multiprocessing

def func(x):
    for i in range(x):
        i ** 2
    with lock:
        v.value += 1
        print('counter incremented to:', v.value)

v = None
lock = None

def set_global_counter_and_lock():
    """Egh ... """
    global v, lock
    if not any((v, lock)):
        v = multiprocessing.Value('i', 0)
        lock = multiprocessing.Lock()

if __name__ == '__main__':
    # Each worker process will call `initializer()` when it starts.
    with multiprocessing.Pool(initializer=set_global_counter_and_lock) as pool:
        pool.map(func, range(25))
Is this really the best-practices way of going about this?
The RuntimeError you get when using Pool is raised because arguments for pool methods are pickled before being sent over a (pool-internal) queue to the worker processes. Which pool method you are trying to use is irrelevant here. This doesn't happen when you just use Process, because no queue is involved. You can reproduce the error just with pickle.dumps(multiprocessing.Value('i', 0)).
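As a quick sanity check, the failure can be triggered without any Pool at all; this sketch just tries to pickle a synchronized Value directly:

```python
import pickle
import multiprocessing

try:
    pickle.dumps(multiprocessing.Value('i', 0))
except RuntimeError as e:
    # "Synchronized objects should only be shared between processes through inheritance"
    print(type(e).__name__, '->', e)
```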
Your last code snippet doesn't work the way you think it works. You are not sharing a Value; you are recreating independent counters for every child process.
If you were on Unix and used the default start method "fork", you would be done by simply not passing the shared objects as arguments into the pool methods: your child processes would inherit the globals through forking. With the start methods "spawn" (the default on Windows, and on macOS with Python 3.8+) or "forkserver", you'll have to use the initializer during Pool instantiation to let the child processes inherit the shared objects.
Note that you don't need an extra multiprocessing.Lock here, because multiprocessing.Value comes by default with an internal one you can use (via Value.get_lock()).
import os
from multiprocessing import Pool, Value  # , set_start_method

def func(x):
    for i in range(x):
        assert i == i  # some CPU-bound work
        with cnt.get_lock():
            cnt.value += 1
    print(f'{os.getpid()} | counter incremented to: {cnt.value}\n')

def init_globals(counter):
    global cnt
    cnt = counter

if __name__ == '__main__':
    # set_start_method('spawn')
    cnt = Value('i', 0)
    iterable = [10000 for _ in range(10)]

    with Pool(initializer=init_globals, initargs=(cnt,)) as pool:
        pool.map(func, iterable)

    assert cnt.value == 100000
It's probably also worth noting that you don't need the counter to be shared in all cases. If you just need to keep track of how often something has happened in total, an option is to keep separate worker-local counters during the computation and sum them up at the end. This can yield a significant performance improvement for frequent counter updates that don't need synchronization during the parallel computation itself.