How to have a dedicated variable for each multiprocessing worker, which keeps its value between calls?
I have the following code:
```python
from multiprocessing import Pool, cpu_count

pool = Pool(cpu_count())
pool.imap(process_item, items, chunksize=100)
```
In the process_item() function I am using structures which are resource-demanding to create, but which would be reusable (though not concurrently shareable). Currently, every call of process_item() creates the resource again in a local variable. It would be a great performance benefit to create it once per worker and then reuse it.
Question
How can I have cpu_count() dedicated instances of that resource, and how should I implement the process_item() function so that it accesses the instance belonging to the particular worker it runs in?
If you cannot use anything outside the standard library, I would suggest using an initializer when creating the pool:
```python
from multiprocessing import Pool
import os
import random


class A:
    # Stand-in for the expensive, reusable resource.
    def __init__(self):
        self.var = random.randint(0, 1000)

    def get(self):
        print(self.var, os.getpid())


def worker(some_arg):
    # Uses the instance that initializer() stored in this process's globals.
    expensive_var.get()


def initializer(*args):
    # Runs once in every worker process when it starts.
    global expensive_var
    expensive_var = A()


if __name__ == "__main__":
    with Pool(8, initializer=initializer, initargs=()) as pool:
        for result in pool.imap(worker, range(100)):
            continue
```
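The `initargs` tuple is passed to `initializer` in every worker, so the resource can be configured from the parent process. A minimal sketch, where `Resource`, its `size` parameter, `init_worker`, `task` and `run` are all hypothetical names chosen for illustration:

```python
from multiprocessing import Pool
import os

_resource = None  # per-process slot, filled in by init_worker()


class Resource:
    """Hypothetical expensive resource configured by a size parameter."""

    def __init__(self, size):
        self.size = size
        self.owner_pid = os.getpid()  # the process that built this instance


def init_worker(size):
    # Called once per worker process with the values from initargs.
    global _resource
    _resource = Resource(size)


def task(x):
    # Reuses this worker's resource; also reports which process served x.
    return x % _resource.size, _resource.owner_pid


def run(values, size, processes=4):
    with Pool(processes, initializer=init_worker, initargs=(size,)) as pool:
        return list(pool.imap(task, values))


if __name__ == "__main__":
    for value, pid in run(range(10), size=7):
        print(value, pid)
```

Only picklable values should go into `initargs`, since they are sent to each worker process.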
Create the objects inside the initializer and assign them to global variables. You can then use them inside the function you pass to the pool. This works because the initializer is executed once in each worker process when that process starts. Each worker process has its own copy of the module's globals, so making the variables global makes them global only within that child process: every call of the pool function in that worker sees the same instance, and no other worker shares it.
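Applied to the code from the question, process_item() reads a per-process global set up by the initializer instead of building the resource itself. In this sketch `ExpensiveResource`, `init_worker`, `run` and the `_builds_in_process` counter are hypothetical; the counter is there only so each result can report how many times its worker built the resource, which should be exactly once.

```python
from multiprocessing import Pool, cpu_count
import os

_resource = None        # per-worker resource, created by init_worker()
_builds_in_process = 0  # how many times this process built the resource


class ExpensiveResource:
    """Hypothetical stand-in for the costly, reusable structure."""

    def __init__(self):
        global _builds_in_process
        _builds_in_process += 1
        self.table = {i: i * i for i in range(10_000)}


def init_worker():
    # Runs once in each worker process, before it handles any items.
    global _resource
    _resource = ExpensiveResource()


def process_item(item):
    # Reuses this worker's resource instead of building a new one per call.
    return _resource.table[item % 10_000], os.getpid(), _builds_in_process


def run(items):
    with Pool(cpu_count(), initializer=init_worker) as pool:
        return list(pool.imap(process_item, items, chunksize=100))


if __name__ == "__main__":
    results = run(range(1000))
    builds = {pid: n for _, pid, n in results}
    print(builds)  # every worker that handled items reports 1 build
```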
There was a Stack Overflow answer that explained all this better, but I can't seem to find it right now. Still, this is basically the gist of it.