Multiprocessing pool: How to call an arbitrary list of methods on a list of class objects
A cleaned-up version of the code, including the solution to the problem (thanks @JohanL!), can be found as a Gist on GitHub.
The following code snippet (CPython 3.[4,5,6]) illustrates my intention (as well as my problem):
```python
from functools import partial
import multiprocessing
from pprint import pprint as pp

NUM_CORES = multiprocessing.cpu_count()

class some_class:
    some_dict = {'some_key': None, 'some_other_key': None}

    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})

    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})

def run_routines_on_objects_in_parallel_and_return(in_object_list, routine_list):
    func_handle = partial(__run_routines_on_object_and_return__, routine_list)
    with multiprocessing.Pool(processes = NUM_CORES) as p:
        out_object_list = list(p.imap_unordered(
            func_handle,
            (in_object for in_object in in_object_list)
            ))
    return out_object_list

def __run_routines_on_object_and_return__(routine_list, in_object):
    for routine_name in routine_list:
        getattr(in_object, routine_name)()
    return in_object

if __name__ == '__main__':  # required for the spawn start method (Windows, macOS)
    object_list = [some_class() for item in range(20)]
    pp([item.some_dict for item in object_list])
    new_object_list = run_routines_on_objects_in_parallel_and_return(
        object_list,
        ['some_routine', 'some_other_routine']
        )
    pp([item.some_dict for item in new_object_list])
    verification_object_list = [
        __run_routines_on_object_and_return__(
            ['some_routine', 'some_other_routine'],
            item
            ) for item in object_list
        ]
    pp([item.some_dict for item in verification_object_list])
```
I am working with a list of objects of type `some_class`. `some_class` has an attribute, a dictionary, named `some_dict`, and a few methods which can modify the dict (`some_routine` and `some_other_routine`). Sometimes, I want to call a sequence of methods on all the objects in the list. Because this is computationally intensive, I intend to distribute the objects over multiple CPU cores using `multiprocessing.Pool` and `imap_unordered` (the list order does not matter).
The routine `__run_routines_on_object_and_return__` takes care of calling the list of methods on one individual object. From what I can tell, this is working just fine. I am using `functools.partial` to simplify the structure of the code a bit, so that the multiprocessing pool only has to handle the list of objects as an input parameter.
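As a small single-process illustration of the `functools.partial` step: `partial` fixes `routine_list` as the first positional argument, so the pool only has to supply each object. The class `Recorder` and its methods `first`/`second` are hypothetical stand-ins for `some_class` and its routines, not part of the original code.

```python
from functools import partial

def __run_routines_on_object_and_return__(routine_list, in_object):
    # Call each named method on the object, in order.
    for routine_name in routine_list:
        getattr(in_object, routine_name)()
    return in_object

class Recorder:
    """Hypothetical stand-in that records which methods were called."""
    def __init__(self):
        self.calls = []

    def first(self):
        self.calls.append('first')

    def second(self):
        self.calls.append('second')

# Bind the routine list; the resulting callable takes only the object.
func_handle = partial(__run_routines_on_object_and_return__, ['first', 'second'])

obj = func_handle(Recorder())  # same as __run_routines_on_object_and_return__(['first', 'second'], Recorder())
print(obj.calls)  # ['first', 'second']
```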
The problem is... it does not work. The objects contained in the list returned by `imap_unordered` are identical to the objects I fed into it: the dictionaries within the objects look just like before. I have used similar mechanisms for working on lists of dictionaries directly without a glitch, so I suspect that there is something wrong with modifying an object attribute that happens to be a dictionary.
In my example, `verification_object_list` contains the correct result (though it is generated in a single process/thread). `new_object_list` is identical to `object_list`, which should not be the case.
What am I doing wrong?
EDIT
I found the following question, which has an actually working and applicable answer. I modified it a bit, following my idea of calling a list of methods on every object, and it works:
```python
import random
from multiprocessing import Pool, Manager

class Tester(object):
    def __init__(self, num=0.0, name='none'):
        self.num = num
        self.name = name
    def modify_me(self):
        self.num += random.normalvariate(mu=0, sigma=1)
        self.name = 'pla' + str(int(self.num * 100))
    def __repr__(self):
        return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)

def init(L):
    global tests
    tests = L

def modify(i_t_nn):
    i, t, nn = i_t_nn
    for method_name in nn:
        getattr(t, method_name)()
    tests[i] = t # copy back
    return i

def main():
    num_processes = num = 10 #note: num_processes and num may differ
    manager = Manager()
    tests = manager.list([Tester(num=i) for i in range(num)])
    print(tests[:2])
    args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
    pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
    for i in pool.imap_unordered(modify, args):
        print("done %d" % i)
    pool.close()
    pool.join()
    print(tests[:2])

if __name__ == '__main__':
    main()
```
Now, I went a bit further and introduced my original `some_class` into the game, which contains the described dictionary attribute `some_dict`. It does NOT work:
```python
import random
from multiprocessing import Pool, Manager
from pprint import pformat as pf

class some_class:
    some_dict = {'some_key': None, 'some_other_key': None}
    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})
    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})
    def __repr__(self):
        return pf(self.some_dict)

def init(L):
    global tests
    tests = L

def modify(i_t_nn):
    i, t, nn = i_t_nn
    for method_name in nn:
        getattr(t, method_name)()
    tests[i] = t # copy back
    return i

def main():
    num_processes = num = 10 #note: num_processes and num may differ
    manager = Manager()
    tests = manager.list([some_class() for i in range(num)])
    print(tests[:2])
    args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
    pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
    for i in pool.imap_unordered(modify, args):
        print("done %d" % i)
    pool.close()
    pool.join()
    print(tests[:2])

if __name__ == '__main__':
    main()
```
The difference between the working and the non-working version is really small, but I still do not get it:
```diff
diff --git a/test.py b/test.py
index b12eb56..0aa6def 100644
--- a/test.py
+++ b/test.py
@@ -1,15 +1,15 @@
 import random
 from multiprocessing import Pool, Manager
+from pprint import pformat as pf
 
-class Tester(object):
-    def __init__(self, num=0.0, name='none'):
-        self.num = num
-        self.name = name
-    def modify_me(self):
-        self.num += random.normalvariate(mu=0, sigma=1)
-        self.name = 'pla' + str(int(self.num * 100))
+class some_class:
+    some_dict = {'some_key': None, 'some_other_key': None}
+    def some_routine(self):
+        self.some_dict.update({'some_key': 'some_value'})
+    def some_other_routine(self):
+        self.some_dict.update({'some_other_key': 77})
     def __repr__(self):
-        return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
+        return pf(self.some_dict)
 
 def init(L):
     global tests
@@ -25,10 +25,10 @@ def modify(i_t_nn):
 def main():
     num_processes = num = 10 #note: num_processes and num may differ
     manager = Manager()
-    tests = manager.list([Tester(num=i) for i in range(num)])
+    tests = manager.list([some_class() for i in range(num)])
     print(tests[:2])
-    args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
+    args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
```
What is happening here?
Your problem is due to two things: you are using a class variable, and you are running your code in different processes.
Since different processes do not share memory, all objects and parameters must be pickled and sent from the original process to the process that executes the function. When the parameter is an object, its class is not sent with it. Instead, the receiving process uses its own blueprint (i.e. its own `class`).
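To see what actually travels between processes, here is a minimal single-process sketch using the standard `pickle` module, which `multiprocessing.Pool` uses for task arguments and return values. It assumes the script is run directly, so the class lives in `__main__`, and trims the question's `some_class` to one method.

```python
import pickle

class some_class:
    # Class variable, as in the question: it lives on the class, not on instances.
    some_dict = {'some_key': None, 'some_other_key': None}

    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})

obj = some_class()
obj.some_routine()  # mutates some_class.some_dict, not obj.__dict__

payload = pickle.dumps(obj)

# Only the (empty) instance __dict__ is serialized; the class itself is stored
# by reference (module + qualified name), so 'some_value' is not in the payload.
print(obj.__dict__)              # {}
print(b'some_value' in payload)  # False

# Unpickling looks the class up by name in the receiving interpreter and
# uses whatever class attributes exist there.
clone = pickle.loads(payload)
print(clone.some_dict is some_class.some_dict)  # True
```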
In your current code, you pass the object as a parameter, update it, and return it. However, the updates are not made to the object but to the class itself, since you are updating a class variable. That update only happens to the worker process's copy of the class and is not sent back to your main process. The object you get back has no `some_dict` of its own, so reading `some_dict` on it returns the main process's class variable, which was never updated.
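The "updates go to the class, not the object" point can be checked without any multiprocessing. This sketch reuses the question's class layout (trimmed to one method) and shows that calling a method on one instance changes what every instance sees.

```python
class some_class:
    # Class variable: a single dict object owned by the class itself.
    some_dict = {'some_key': None, 'some_other_key': None}

    def some_routine(self):
        # The instance has no 'some_dict' attribute, so self.some_dict is
        # found on the class and update() mutates the shared class-level dict.
        self.some_dict.update({'some_key': 'some_value'})

a = some_class()
b = some_class()
a.some_routine()

print('some_dict' in vars(a))            # False: nothing was stored on the instance
print(some_class.some_dict['some_key'])  # some_value
print(b.some_dict['some_key'])           # some_value, although b was never touched
```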
What you want to do is make `some_dict` part of your object rather than of your class. This is easily done in an `__init__()` method. Thus, modify `some_class` as:
```python
class some_class:
    def __init__(self):
        self.some_dict = {'some_key': None, 'some_other_key': None}
    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})
    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})
```
This will make your program work as you intend. You almost always want to set up your object's state in `__init__()` rather than in class variables, since in the latter case the data is shared between all instances (and can be updated by any of them). That is not normally what you want when you encapsulate data and state in an object of a class.
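With the `__init__()` version, each instance owns its dict, and that dict becomes part of the pickled instance state. The sketch below uses a plain `pickle` round trip as a single-process stand-in for what `Pool` does when a worker returns an object; it is an illustration, not a multiprocessing test.

```python
import pickle

class some_class:
    def __init__(self):
        # Instance attribute: every object gets its own dict.
        self.some_dict = {'some_key': None, 'some_other_key': None}

    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})

    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})

a, b = some_class(), some_class()
a.some_routine()
print(b.some_dict)  # {'some_key': None, 'some_other_key': None}

# The updated dict is stored in a.__dict__, so it survives serialization.
a_copy = pickle.loads(pickle.dumps(a))
print(a_copy.some_dict)  # {'some_key': 'some_value', 'some_other_key': None}
```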
EDIT: I looked further into whether the `class` is sent along with the pickled object. If the class variable is updated in the parent before the worker processes are started, the updated value is visible in them. This is not because the class and its class variables are pickled: `pickle` stores classes by reference (module and qualified name) and serializes only the instance state. Rather, with the `fork` start method (the default on Linux in the Python versions used here), the worker processes begin as copies of the parent, including the class and its current class variables. Either way, the updates done in the new process are still not relayed back to the original `class`.
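Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.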