Multiprocessing pool: How to call an arbitrary list of methods on a list of class objects

A cleaned-up version of the code, including the solution to the problem (thanks @JohanL!), can be found as a Gist on GitHub.


The following code snippet (CPython 3.[4,5,6]) illustrates my intention (as well as my problem):

from functools import partial
import multiprocessing
from pprint import pprint as pp

NUM_CORES = multiprocessing.cpu_count()

class some_class:
    some_dict = {'some_key': None, 'some_other_key': None}
    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})
    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})

def run_routines_on_objects_in_parallel_and_return(in_object_list, routine_list):
    func_handle = partial(__run_routines_on_object_and_return__, routine_list)
    with multiprocessing.Pool(processes = NUM_CORES) as p:
        out_object_list = list(p.imap_unordered(
            func_handle,
            (in_object for in_object in in_object_list)
            ))
    return out_object_list

def __run_routines_on_object_and_return__(routine_list, in_object):
    for routine_name in routine_list:
        getattr(in_object, routine_name)()
    return in_object

object_list = [some_class() for item in range(20)]
pp([item.some_dict for item in object_list])

new_object_list = run_routines_on_objects_in_parallel_and_return(
        object_list,
        ['some_routine', 'some_other_routine']
        )
pp([item.some_dict for item in new_object_list])

verification_object_list = [
    __run_routines_on_object_and_return__(
        ['some_routine', 'some_other_routine'],
        item
        ) for item in object_list
    ]
pp([item.some_dict for item in verification_object_list])

I am working with a list of objects of type some_class. some_class has a property, a dictionary named some_dict, and a few methods that can modify the dict (some_routine and some_other_routine). Sometimes I want to call a sequence of methods on all the objects in the list. Because this is computationally intensive, I intend to distribute the objects over multiple CPU cores (using multiprocessing.Pool and imap_unordered; the list order does not matter).

The routine __run_routines_on_object_and_return__ takes care of calling the list of methods on one individual object. From what I can tell, this is working just fine. I am using functools.partial to simplify the structure of the code a bit: the routine list is bound in advance, so the multiprocessing pool only has to handle the list of objects as an input parameter.
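The binding described here can be sketched without any multiprocessing. This is a minimal illustration rather than the original code: `Tally`, `run_routines`, and the method names are hypothetical stand-ins. `functools.partial` fixes the routine list, so the resulting callable takes a single object, which is the shape `imap_unordered` expects.

```python
from functools import partial


class Tally:
    """Hypothetical stand-in for the objects being processed."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def double(self):
        self.value *= 2


def run_routines(routine_names, obj):
    """Call each named method on obj, in order, and return obj."""
    for name in routine_names:
        getattr(obj, name)()
    return obj


# Bind the routine list; the resulting callable takes only the object.
worker = partial(run_routines, ["increment", "double"])

result = worker(Tally())
print(result.value)  # (0 + 1) * 2 = 2
```

Because `run_routines` is a module-level function, the partial object can also be pickled, which a process pool requires.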

The problem is ... it does not work. The objects contained in the list returned by imap_unordered are identical to the objects I fed into it. The dictionaries within the objects look just like before. I have used similar mechanisms for working on lists of dictionaries directly without a glitch, so I somehow suspect that there is something wrong with modifying an object property which happens to be a dictionary.

In my example, verification_object_list contains the correct result (though it is generated in a single process/thread). new_object_list is identical to object_list, which should not be the case.

What am I doing wrong?


EDIT

I found the following question, which has an actually working and applicable answer. I modified it a bit, following my idea of calling a list of methods on every object, and it works:

import random
from multiprocessing import Pool, Manager

class Tester(object):
    def __init__(self, num=0.0, name='none'):
        self.num  = num
        self.name = name
    def modify_me(self):
        self.num += random.normalvariate(mu=0, sigma=1)
        self.name = 'pla' + str(int(self.num * 100))
    def __repr__(self):
        return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)

def init(L):
    global tests
    tests = L

def modify(i_t_nn):
    i, t, nn = i_t_nn
    for method_name in nn:
        getattr(t, method_name)()
    tests[i] = t # copy back
    return i

def main():
    num_processes = num = 10 #note: num_processes and num may differ
    manager = Manager()
    tests = manager.list([Tester(num=i) for i in range(num)])
    print(tests[:2])

    args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
    pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
    for i in pool.imap_unordered(modify, args):
        print("done %d" % i)
    pool.close()
    pool.join()
    print(tests[:2])

if __name__ == '__main__':
    main()

Now, I went a bit further and introduced my original some_class into the game, which contains the described dictionary property some_dict. It does NOT work:

import random
from multiprocessing import Pool, Manager
from pprint import pformat as pf

class some_class:
    some_dict = {'some_key': None, 'some_other_key': None}
    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})
    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})
    def __repr__(self):
        return pf(self.some_dict)

def init(L):
    global tests
    tests = L

def modify(i_t_nn):
    i, t, nn = i_t_nn
    for method_name in nn:
        getattr(t, method_name)()
    tests[i] = t # copy back
    return i

def main():
    num_processes = num = 10 #note: num_processes and num may differ
    manager = Manager()
    tests = manager.list([some_class() for i in range(num)])
    print(tests[:2])

    args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
    pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
    for i in pool.imap_unordered(modify, args):
        print("done %d" % i)
    pool.close()
    pool.join()
    print(tests[:2])

if __name__ == '__main__':
    main()

The diff between working and not working is really small, but I still do not get it:

diff --git a/test.py b/test.py
index b12eb56..0aa6def 100644
--- a/test.py
+++ b/test.py
@@ -1,15 +1,15 @@
 import random
 from multiprocessing import Pool, Manager
+from pprint import pformat as pf

-class Tester(object):
-       def __init__(self, num=0.0, name='none'):
-               self.num  = num
-               self.name = name
-       def modify_me(self):
-               self.num += random.normalvariate(mu=0, sigma=1)
-               self.name = 'pla' + str(int(self.num * 100))
+class some_class:
+       some_dict = {'some_key': None, 'some_other_key': None}
+       def some_routine(self):
+               self.some_dict.update({'some_key': 'some_value'})
+       def some_other_routine(self):
+               self.some_dict.update({'some_other_key': 77})
        def __repr__(self):
-               return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
+               return pf(self.some_dict)

 def init(L):
        global tests
@@ -25,10 +25,10 @@ def modify(i_t_nn):
 def main():
        num_processes = num = 10 #note: num_processes and num may differ
        manager = Manager()
-       tests = manager.list([Tester(num=i) for i in range(num)])
+       tests = manager.list([some_class() for i in range(num)])
        print(tests[:2])

-       args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
+       args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))

What is happening here?

Your problem is due to two things: you are using a class variable, and you are running your code in different processes.

Since different processes do not share memory, all objects and parameters must be pickled and sent from the original process to the process that executes the call. When the parameter is an object, its class is not sent with it. Instead, the receiving process uses its own blueprint (i.e. its own copy of the class).

In your current code, you pass the object as a parameter, update it and return it. However, the updates are not made to the object but to the class itself, since you are updating a class variable. This update is not sent back to your main process, and therefore you are left with a class that has not been updated.
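That the update lands on the class rather than on the object can be checked in a single process. The following is a minimal sketch, assuming a class shaped like some_class from the question; the name `SharedDictExample` is hypothetical.

```python
class SharedDictExample:
    # Class attribute: one dict shared by the class and all of its instances.
    some_dict = {"some_key": None}

    def some_routine(self):
        # self.some_dict resolves to the class-level dict, which is mutated in place.
        self.some_dict.update({"some_key": "some_value"})


first = SharedDictExample()
second = SharedDictExample()
first.some_routine()

print(vars(first))                          # {} : nothing is stored on the instance
print(SharedDictExample.some_dict)          # {'some_key': 'some_value'}
print(second.some_dict is first.some_dict)  # True
```

Since the instance's own `__dict__` stays empty, there is no per-object state for pickle to carry back from a worker process.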

What you want to do is make some_dict a part of your object rather than of your class. This is easily done with an __init__() method. Thus, modify some_class as follows:

class some_class:
    def __init__(self):
        self.some_dict = {'some_key': None, 'some_other_key': None}
    def some_routine(self):
        self.some_dict.update({'some_key': 'some_value'})
    def some_other_routine(self):
        self.some_dict.update({'some_other_key': 77})

This will make your program work as you intend. You almost always want to set up your object's data in an __init__() call rather than as class variables, since in the latter case the data is shared between all instances (and can be updated by all of them). That is not normally what you want when you encapsulate data and state in an object of a class.
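Put together with the pool code from the question, the fix might look like the sketch below. It is an illustration under assumptions, not the code from the Gist: the names `SomeClass`, `run_routines`, and `run_in_parallel` and the default of two worker processes are choices made here, and the `if __name__ == "__main__"` guard is added so the script also runs on platforms that start workers by spawning a fresh interpreter.

```python
from functools import partial
import multiprocessing


class SomeClass:
    def __init__(self):
        # Instance attribute: lives in the object's __dict__, so it is pickled.
        self.some_dict = {"some_key": None, "some_other_key": None}

    def some_routine(self):
        self.some_dict.update({"some_key": "some_value"})

    def some_other_routine(self):
        self.some_dict.update({"some_other_key": 77})


def run_routines(routine_names, obj):
    """Call each named method on obj and return the modified object."""
    for name in routine_names:
        getattr(obj, name)()
    return obj


def run_in_parallel(objects, routine_names, processes=2):
    """Distribute the objects over a pool; results arrive in arbitrary order."""
    worker = partial(run_routines, routine_names)
    with multiprocessing.Pool(processes=processes) as pool:
        return list(pool.imap_unordered(worker, objects))


if __name__ == "__main__":
    objects = [SomeClass() for _ in range(4)]
    updated = run_in_parallel(objects, ["some_routine", "some_other_routine"])
    print([obj.some_dict for obj in updated])
    print(objects[0].some_dict)
```

The objects returned by the pool are unpickled copies. The originals in the parent process are not modified, so the caller has to keep using the returned list.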

EDIT: I was imprecise about whether the class is sent with the pickled object. If the class variable is updated before the pool is created, the updated value is available in the new process. As far as I can tell, this is not because pickle ships the class (it stores only a reference to the class by name), but because a worker started with the fork method inherits the class state that existed in the parent at that moment. Either way, the updates made in the new process are not relayed back to the class in the original process.
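Whether class-level data travels with a pickled object can be checked by inspecting the pickle payload directly. The sketch below uses only the standard `pickle` module; the class names `SomeClass` and `FixedClass` are hypothetical stand-ins for the two variants discussed above.

```python
import pickle


class SomeClass:
    # Class attribute, as in the question.
    some_dict = {"some_key": None, "some_other_key": None}


class FixedClass:
    def __init__(self):
        # Instance attribute, as in the fix.
        self.some_dict = {"some_key": None, "some_other_key": None}


class_level_payload = pickle.dumps(SomeClass())
instance_level_payload = pickle.dumps(FixedClass())

print(b"SomeClass" in class_level_payload)    # True: the class is referenced by name
print(b"some_key" in class_level_payload)     # False: the class-level dict is not included
print(b"some_key" in instance_level_payload)  # True: instance state is included
```

In this sketch the payload names the class but does not contain the class-level dictionary. A worker created by forking would still start with whatever class state the parent had when the pool was created, which is one plausible explanation for seeing earlier updates in the worker.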

