Proper way to parallelize this Python structure

I'm writing some code in Python and hope that I can speed it up by distributing the work across several processes. I have the following program substructure in Python 3.5 on Linux:

class Foo(fromMother):

    def __init__(self, size):
        fromMother.__init__(self)
        self.data = self.__createData(size)
        self.properGuess = 0

    def __createData(self, size):
        """Returns some dict of n x m numpy arrays."""

    def doStuff(self, level):
        """Calculates a CPU-heavy, fully serial operation.
        The proper guess is some sort of start value."""
        # foo() here stands for an external optimization routine (not shown)
        a, b = foo(level, self.properGuess)
        self.properGuess = a
        return b



if __name__ == "__main__":
    foo = Foo(someSize)
    array = []
    for i in range(manyValues):
        array.append(foo.doStuff(i))

This is very roughly what my code does, and I would like to find a nice way to introduce some parallel structure so it can use multiple cores. doStuff() cannot be parallelized internally; it is some kind of optimization problem, hence the starting value. The starting value does not have to be the absolute latest one: the function still performs well at step i with a starting point calculated at step i - 10, so it is completely okay to update this value only "every now and then". But it cannot be dropped entirely: the value at step i + 1 is close to that at step i, and I know the problem is only locally, not globally, convex. Staying that close to the previous solution has given me quite stable results.

Unfortunately, breaking up the inheritance structure would be quite annoying - it is possible, but it should be the last resort. I wanted to use Python's own multiprocessing library, but I'm a little confused about shared memory etc., since this is not simply sharing some sort of array, but keeping it inside an object and calling methods on it. I have only done parallel programming in MATLAB, and only at a very basic level.
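To make the kind of sharing I'm unsure about concrete, here is a rough sketch of what I imagine. This assumes properGuess is a single float, relies on Linux's default fork start method (so workers inherit module-level objects), and uses solve() as a hypothetical stand-in for the optimizer called inside doStuff():

from multiprocessing import Pool, Value

# Module-level shared scalar; with fork, Pool workers inherit it.
# Assumes properGuess is a single float ('d' = double).
shared_guess = Value('d', 0.0)

def work(level):
    with shared_guess.get_lock():
        start = shared_guess.value   # may be a few steps stale - acceptable
    a, b = solve(level, start)       # solve() is a stand-in for the real foo()
    with shared_guess.get_lock():
        shared_guess.value = a       # publish the newer guess "every now and then"
    return b

if __name__ == "__main__":
    with Pool(4) as p:
        results = p.map(work, range(manyValues))

I don't know whether something like this is safe or idiomatic, which is exactly my question.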

Could someone point me toward a way to do this? I have been thinking about it for two days already, and I'm not sure how to achieve it nicely without breaking and "uglifying" my entire structure.

You can use a process pool:

from multiprocessing import Pool

if __name__ == "__main__":
    foo = Foo(someSize)
    with Pool(5) as p:
        # each worker receives its own pickled copy of foo
        results = p.map(foo.doStuff, range(manyValues))

But you need to think about a few things first: is your class picklable? Will the instance mutate during the run? etc.
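On the picklability point: in Python 3.5, a bound method like foo.doStuff can be pickled as long as the Foo instance itself is picklable, and each task then ships a copy of foo to the worker. That means updates to properGuess inside the workers never flow back to the master. Since you said a guess that is about 10 steps old is still fine, one possible workaround (a sketch under those assumptions, not tested against your real class) is to map in small batches and refresh the master's guess between batches:

from multiprocessing import Pool

if __name__ == "__main__":
    foo = Foo(someSize)
    results = []
    batch = 10                             # guess may lag by up to ~10 steps
    with Pool(5) as p:
        for start in range(0, manyValues, batch):
            levels = range(start, min(start + batch, manyValues))
            # workers see properGuess as it was when the batch was dispatched
            results.extend(p.map(foo.doStuff, levels))
            # redo the last level serially on the master copy so that the
            # next batch starts from a fresher guess (costs one extra solve)
            foo.doStuff(levels[-1])

Whether the extra serial solve per batch is worth it depends on how expensive doStuff() is; you could also refresh less often.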
