
How can I make many parallel processes make changes to a single shared NumPy array?

I have scoured the internet for an answer, and nothing I can find applies to my situation. I have read about multiprocessing.Manager and have tried passing things back and forth, but none of it seems to play well with NumPy arrays. I have tried using Pool instead, but my target method does not return anything, it just makes changes to an array, so I wasn't sure how to set that up either.

Right now I have:

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)

    processes = [mp.Process(target = self.CreateMatrixMp, args = (sigmaI, sigmaX, i,)) for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

The target function, CreateMatrixMp, takes the values passed and, after doing calculations, appends a value to an array data. This array is declared as self.data = numpy.zeros(self.size, numpy.float64). If the details of the CreateMatrixMp method would help, I can post that as well.

I tried adding this above where the processes are run:

mgr = mp.Manager()
sharedData = mgr.Array(ctypes.c_numpy.float64, self.data)

and then passing sharedData to CreateMatrixMp, where it can be modified. Once all the processes have run and the array is complete, I simply do self.data = sharedData.

But this doesn't work (though I know I am not setting it up correctly). How should this be done with a NumPy array? I want each and every process (there will be thousands of them) to append to the same array.

Any help is enormously appreciated.

Welcome to the dark world of multiple threads. I think your big problem here is that mgr.Array puts synchronisation around the array. If you generate data quickly this will be a bottleneck, since processes will be waiting for the last one to finish with the array. It is more efficient, and will help, if each process keeps a private copy of the NumPy array. Once you have fed in all the data, wait for all the processes to complete; then you can combine all the arrays into self.data. This way none of the processes needs to wait on a shared resource.

Neither this solution, nor yours, guarantees the order of the output list. I suspect from self.numPixels that order may be important. There are many solutions, but the easiest is to feed the data in order and do a self.data.sort(...) after all is done. Alternatively, and faster, pre-create self.data and have the processes poke results into the correct location. self.data does not need to be a shared data structure then, since the processes never change anything in common. This works if arrays map to C-like arrays; it will not work for linked lists, etc.

Hope this helps. Ask if you want more details.
