
How to implement a reduce operation in Python multiprocessing?

I'm an expert parallel programmer in OpenMP and C++. Now I'm trying to understand parallelism in Python and the multiprocessing library.

In particular, I'm trying to parallelize this simple code, which increments a randomly chosen element of an array 100 times:

from random import randint
import multiprocessing as mp
import numpy as np

def random_add(x):
    x[randint(0,len(x)-1)]  += 1

if __name__ == "__main__":
    print("Serial")
    x = np.zeros(8)
    for i in range(100):
        random_add(x)
    print(x)

    print("Parallel")
    x = np.zeros(8)    
    processes = [mp.Process(target = random_add, args=(x,)) for i in range(100)]
    for p in processes:
        p.start()
    print(x)

However, this is the output:

Serial
[  9.  18.  11.  15.  16.   8.  10.  13.]
Parallel
[ 0.  0.  0.  0.  0.  0.  0.  0.]

Why does this happen? Well, I think I have an explanation: since we are using multiprocessing (and not multithreading), each process has its own section of memory, i.e., each spawned process has its own copy of x, which is destroyed once random_add(x) terminates. As a result, the x in the main program is never actually updated.

Is this correct? And if so, how can I solve this problem? In short, I need a global reduce operation that sums the results of all the random_add calls, giving the same result as the serial version.

You should use shared memory objects in your case:

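from random import randint
import multiprocessing as mp

def random_add(x):
    # Increment one randomly chosen element of x.
    x[randint(0, len(x) - 1)] += 1

def random_add_shared(x):
    # "+=" is a read followed by a write, so hold the array's lock across both.
    with x.get_lock():
        random_add(x)

if __name__ == "__main__":
    print("Serial")
    x = [0] * 8
    for i in range(100):
        random_add(x)
    print(x)

    print("Parallel")
    x = mp.Array('i', 8)  # shared array of 8 C ints, zero-initialized
    processes = [mp.Process(target=random_add_shared, args=(x,)) for i in range(100)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # wait for every worker before reading the result
    print(x[:])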

I've replaced the NumPy array with an ordinary list to keep the code clear.
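If what you want is closer to an OpenMP-style reduction, it can also be done without shared memory: each worker fills a private array and returns it, and the parent sums the partial arrays. The following is only a minimal sketch of that idea. The names split_work, partial_counts, add_arrays and parallel_random_add are made up for illustration, and the choice of 4 workers is arbitrary.

```python
from functools import reduce
import multiprocessing as mp
import random

N_BINS = 8  # same array length as in the question


def split_work(total, workers):
    """Split `total` increments into `workers` chunks that differ by at most one."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def partial_counts(n_increments):
    """Worker: apply n_increments random increments to a private array and return it."""
    rng = random.Random()  # seeded from OS entropy, independently in each call
    counts = [0] * N_BINS
    for _ in range(n_increments):
        counts[rng.randint(0, N_BINS - 1)] += 1
    return counts


def add_arrays(a, b):
    """Reduction operator: element-wise sum of two equally long lists."""
    return [u + v for u, v in zip(a, b)]


def parallel_random_add(total=100, workers=4):
    """Map the chunks over a process pool, then reduce the partial arrays in the parent."""
    with mp.Pool(workers) as pool:
        partials = pool.map(partial_counts, split_work(total, workers))
    return reduce(add_arrays, partials, [0] * N_BINS)


if __name__ == "__main__":
    result = parallel_random_add()
    print(result, sum(result))
```

The individual entries differ from run to run, but they always add up to the total number of increments (100 here). Since the workers never touch shared state, no lock is needed, and only 4 processes are started instead of 100.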

