Python - parallelize a python loop for 2D masked array?
Probably a commonplace question, but how can I parallelize this loop in Python?
for i in range(0, Nx.shape[2]):
    for j in range(0, Nx.shape[2]):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        Ku[i, j] = (NI[mask != True] * NJ[mask != True]).sum()
So my question: what's the easiest way to parallelize this code?
---------- EDIT LATER------------------
An example of data:
import random
import numpy as np
import numpy.ma as ma
from numpy import unravel_index

# my input
Nx = np.random.rand(5, 5, 5)

# mask creation (list() is needed on Python 3, where zip returns an iterator)
mask_positions = list(zip(*np.where(Nx[:, :, 0] < 0.4)))
mask_array_positions = np.asarray(mask_positions)
i, j = mask_array_positions.T
mask = np.zeros(Nx[:, :, 0].shape, bool)
mask[i, j] = True
And I want to calculate Ku by parallelizing. My aim is to use the Ku array to solve a linear problem, so I have to set the masked values apart (they represent nearly half of my array).
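For reference, here is a self-contained sketch of the baseline double loop, with random toy data standing in for the real Nx, and Ku initialized explicitly (the snippet above assumes it already exists):

```python
import numpy as np

# Toy stand-in data: a (5, 5, 5) array and a boolean mask on its first two axes.
rng = np.random.default_rng(0)
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4

n = Nx.shape[2]
Ku = np.zeros((n, n))          # the question's loop assumes Ku already exists
for i in range(n):
    for j in range(n):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        # sum products only over the unmasked positions
        Ku[i, j] = (NI[~mask] * NJ[~mask]).sum()

print(Ku.shape)  # (5, 5)
```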
I think you want to 'vectorize', to use numpy terminology, not parallelize in the multiprocess way.

Your calculation is essentially a dot (matrix) product. Apply the mask once to the whole array to get a 2d array, NIJ. Its shape will be (N,5), where N is the number of True values in ~mask. Then it's just a (5,N) array 'dotted' with a (N,5) - i.e. a sum over the N dimension, leaving you with a (5,5) array.
NIJ = Nx[~mask,:]
Ku = np.dot(NIJ.T,NIJ)
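A minimal check, using random stand-in data for Nx and mask, that this vectorized form agrees with the question's double loop:

```python
import numpy as np

# Toy data standing in for the question's Nx and mask.
rng = np.random.default_rng(1)
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4

# Vectorized version: apply the mask once, then one matrix product.
NIJ = Nx[~mask, :]             # shape (N, 5), N = number of unmasked cells
Ku_vec = np.dot(NIJ.T, NIJ)    # shape (5, 5)

# Original double loop for comparison.
n = Nx.shape[2]
Ku_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Ku_loop[i, j] = (Nx[:, :, i][~mask] * Nx[:, :, j][~mask]).sum()

print(np.allclose(Ku_vec, Ku_loop))  # True
```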
In quick tests it matches the Ku produced by your double loop. Depending on the underlying library used for np.dot there might be some multicore calculation, but that's usually not a priority issue for numpy users.
Applying the large boolean mask is the most time consuming part of these calculations - both the vectorized and iterative versions. For a mask with 400,000 True values, compare these 2 indexing times:
In [195]: timeit (NI[:400,:1000],NJ[:400,:1000])
100000 loops, best of 3: 4.87 us per loop
In [196]: timeit (NI[mask],NJ[mask])
10 loops, best of 3: 98.8 ms per loop
Selecting the same number of items with basic (slice) indexing is several orders of magnitude faster than advanced indexing with the mask.
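A rough plain-timeit stand-in for the %timeit comparison above; the exact numbers will differ per machine, but the gap between view-producing slicing and copy-producing boolean indexing should be visible:

```python
import timeit
import numpy as np

# Basic slicing returns a view; boolean (advanced) indexing copies the data.
NI = np.random.rand(1000, 1000)
mask = np.random.rand(1000, 1000) < 0.4   # roughly 400,000 True values

t_slice = timeit.timeit(lambda: NI[:400, :1000], number=1000)
t_mask = timeit.timeit(lambda: NI[mask], number=1000)
print(f"slice: {t_slice:.4f}s  mask: {t_mask:.4f}s")
```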
Substituting np.dot(NI[mask], NJ[mask]) for (NI[mask]*NJ[mask]).sum() only saves a few ms.
I'd like to extend @hpaulj's excellent answer (great analysis of the problem too, by the way) for large matrices.
The operation

Ku = np.dot(NIJ.T, NIJ)

can be replaced by

Ku = np.einsum('ij,ik->jk', NIJ, NIJ)
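A quick sanity check, on random stand-in data, that the einsum form produces the same (5, 5) matrix as the dot product:

```python
import numpy as np

# Random stand-in for the masked (N, 5) array NIJ.
rng = np.random.default_rng(2)
NIJ = rng.random((1000, 5))

Ku_dot = np.dot(NIJ.T, NIJ)                 # (5, 5)
Ku_ein = np.einsum('ij,ik->jk', NIJ, NIJ)   # same contraction over axis 0

print(np.allclose(Ku_dot, Ku_ein))  # True
```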
It should also be noted that np.dot could fall back to slower routines if numpy was not compiled to use BLAS.
For a test matrix NIJ of shape (1250711, 50), I got 54.9 s with the dot method, while einsum does it in 1.67 s. On my system, numpy is compiled with BLAS support.
Remark: np.einsum does not always outperform np.dot, a situation that became apparent on my system when you compare any of the following:
Nx = np.random.rand(1400,1528,20).astype(np.float16)
Nx = np.random.rand(1400,1528,20).astype(np.float32)
(or even a dtype of np.float64).
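Given that, the safest approach is to benchmark both routines on your own data. A hedged sketch, using a smaller stand-in array so it runs quickly (timings vary with dtype, array shape, and whether numpy links against BLAS):

```python
import timeit
import numpy as np

# Smaller stand-in for the (1400, 1528, 20) arrays above.
Nx = np.random.rand(140, 152, 20).astype(np.float32)
NIJ = Nx.reshape(-1, Nx.shape[2])   # (21280, 20), as if every cell were unmasked

t_dot = timeit.timeit(lambda: np.dot(NIJ.T, NIJ), number=10)
t_ein = timeit.timeit(lambda: np.einsum('ij,ik->jk', NIJ, NIJ), number=10)
print(f"dot: {t_dot:.4f}s  einsum: {t_ein:.4f}s")
```

Whichever wins here, both produce the same result up to float32 rounding.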