
Python - parallelize a python loop for 2D masked array?

Probably a commonplace question, but how can I parallelize this loop in Python?

Ku = np.zeros((Nx.shape[2], Nx.shape[2]))   # one entry per pair of 2D slices
for i in range(Nx.shape[2]):
    for j in range(Nx.shape[2]):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        # sum of the element-wise product over the unmasked positions only
        Ku[i, j] = (NI[mask != True] * NJ[mask != True]).sum()

So my question: what's the easiest way to parallelize this code?

---------- LATER EDIT ----------

An example of the data:

import random
import numpy as np
import numpy.ma as ma
from numpy import unravel_index    

#my input
Nx = np.random.rand(5,5,5)  

#mask creation: True wherever the first slice is below 0.4
mask_positions = list(zip(*np.where(Nx[:,:,0] < 0.4)))  # list() so this also works on Python 3
mask_array_positions = np.asarray(mask_positions)
i, j = mask_array_positions.T
mask = np.zeros(Nx[:,:,0].shape, bool)
mask[i,j] = True
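
For reference, the same boolean mask can also be built directly from the condition in one line:

mask = Nx[:,:,0] < 0.4   # equivalent to the construction above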

And I want to calculate Ku in parallel. My aim is to use the Ku array to solve a linear problem, so I have to set the masked values apart (they represent nearly half of my array).

I think you want to 'vectorize', to use numpy terminology, not parallelize it across multiple processes.

Your calculation is essentially a dot (matrix) product. Apply the mask once to the whole array to get a 2d array, NIJ. Its shape will be (N,5), where N is the number of True values in ~mask. Then it's just a (5,N) array 'dotted' with a (N,5) one, i.e. a sum over the N dimension, leaving you with a (5,5) array.

NIJ = Nx[~mask,:]        # apply the mask once: shape (N, 5)
Ku = np.dot(NIJ.T,NIJ)   # (5, N) dot (N, 5) -> (5, 5)

In quick tests it matches the Ku produced by your double loop. Depending on the underlying library used for np.dot there might be some multicore calculation, but that's usually not a priority issue for numpy users.
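
For completeness, here is a small self-contained check of that equivalence (a sketch that regenerates a small Nx and mask with the shapes from the question's example):

import numpy as np

Nx = np.random.rand(5, 5, 5)
mask = Nx[:, :, 0] < 0.4                       # True where the first slice is below 0.4

# the question's double loop
Ku_loop = np.zeros((Nx.shape[2], Nx.shape[2]))
for i in range(Nx.shape[2]):
    for j in range(Nx.shape[2]):
        NI, NJ = Nx[:, :, i], Nx[:, :, j]
        Ku_loop[i, j] = (NI[~mask] * NJ[~mask]).sum()

# the vectorized version
NIJ = Nx[~mask, :]
Ku_vec = np.dot(NIJ.T, NIJ)

print(np.allclose(Ku_loop, Ku_vec))            # expected: True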


Applying the large boolean mask is the most time-consuming part of these calculations - both the vectorized and iterative versions.

For a mask with 400,000 True values, compare these 2 indexing times:

In [195]: timeit (NI[:400,:1000],NJ[:400,:1000])
100000 loops, best of 3: 4.87 us per loop
In [196]: timeit (NI[mask],NJ[mask])
10 loops, best of 3: 98.8 ms per loop

Selecting the same number of items with basic (slice) indexing is several orders of magnitude faster than advanced indexing with the mask.

Substituting np.dot(NI[mask],NJ[mask]) for (NI[mask]*NJ[mask]).sum() only saves a few ms.

I'd like to extend @hpaulj's excellent answer (great analysis of the problem too, by the way) for large matrices.

The operation

Ku = np.dot(NIJ.T,NIJ)

can be replaced by

Ku = np.einsum('ij,ik->jk', NIJ, NIJ)

It should also be noted that np.dot could fall back to slower routines if numpy was not compiled to use BLAS.
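
If you are not sure whether your numpy build is linked against an optimized BLAS, you can inspect the build configuration (the exact output format varies between numpy versions):

import numpy as np

# Lists the BLAS/LAPACK libraries numpy was built against (e.g. OpenBLAS or MKL)
np.show_config()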

For a test matrix NIJ of shape (1250711, 50), I got 54.9 s with the dot method, while einsum does it in 1.67 s. On my system, numpy is compiled with BLAS support.

Remark: np.einsum does not always outperform np.dot, which became apparent on my system when comparing any of the following:

Nx = np.random.rand(1400,1528,20).astype(np.float16)
Nx = np.random.rand(1400,1528,20).astype(np.float32)

(or even a dtype of np.float64).
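
A rough way to reproduce that comparison on your own machine is sketched below (timings depend heavily on the BLAS backend, the dtype and the available memory, so treat the numbers as indicative only):

import time
import numpy as np

for dtype in (np.float16, np.float32, np.float64):
    Nx = np.random.rand(1400, 1528, 20).astype(dtype)
    mask = Nx[:, :, 0] < 0.4
    NIJ = Nx[~mask, :]

    t0 = time.time()
    np.dot(NIJ.T, NIJ)
    t_dot = time.time() - t0

    t0 = time.time()
    np.einsum('ij,ik->jk', NIJ, NIJ)
    t_ein = time.time() - t0

    print('%-8s  dot: %.3f s   einsum: %.3f s' % (dtype.__name__, t_dot, t_ein))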
