Python - parallelize a python loop for 2D masked array?
Probably a commonplace question, but how can I parallelize this loop in Python?
for i in range(0, Nx.shape[2]):
    for j in range(0, Nx.shape[2]):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        Ku[i, j] = (NI[mask != True] * NJ[mask != True]).sum()
So my question: what's the easiest way to parallelize this code?
---------- EDIT LATER------------------
An example of data:
import random
import numpy as np
import numpy.ma as ma
from numpy import unravel_index

# my input
Nx = np.random.rand(5, 5, 5)

# mask creation (list() is needed on Python 3, where zip returns an iterator)
mask_positions = list(zip(*np.where(Nx[:, :, 0] < 0.4)))
mask_array_positions = np.asarray(mask_positions)
i, j = mask_array_positions.T
mask = np.zeros(Nx[:, :, 0].shape, bool)
mask[i, j] = True
And I want to calculate Ku by parallelizing. My aim is to use the Ku array to solve a linear problem, so I have to set the masked values apart (they represent nearly half of my array).
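For reference, here is a self-contained sketch of the baseline double loop, with random toy data standing in for the real Nx, and Ku initialized explicitly (the snippet above assumes it already exists):

```python
import numpy as np

# Toy stand-in data: a (5, 5, 5) array and a boolean mask on its first two axes.
rng = np.random.default_rng(0)
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4

n = Nx.shape[2]
Ku = np.zeros((n, n))          # the question's loop assumes Ku already exists
for i in range(n):
    for j in range(n):
        NI = Nx[:, :, i]
        NJ = Nx[:, :, j]
        # sum products only over the unmasked positions
        Ku[i, j] = (NI[~mask] * NJ[~mask]).sum()

print(Ku.shape)  # (5, 5)
```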
I think you want to 'vectorize', to use numpy terminology, not parallelize in the multiprocess way.

Your calculation is essentially a dot (matrix) product. Apply the mask once to the whole array to get a 2d array, NIJ. Its shape will be (N,5), where N is the number of True values in ~mask. Then it's just a (5,N) array 'dotted' with a (N,5) - i.e. a sum over the N dimension, leaving you with a (5,5) array.
NIJ = Nx[~mask,:]
Ku = np.dot(NIJ.T,NIJ)
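A minimal check, using random stand-in data for Nx and mask, that this vectorized form agrees with the question's double loop:

```python
import numpy as np

# Toy data standing in for the question's Nx and mask.
rng = np.random.default_rng(1)
Nx = rng.random((5, 5, 5))
mask = Nx[:, :, 0] < 0.4

# Vectorized version: apply the mask once, then one matrix product.
NIJ = Nx[~mask, :]             # shape (N, 5), N = number of unmasked cells
Ku_vec = np.dot(NIJ.T, NIJ)    # shape (5, 5)

# Original double loop for comparison.
n = Nx.shape[2]
Ku_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Ku_loop[i, j] = (Nx[:, :, i][~mask] * Nx[:, :, j][~mask]).sum()

print(np.allclose(Ku_vec, Ku_loop))  # True
```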
In quick tests it matches the Ku produced by your double loop. Depending on the underlying library used for np.dot there might be some multicore calculation, but that's usually not a priority issue for numpy users.
Applying the large boolean mask is the most time consuming part of these calculations - both the vectorized and iterative versions. For a mask with 400,000 True values, compare these 2 indexing times:
In [195]: timeit (NI[:400,:1000],NJ[:400,:1000])
100000 loops, best of 3: 4.87 us per loop
In [196]: timeit (NI[mask],NJ[mask])
10 loops, best of 3: 98.8 ms per loop
Selecting the same number of items with basic (slice) indexing is several orders of magnitude faster than advanced indexing with the mask.
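A rough plain-timeit stand-in for the %timeit comparison above; the exact numbers will differ per machine, but the gap between view-producing slicing and copy-producing boolean indexing should be visible:

```python
import timeit
import numpy as np

# Basic slicing returns a view; boolean (advanced) indexing copies the data.
NI = np.random.rand(1000, 1000)
mask = np.random.rand(1000, 1000) < 0.4   # roughly 400,000 True values

t_slice = timeit.timeit(lambda: NI[:400, :1000], number=1000)
t_mask = timeit.timeit(lambda: NI[mask], number=1000)
print(f"slice: {t_slice:.4f}s  mask: {t_mask:.4f}s")
```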
Substituting np.dot(NI[mask], NJ[mask]) for (NI[mask]*NJ[mask]).sum() only saves a few ms.
I'd like to extend @hpaulj's excellent answer (great analysis of the problem too, by the way) for large matrices.
The operation

Ku = np.dot(NIJ.T, NIJ)

can be replaced by

Ku = np.einsum('ij,ik->jk', NIJ, NIJ)
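A quick sanity check, on random stand-in data, that the einsum form produces the same (5, 5) matrix as the dot product:

```python
import numpy as np

# Random stand-in for the masked (N, 5) array NIJ.
rng = np.random.default_rng(2)
NIJ = rng.random((1000, 5))

Ku_dot = np.dot(NIJ.T, NIJ)                 # (5, 5)
Ku_ein = np.einsum('ij,ik->jk', NIJ, NIJ)   # same contraction over axis 0

print(np.allclose(Ku_dot, Ku_ein))  # True
```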
It should also be noted that np.dot could fall back to slower routines if numpy was not compiled to use BLAS.
For a test matrix NIJ of shape (1250711, 50), I got 54.9 s with the dot method, while einsum does it in 1.67 s. On my system, numpy is compiled with BLAS support.
Remark: np.einsum does not always outperform np.dot, a situation that became apparent on my system when you compare any of the following:
Nx = np.random.rand(1400,1528,20).astype(np.float16)
Nx = np.random.rand(1400,1528,20).astype(np.float32)
(or even a dtype of np.float64).
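Given that, the safest approach is to benchmark both routines on your own data. A hedged sketch, using a smaller stand-in array so it runs quickly (timings vary with dtype, array shape, and whether numpy links against BLAS):

```python
import timeit
import numpy as np

# Smaller stand-in for the (1400, 1528, 20) arrays above.
Nx = np.random.rand(140, 152, 20).astype(np.float32)
NIJ = Nx.reshape(-1, Nx.shape[2])   # (21280, 20), as if every cell were unmasked

t_dot = timeit.timeit(lambda: np.dot(NIJ.T, NIJ), number=10)
t_ein = timeit.timeit(lambda: np.einsum('ij,ik->jk', NIJ, NIJ), number=10)
print(f"dot: {t_dot:.4f}s  einsum: {t_ein:.4f}s")
```

Whichever wins here, both produce the same result up to float32 rounding.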