
Vectorized sum operation on a numpy array with masked indices

I'm trying to do a vectorized sum operation using a numpy array of masked indices.

So for example, without a mask:

import numpy as np

# data to be used in a vectorized sum operation
data = np.array([[1,0,0,0,0,0],
                 [0,1,0,0,0,0],
                 [0,0,1,0,0,0]])

# data indices i wish to sum together
idx = np.array([[0,1,2],   # sum data rows 0,1 and 2
                [2,1,1]])  # sum data rows 2,1 and 1

# without a mask this is straightforward
print(np.sum(data[idx], axis=1))
#[[1 1 1 0 0 0]
# [0 2 1 0 0 0]]

Now with a mask, I can't figure out how to do it without looping over the masked index array:

# introduce a mask
mask = np.array([[True,  True, True],  # sum data rows 0,1 and 2
                 [False, True, True]]) # sum data rows 1 and 1 (masking out idx[1,0])

summed = np.zeros((idx.shape[0], data.shape[1]), dtype='int')
for i in range(idx.shape[0]):
    # keep only the unmasked indices of row i, then sum the selected data rows
    summed[i] = np.sum(data[idx[i][mask[i]]], axis=0)
print(summed)
#[[1 1 1 0 0 0]
# [0 2 0 0 0 0]]

QUESTION

Is there a proper way to do this type of operation without a loop?

You can solve it with np.einsum -

v = data[idx]
summed = np.einsum('ijk,ij->ik', v, mask)

Run on given sample -

In [43]: v = data[idx]

In [44]: np.einsum('ijk,ij->ik', v, mask)
Out[44]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])
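
For anyone who wants to check this end to end, here is a self-contained sketch that reruns the question's sample data and compares the einsum result against the original loop. The names loop_result and einsum_result are introduced here purely for illustration, and the explicit cast of the mask to the data's dtype is an assumption made to keep the dtypes unambiguous, not something the approach appears to require.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

# Reference result: the loop from the question.
loop_result = np.zeros((idx.shape[0], data.shape[1]), dtype=int)
for i in range(idx.shape[0]):
    loop_result[i] = data[idx[i][mask[i]]].sum(axis=0)

# v[i, j, :] is the data row selected by idx[i, j]; shape (2, 3, 6).
v = data[idx]

# Weight each selected row by its mask entry (0 or 1), then sum over j.
# The cast is an illustrative assumption, see the note above.
einsum_result = np.einsum('ijk,ij->ik', v, mask.astype(v.dtype))

print(einsum_result)
print(np.array_equal(loop_result, einsum_result))
```

The subscripts read as: for every output position (i, k), sum v[i, j, k] * mask[i, j] over j, so a masked-out index contributes zero instead of being removed.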

Alternatively, with np.matmul -

In [67]: np.matmul(v.swapaxes(1,2), mask[...,None])[...,0]
Out[67]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])

# Put another way
In [80]: np.matmul(mask[:,None,:], v)[:,0]
Out[80]: 
array([[1, 1, 1, 0, 0, 0],
       [0, 2, 0, 0, 0, 0]])
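
The two matmul forms differ only in which operand carries the dummy axis. The sketch below, which assumes the question's sample data, spells out the intermediate shapes. The names m, form1 and form2 are hypothetical, and the integer cast of the mask is an assumption for clarity.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

v = data[idx]             # shape (2, 3, 6)
m = mask.astype(v.dtype)  # shape (2, 3); cast is an illustrative assumption

# Form 1: (2, 6, 3) @ (2, 3, 1) -> (2, 6, 1), then drop the trailing axis.
form1 = np.matmul(v.swapaxes(1, 2), m[..., None])[..., 0]

# Form 2: (2, 1, 3) @ (2, 3, 6) -> (2, 1, 6), then drop the middle axis.
form2 = np.matmul(m[:, None, :], v)[:, 0]

print(form1.shape, form2.shape)
print(np.array_equal(form1, form2))
```

In both forms np.matmul treats the leading axis as a batch dimension, so each row of idx gets its own small matrix product.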

Keeping the loop and improving performance

If the loop runs for only a few iterations and each iteration performs many sum-reductions, you can keep the loop and replace the per-iteration masked sum with a matrix multiplication. Hence -

for i in range(idx.shape[0]):
    # (3,) mask row dotted with the (3, 6) block of selected data rows
    summed[i] = mask[i].dot(data[idx[i]])
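
As a runnable sketch of the loop variant, assuming the question's sample data and a preallocated summed array as in the question, the whole thing might look like this. The name mask_int is hypothetical, and casting the mask once up front is an assumption for dtype clarity.

```python
import numpy as np

data = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0]])
idx = np.array([[0, 1, 2],
                [2, 1, 1]])
mask = np.array([[True, True, True],
                 [False, True, True]])

# Cast once outside the loop (illustrative assumption).
mask_int = mask.astype(data.dtype)

summed = np.zeros((idx.shape[0], data.shape[1]), dtype=data.dtype)
for i in range(idx.shape[0]):
    # (3,) . (3, 6) -> (6,): masked-out rows are multiplied by 0.
    summed[i] = mask_int[i].dot(data[idx[i]])

print(summed)
```

Compared with boolean-indexing idx[i] first, this keeps every iteration the same shape and pushes the reduction into a single dot call.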
