Mask numpy array based on index

How do I mask an array based on the actual index values?

That is, suppose I have a 10 x 10 x 30 array and I want to mask it wherever the first and second indices are equal.

For example, [1, 1, :] should be masked because the first two indices are equal, but [1, 2, :] should not be, because they differ.

I'm only including the third dimension because it resembles my current problem and may complicate things. My main question is: how do I mask arrays based on the values of the indices?

In general, to access the values of the indices, you can use np.meshgrid:

# m is assumed to be a numpy masked array (np.ma.MaskedArray), e.g. of shape (10, 10, 30)
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
m.mask = (i == j)

The advantage of this method is that it works for arbitrary boolean functions of i, j, and k. It is a bit slower than the identity-based special case.

In [56]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
   ....: i == j
10000 loops, best of 3: 96.8 µs per loop
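To make the meshgrid approach concrete, here is a minimal self-contained sketch. The array contents and the second predicate (`k > i + j`) are assumptions chosen purely for illustration; only the `i == j` mask comes from the question.

```python
import numpy as np

# Hypothetical data: a 10 x 10 x 30 masked array standing in for `m`.
m = np.ma.array(np.arange(10 * 10 * 30, dtype=float).reshape(10, 10, 30))

# Dense index grids, each with the same shape as m.
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')

# Mask wherever the first and second indices are equal.
m.mask = (i == j)

# Any boolean expression of i, j and k can be built the same way, for
# example one that is True where the third index exceeds i + j.
other_mask = k > i + j
```

After this, `m[1, 1, :]` is fully masked while `m[1, 2, :]` is untouched.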

As @Jaime points out, meshgrid supports a sparse option, which avoids much of the duplication but requires a bit more care in some cases, because the returned arrays are not expanded to the full shape. It saves memory and speeds things up a little. For example,

In [77]: x = np.arange(5)

In [78]: np.meshgrid(x, x)
Out[78]: 
[array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]),
 array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])]

In [79]: np.meshgrid(x, x, sparse=True)
Out[79]: 
[array([[0, 1, 2, 3, 4]]),
 array([[0],
       [1],
       [2],
       [3],
       [4]])]

So you can use the sparse version as he suggests, but you must expand the result to the full shape explicitly:

i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
m.mask = np.repeat(i==j, k.size, axis=2)

And the speedup:

In [84]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
   ....: np.repeat(i==j, k.size, axis=2)
10000 loops, best of 3: 73.9 µs per loop
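As a possible alternative to np.repeat, the sparse comparison can be expanded with np.broadcast_to. This is only a sketch under the assumption that the target shape is (10, 10, 30); it was not part of the timings above, so no speed claim is made for it.

```python
import numpy as np

shape = (10, 10, 30)  # assumed shape of the array `m`

# Sparse grids keep a length-1 axis wherever the index does not vary:
# i has shape (10, 1, 1), j has (1, 10, 1), k has (1, 1, 30).
i, j, k = np.meshgrid(*map(np.arange, shape), indexing='ij', sparse=True)

# The comparison never involves k, so the result has shape (10, 10, 1).
diag = (i == j)

# Expand to the full shape; .copy() turns the read-only broadcast view
# into an ordinary writable array.
mask = np.broadcast_to(diag, shape).copy()

# Build the masked array directly from the full-shape mask.
m = np.ma.array(np.zeros(shape), mask=mask)
```

The resulting mask is identical to the one produced by `np.repeat(i == j, k.size, axis=2)`.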

In your special case of wanting to mask the diagonal, you can use the np.identity() function, which returns ones along the diagonal. Since you have a third dimension, we have to add that third dimension to the identity matrix:

m.mask = np.identity(10)[...,None]*np.ones((1,1,30))

There might be a better way of constructing that array, but it basically stacks 30 copies of the np.identity(10) array. For example, this is equivalent:

np.dstack((np.identity(10),)*30)

but slower:

In [30]: timeit np.identity(10)[...,None]*np.ones((1,1,30))
10000 loops, best of 3: 40.7 µs per loop

In [31]: timeit np.dstack((np.identity(10),)*30)
1000 loops, best of 3: 219 µs per loop

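And @Ophion's suggestions (note that np.repeat flattens its input unless axis=2 is passed, so the second call as timed returns a 1-D array):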

In [33]: timeit np.tile(np.identity(10)[...,None], 30)
10000 loops, best of 3: 63.2 µs per loop

In [71]: timeit np.repeat(np.identity(10)[...,None], 30)
10000 loops, best of 3: 45.3 µs per loop
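The following sketch checks that the identity-based constructions agree with the index-grid mask. Using np.eye with dtype=bool is an assumption made here for convenience; the answer itself uses np.identity, which returns floats.

```python
import numpy as np

n, depth = 10, 30  # sizes taken from the 10 x 10 x 30 example

# Boolean identity matrix, repeated along a new third axis.
eye_mask = np.repeat(np.eye(n, dtype=bool)[..., None], depth, axis=2)

# The float-valued construction from the answer, converted to bool.
float_mask = (np.identity(n)[..., None] * np.ones((1, 1, depth))).astype(bool)

# Reference mask built from dense index grids.
i, j, k = np.meshgrid(np.arange(n), np.arange(n), np.arange(depth), indexing='ij')
grid_mask = (i == j)

# Without an axis argument, np.repeat works on the flattened input.
flat = np.repeat(np.eye(n, dtype=bool)[..., None], depth)
```

Here `eye_mask`, `float_mask`, and `grid_mask` all have shape (10, 10, 30) and hold the same values, while `flat` is one-dimensional.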
