
Numpy multi-dimensional slicing with multiple boolean arrays

I'm trying to use individual 1-dimensional boolean arrays to slice a multi-dimensional array. For some reason, this code doesn't work:

>>> a = np.ones((100, 200, 300, 2))
>>> a.shape
(100, 200, 300, 2)
>>> m1 = np.asarray([True]*200)
>>> m2 = np.asarray([True]*300)
>>> m2[-1] = False
>>> a[:,m1,m2,:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (299,) 
>>> m2 = np.asarray([True]*300) # try again with all 300 elements True
>>> a[:,m1,m2,:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (200,) (300,) 

But this works just fine:

>>> a = np.asarray([[[1, 2], [3, 4], [5, 6]], [[11, 12], [13, 14], [15, 16]]])
>>> a.shape
(2, 3, 2)
>>> m1 = np.asarray([True, False, True])
>>> m2 = np.asarray([True, False])
>>> a[:,m1,m2]
array([[ 1,  5],
       [11, 15]])

Any idea what I might be doing wrong in the first example?

Short answer: The number of True elements in m1 and m2 must match, unless one of them has only one True element.

Also distinguish between 'diagonal' indexing and 'rectangular' indexing. This is about indexing, not slicing. The dimensions with : are just along for the ride.
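A minimal sketch of the short answer on a small array (the array `a` and the masks below are made up for illustration, not taken from the question). Masks with equal True counts are paired element by element; unequal counts fail to broadcast. Older NumPy releases raise ValueError here and newer ones IndexError, so the sketch catches both.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)

m1 = np.array([True, False, True])          # 2 True values, applied to axis 1
m2 = np.array([True, False, False, True])   # 2 True values, applied to axis 2

# Equal True counts: the masks are paired up, selecting a[:, 0, 0] and a[:, 2, 3].
paired = a[:, m1, m2]
print(paired.shape)   # (2, 2)

# Unequal True counts (2 vs 3) cannot be broadcast together.
m3 = np.array([True, True, False, True])
try:
    a[:, m1, m3]
    mismatch_raised = False
except (IndexError, ValueError):
    mismatch_raised = True
print(mismatch_raised)   # True
```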

Initial ideas

I can get your first case working with:

In [137]: a=np.ones((100,200,300,2))

In [138]: m1=np.ones((200,),bool)    
In [139]: m2=np.ones((300,),bool)
In [140]: m2[-1]=False

In [141]: I,J=np.ix_(m1,m2)

In [142]: a[:,I,J,:].shape
Out[142]: (100, 200, 299, 2)

np.ix_ turns the 2 boolean arrays into broadcastable index arrays:

In [143]: I.shape
Out[143]: (200, 1)
In [144]: J.shape
Out[144]: (1, 299)

Note that this picks 200 'rows' in one dimension, and 299 in the other.
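The same pattern on a smaller array makes the block selection easier to inspect. This is only a sketch: the shape (2, 3, 5, 2) and the names `rows`, `cols`, and `all_pairs_match` are chosen here for illustration. It checks that every combination of a selected 'row' and a selected 'column' ends up in the result.

```python
import numpy as np

a = np.arange(2 * 3 * 5 * 2).reshape(2, 3, 5, 2)

m1 = np.ones(3, dtype=bool)     # 3 True values for axis 1
m2 = np.ones(5, dtype=bool)
m2[-1] = False                  # 4 True values for axis 2

I, J = np.ix_(m1, m2)
block = a[:, I, J, :]
print(I.shape, J.shape, block.shape)   # (3, 1) (1, 4) (2, 3, 4, 2)

# Every (row, column) combination of the selected indices is present.
rows = np.flatnonzero(m1)
cols = np.flatnonzero(m2)
all_pairs_match = all(
    np.array_equal(block[:, i, j, :], a[:, r, c, :])
    for i, r in enumerate(rows)
    for j, c in enumerate(cols)
)
print(all_pairs_match)   # True
```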

I'm not sure why this kind of reworking of the arrays is needed in this case, but not in the 2nd:

In [154]: b=np.arange(2*3*2).reshape((2,3,2))

In [155]: n1=np.array([True,False,True])
In [156]: n2=np.array([True,False])

In [157]: b[:,n1,n2]
Out[157]: 
array([[ 0,  4],      # shape (2,2)
       [ 6, 10]])

Taking the same ix_ strategy produces the same values but a different shape:

In [164]: b[np.ix_(np.arange(b.shape[0]),n1,n2)]
# or I,J=np.ix_(n1,n2);b[:,I,J]
Out[164]: 
array([[[ 0],
        [ 4]],

       [[ 6],
        [10]]])

In [165]: _.shape
Out[165]: (2, 2, 1)

Both cases use all rows of the 1st dimension. The ix_ one picks 2 'rows' of the 2nd dim and 1 column of the last, resulting in the (2,2,1) shape. The other picks the b[:,0,0] and b[:,2,0] terms, resulting in a (2,2) shape (see my addenda as to why both are simply broadcasting). These are all cases of advanced indexing, with boolean and numeric indexes. One can study the docs, or one can play around. Sometimes it's more fun to do the latter. :)

(I knew that ix_ was good for adding the necessary np.newaxis to arrays so they can be broadcast together, but didn't realize that it worked with boolean arrays as well - it uses np.nonzero() to convert booleans to indices.)
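To make that concrete, here is a sketch that builds the same index arrays by hand with np.nonzero and np.newaxis and compares them with what np.ix_ returns for the n1 and n2 masks. The names `I_manual` and `J_manual` are introduced here for illustration only.

```python
import numpy as np

b = np.arange(2 * 3 * 2).reshape(2, 3, 2)
n1 = np.array([True, False, True])
n2 = np.array([True, False])

# What np.ix_ returns for the two boolean masks.
I, J = np.ix_(n1, n2)

# The same thing by hand: boolean -> integer positions, then add axes
# so the two index arrays broadcast against each other.
rows = np.nonzero(n1)[0]            # array([0, 2])
cols = np.nonzero(n2)[0]            # array([0])
I_manual = rows[:, np.newaxis]      # shape (2, 1)
J_manual = cols[np.newaxis, :]      # shape (1, 1)

print(np.array_equal(I, I_manual), np.array_equal(J, J_manual))   # True True
print(b[:, I_manual, J_manual].shape)                              # (2, 2, 1)
```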

Resolution

Underlying this is, I think, a confusion over 2 modes of indexing, which might be called 'diagonal' and 'rectangular' (or element-by-element selection versus block selection). To illustrate, look at a small 2d array

In [73]: M=np.arange(6).reshape(2,3)
In [74]: M
Out[74]: 
array([[0, 1, 2],
       [3, 4, 5]])

and 2 simple numeric indexes

In [75]: m1=np.arange(2); m2=np.arange(2)

They can be used 2 ways:

In [76]: M[m1,m2]
Out[76]: array([0, 4])

and

In [77]: M[m1[:,None],m2]
Out[77]: 
array([[0, 1],
       [3, 4]])

The 1st picks 2 points, M[0,0] and M[1,1]. This kind of indexing lets us pick out the diagonals of an array.

The 2nd picks 2 rows and, from those, 2 columns. This is the kind of indexing that np.ix_ produces. This is a 'rectangular' form of indexing.
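Spelled out with explicit Python loops, the two modes look like this. It is a sketch for illustration only; the loop-based names `pairs` and `nested` are not part of the original session.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
m1 = np.arange(2)
m2 = np.arange(2)

# 'Diagonal' mode: the index arrays are walked in lockstep, like zip().
pointwise = M[m1, m2]
pairs = np.array([M[i, j] for i, j in zip(m1, m2)])
print(pointwise)   # [0 4]

# 'Rectangular' mode: every row index is combined with every column index.
block = M[m1[:, None], m2]
nested = np.array([[M[i, j] for j in m2] for i in m1])
print(block)
# [[0 1]
#  [3 4]]
```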

Change m2 to 3 values:

In [78]: m2=np.arange(3)
In [79]: M[m1[:,None],m2]   # returns a 2x3
Out[79]: 
array([[0, 1, 2],
       [3, 4, 5]])
In [80]: M[m1,m2]   # produces an error
...
ValueError: shape mismatch: objects cannot be broadcast to a single shape

But if m2 has just one element, we don't get the broadcast error - because the size 1 dimension can be expanded during broadcasting:

In [81]: m2=np.arange(1)
In [82]: M[m1,m2]
Out[82]: array([0, 3])

Now change the index arrays to boolean, each matching the length of its respective dimension, 2 and 3.

In [91]: m1=np.ones(2,bool); m2=np.ones(3,bool)
In [92]: M[m1,m2]
...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
In [93]: m2[2]=False  # m1 and m2 each have 2 True elements
In [94]: M[m1,m2]
Out[94]: array([0, 4])
In [95]: m2[0]=False   # m2 has 1 True element
In [96]: M[m1,m2]
Out[96]: array([1, 4])

With 2 and 3 True terms we get an error, but with 2 and 2 or 2 and 1 it runs - just as though we'd used the indices of the True elements: np.nonzero(m2).
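A quick check of that equivalence, as a sketch: indexing with the boolean masks gives the same result as indexing with their np.nonzero positions. The variable names `via_bool` and `via_int` are introduced here.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
m1 = np.array([True, True])

# 2 and 2 True elements
m2 = np.array([True, True, False])
via_bool = M[m1, m2]
via_int = M[np.nonzero(m1)[0], np.nonzero(m2)[0]]
print(via_bool, via_int)   # [0 4] [0 4]

# 2 and 1 True elements: the single index is broadcast against the other two
m2_single = np.array([False, True, False])
via_bool_single = M[m1, m2_single]
via_int_single = M[np.nonzero(m1)[0], np.nonzero(m2_single)[0]]
print(via_bool_single, via_int_single)   # [1 4] [1 4]
```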

Now apply this to your examples. In the first, m1 and m2 have 200 and 299 True elements. a[:,m1,m2,:] fails because of a mismatch in the number of True terms.

In the 2nd, they have 2 and 1 True terms, with nonzero indices of [0,2] and [0]; the latter can be broadcast to [0,0]. So it runs.
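The rule can be turned into a small pre-check. `masks_can_pair` below is a hypothetical helper written for this illustration, not a NumPy function: it says the masks can be used together directly when all True counts other than 1 agree.

```python
import numpy as np

def masks_can_pair(*masks):
    """Hypothetical helper: True if 1-d boolean masks can be combined
    in one index expression, i.e. their True counts broadcast together."""
    counts = {int(np.count_nonzero(m)) for m in masks}
    counts.discard(1)            # a single True broadcasts against anything
    return len(counts) <= 1

# First example from the question: 200 vs 299 True elements
m1 = np.ones(200, dtype=bool)
m2 = np.ones(300, dtype=bool)
m2[-1] = False
first_ok = masks_can_pair(m1, m2)
print(first_ok)    # False

# Second example: 2 vs 1 True elements
second_ok = masks_can_pair(np.array([True, False, True]),
                           np.array([True, False]))
print(second_ok)   # True
```

When the check fails and a block selection is what is wanted, np.ix_ is the way around it.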

http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.indexing.html explains boolean array indexing in terms of nonzero and ix_.

Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.

Addenda

On further thought, the distinction between 'diagonal' and 'block/rectangular' indexing might be more my mental construct than numpy's. Underlying both is the concept of broadcasting.

Take the n1 and n2 booleans, and get their nonzero equivalents:

In [107]: n1
Out[107]: array([ True, False,  True], dtype=bool)
In [108]: np.nonzero(n1)
Out[108]: (array([0, 2], dtype=int32),)
In [109]: n2
Out[109]: array([ True, False], dtype=bool)
In [110]: np.nonzero(n2)
Out[110]: (array([0], dtype=int32),)

Now try broadcasting in the 'diagonal' and 'rectangular' modes:

In [105]: np.broadcast_arrays(np.array([0,2]),np.array([0]))
Out[105]: [array([0, 2]), 
           array([0, 0])]

In [106]: np.broadcast_arrays(np.array([0,2])[:,None],np.array([0]))
Out[106]: 
[array([[0],
        [2]]), 
 array([[0],
        [0]])]

One produces (2,) arrays, the other (2,1).
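Those broadcast shapes are what replace the indexed dimensions in the result. A sketch tying them back to the b example from earlier, using np.broadcast to read off the shape without building the arrays (the names `rows` and `cols` are introduced here):

```python
import numpy as np

b = np.arange(2 * 3 * 2).reshape(2, 3, 2)
rows = np.array([0, 2])   # nonzero positions of n1
cols = np.array([0])      # nonzero positions of n2

diag_shape = np.broadcast(rows, cols).shape
rect_shape = np.broadcast(rows[:, None], cols).shape
print(diag_shape, rect_shape)          # (2,) (2, 1)

# The leading ':' keeps axis 0; the two indexed axes are replaced
# by the broadcast shape of the index arrays.
print(b[:, rows, cols].shape)            # (2, 2)
print(b[:, rows[:, None], cols].shape)   # (2, 2, 1)
```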

This might be a simple workaround:

a[:,m1,:,:][:,:,m2,:]
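As a sketch on a small array (shape and masks made up for illustration), the chained form gives the same block as the np.ix_ form. One difference worth keeping in mind: the first boolean index in the chain already returns a copy, so the chained form is fine for reading values but an assignment through it would not modify the original array.

```python
import numpy as np

a = np.arange(2 * 3 * 5 * 2).reshape(2, 3, 5, 2)
m1 = np.array([True, False, True])   # 2 True values for axis 1
m2 = np.ones(5, dtype=bool)
m2[-1] = False                       # 4 True values for axis 2

# Apply the masks one axis at a time.
chained = a[:, m1, :, :][:, :, m2, :]

# Block selection in a single indexing step.
I, J = np.ix_(m1, m2)
direct = a[:, I, J, :]

print(chained.shape)                     # (2, 2, 4, 2)
print(np.array_equal(chained, direct))   # True
```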
