Numpy multi-dimensional array indexing swaps axis order

I am working with multi-dimensional Numpy arrays. I have noticed some inconsistent behavior when accessing these arrays with other index arrays. For example:

import numpy as np
start = np.zeros((7,5,3))
a     = start[:,:,np.arange(2)]
b     = start[0,:,np.arange(2)]
c     = start[0,:,:2]
print 'a:', a.shape
print 'b:', b.shape
print 'c:', c.shape

In this example, I get the result:

a: (7, 5, 2)
b: (2, 5)
c: (5, 2)

This confuses me. Why do "b" and "c" not have the same dimensions? Why does "b" swap the axis order, but not "a"?

I have been able to design my code around these inconsistencies thanks to lots of unit tests, but I would appreciate understanding what is going on.

For reference, I am using Python 2.7.3 and Numpy 1.6.2 via MacPorts.

Syntactically, this looks like an inconsistency, but semantically, you're doing two very different things here. In your definitions of a and b, you're doing advanced indexing, sometimes called fancy indexing, which returns a copy of the data. In your definition of c, you're doing basic slicing, which returns a view of the data.
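The copy-versus-view distinction can be checked directly. The following is a minimal sketch, assuming Python 3 and a NumPy recent enough to provide np.shares_memory (neither is what the question used); it reuses the arrays from the question.

```python
import numpy as np

start = np.zeros((7, 5, 3))
a = start[:, :, np.arange(2)]  # advanced indexing
b = start[0, :, np.arange(2)]  # advanced indexing
c = start[0, :, :2]            # basic slicing

# A view shares its buffer with the original array; a copy does not.
print(np.shares_memory(a, start))  # False
print(np.shares_memory(b, start))  # False
print(np.shares_memory(c, start))  # True

# Writing through the view changes the original array.
c[0, 0] = 1.0
print(start[0, 0, 0])  # 1.0

# Writing to the copy leaves the original array untouched.
a[0, 0, 0] = 2.0
print(start[0, 0, 0])  # 1.0
```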

To tell the difference, it helps to understand how indices are passed to Python objects. Here are some examples:

>>> class ShowIndex(object):
...     def __getitem__(self, index):
...         print index
... 
>>> ShowIndex()[:,:]
(slice(None, None, None), slice(None, None, None))
>>> ShowIndex()[...,:]
(Ellipsis, slice(None, None, None))
>>> ShowIndex()[0:5:2,::-1]
(slice(0, 5, 2), slice(None, None, -1))
>>> ShowIndex()[0:5:2,np.arange(3)]
(slice(0, 5, 2), array([0, 1, 2]))
>>> ShowIndex()[0:5:2]
slice(0, 5, 2)
>>> ShowIndex()[5, 5]
(5, 5)
>>> ShowIndex()[5]
5
>>> ShowIndex()[np.arange(3)]
[0 1 2]

As you can see, there are many different possible configurations. First, individual items may be passed, or tuples of items may be passed. Second, the tuples may contain slice objects, Ellipsis objects, plain integers, or numpy arrays.
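For readers on Python 3, where print is a function, the same probe can be written so that it returns the index instead of printing it. This is only a sketch of the idea above; the variable names are illustrative, and the last call uses the index expression from the question's definition of b.

```python
import numpy as np

class ShowIndex:
    """Return whatever object Python hands to __getitem__."""
    def __getitem__(self, index):
        return index

probe = ShowIndex()

single = probe[5]                  # a bare integer, not wrapped in a tuple
pair = probe[0:5:2, ::-1]          # a tuple of two slice objects
mixed = probe[0, :, np.arange(2)]  # the index used for b in the question

print(type(single).__name__)                    # int
print(pair)                                     # (slice(0, 5, 2), slice(None, None, -1))
print([type(item).__name__ for item in mixed])  # ['int', 'slice', 'ndarray']
```

The last line shows why b is treated differently from c: the tuple passed for b contains an array, while the tuple passed for c contains only an integer and slices.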

Basic slicing is activated when you pass only int, slice, or Ellipsis objects, or None (which is the same as numpy.newaxis). These can be passed singly or in a tuple. Here's what the docs have to say about how basic slicing is activated:

Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets), an integer, or a tuple of slice objects and integers. Ellipsis and newaxis objects can be interspersed with these as well. In order to remain backward compatible with a common usage in Numeric, basic slicing is also initiated if the selection object is any sequence (such as a list) containing slice objects, the Ellipsis object, or the newaxis object, but no integer arrays or other embedded sequences.

Advanced indexing is activated when you pass a numpy array, a non-tuple sequence containing only integers or containing subsequences of any kind, or a tuple containing an array or subsequence.
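As a rough illustration of these activation rules, here is a hypothetical helper, uses_advanced_indexing, that is not part of NumPy. It is a deliberately simplified approximation: it ignores boolean indices and the legacy list-of-slices rule quoted above, and NumPy's real dispatch is more involved. np.s_ is used only to build the index objects without an array.

```python
import numpy as np

def uses_advanced_indexing(index):
    """Hypothetical, simplified check: does this index trigger advanced indexing?

    Treats a non-tuple index as a one-item tuple, then reports True if any
    item is an array or a list. Booleans and legacy corner cases are ignored.
    """
    items = index if isinstance(index, tuple) else (index,)
    return any(isinstance(item, (np.ndarray, list)) for item in items)

# The three index expressions from the question.
print(uses_advanced_indexing(np.s_[:, :, np.arange(2)]))  # True  (a)
print(uses_advanced_indexing(np.s_[0, :, np.arange(2)]))  # True  (b)
print(uses_advanced_indexing(np.s_[0, :, :2]))            # False (c)

# A bare list of integers is a non-tuple sequence, so it also counts.
print(uses_advanced_indexing([0, 1]))                     # True
```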

For details on how advanced indexing and basic slicing differ, see the docs quoted above. But in this particular case, it's clear to me what's happening. It has to do with the following behavior when using partial indexing:

The rule for partial indexing is that the shape of the result (or the interpreted shape of the object to be used in setting) is the shape of x with the indexed subspace replaced with the broadcasted indexing subspace. If the index subspaces are right next to each other, then the broadcasted indexing space directly replaces all of the indexed subspaces in x. If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.
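The two cases in this rule can be seen side by side on a four-dimensional array. This is a small sketch assuming Python 3; the array x and the index arrays rows and cols are made up for illustration.

```python
import numpy as np

x = np.zeros((7, 5, 3, 4))
rows = np.array([0, 1])
cols = np.array([0, 1])

# Index arrays on neighbouring axes (1 and 2): the broadcast shape (2,)
# replaces those two axes in place.
adjacent = x[:, rows, cols, :]
print(adjacent.shape)   # (7, 2, 4)

# Index arrays on axes 0 and 2, separated by a slice: the broadcast shape
# (2,) goes first, followed by the sliced axes (5 and 4).
separated = x[rows, :, cols, :]
print(separated.shape)  # (2, 5, 4)
```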

In your definition of a, which uses advanced indexing, you effectively pass the sequence [0, 1] as the third item of the tuple, and since no broadcasting happens (because there is no other sequence), everything happens as expected.

In your definition of b, also using advanced indexing, you effectively pass two sequences: [0], the first item (which is converted into an intp array), and [0, 1], the third item. These two items are broadcast together, and the result has the same shape as the third item. However, since broadcasting has happened, we're faced with a problem: where in the new shape tuple do we insert the broadcasted shape? As the docs say,

there is no unambiguous place to drop in the indexing subspace, thus it is tacked-on to the beginning.

So the 2 that results from broadcasting is moved to the beginning of the shape tuple, producing an apparent transposition.
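To make the apparent transposition concrete, the sketch below fills the array from the question with distinct values (an assumption, so that element order is visible) and compares b with c. It also shows one possible way to get the (5, 2) layout with an index array, by indexing in two steps; this is an illustration, not the only option.

```python
import numpy as np

# Same shape as in the question, but with distinct values.
start = np.arange(7 * 5 * 3).reshape(7, 5, 3)
idx = np.arange(2)

b = start[0, :, idx]  # integer and array separated by a slice
c = start[0, :, :2]   # basic slicing

# The integer 0 and the index array broadcast to shape (2,).
print(np.broadcast(0, idx).shape)  # (2,)

print(b.shape, c.shape)            # (2, 5) (5, 2)
print(np.array_equal(b.T, c))      # True

# Indexing in two steps leaves only one advanced index in the second step,
# so the broadcast axis stays where the index array was written.
two_step = start[0][:, idx]
print(two_step.shape)                 # (5, 2)
print(np.array_equal(two_step, c))    # True
```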
