
Identity mask for numpy ndarray

I would have expected True to preserve an ndarray when used as a mask; however, it adds a dimension, just like None.

import numpy as np
arr = np.arange(16).reshape(2, 4, 2)
np.all(arr[True] == arr)         # outputs: True

Close enough, but looking closer:

arr[True].shape                  # outputs: (1, 2, 4, 2)
arr[None].shape                  # outputs: (1, 2, 4, 2)

I found two ways to set an identity mask: using slice(None) or Ellipsis.

np.all(arr[slice(None)] == arr)  # outputs: True
arr[slice(None)].shape           # outputs: (2, 4, 2)

np.all(arr[Ellipsis] == arr)     # outputs: True
arr[Ellipsis].shape              # outputs: (2, 4, 2)

Nothing really surprising here, as this is how slicing works in the first place. slice(None) is a tad ugly and Ellipsis seems a wee bit faster.
However, going through:

I am not sure I fully understand this:

Deprecated since version 1.15.0: In order to remain backward compatible with a common usage in Numeric, basic slicing is also initiated if the selection object is any non-ndarray and non-tuple sequence (such as a list) containing slice objects, the Ellipsis object, or the newaxis object, but not for integer arrays or other embedded sequences.
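As I read it, that note concerns indexing with a list whose items are slices, Ellipsis, or newaxis, which older NumPy silently treated as basic slicing. A minimal sketch of the non-deprecated spelling, assuming the index pieces are collected in a list first (index_parts is an illustrative name, not from the documentation), is to convert the list to a tuple before indexing:

```python
import numpy as np

arr = np.arange(16).reshape(2, 4, 2)

# Index pieces collected in a list (illustrative name).
index_parts = [slice(None), 0, Ellipsis]

# Converting to a tuple makes this ordinary basic indexing,
# equivalent to writing arr[:, 0, ...].
sub = arr[tuple(index_parts)]

print(sub.shape)
print(np.array_equal(sub, arr[:, 0, ...]))
```

Because this is basic indexing, the result is a view of arr rather than a copy.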

I understand that the best way to preserve an array is not to mask it, but say I really want to set up a default value for a mask... ;-)

Question: Which is the preferred way to set up an identity mask? And if I may, is True adding a dimension the intended behavior?

For a sample 2d array:

In [172]: x=np.array([[1,2],[4,3]])
In [173]: x.__array_interface__
Out[173]: 
{'data': (50806320, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 2),
 'version': 3}

A view with ellipsis:

In [174]: x[...].__array_interface__
Out[174]: 
{'data': (50806320, False),          # same as for x
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 2),
 'version': 3}

A view with an added dimension:

In [175]: x[None].__array_interface__
Out[175]: 
{'data': (50806320, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (1, 2, 2),
 'version': 3}

A copy with an added dimension. Note the changed data address; this is advanced indexing.

In [176]: x[True].__array_interface__
Out[176]: 
{'data': (50796640, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (1, 2, 2),
 'version': 3}

Another copy, with a size-0 dimension. The data address is the same as in the previous output because the freed memory is being reused.

In [177]: x[False].__array_interface__
Out[177]: 
{'data': (50796640, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (0, 2, 2),
 'version': 3}
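Comparing data addresses works, but as the reused address above shows, it can mislead once temporaries are freed. A small sketch of the same view-versus-copy check using np.shares_memory (the dictionary names are only for illustration):

```python
import numpy as np

x = np.array([[1, 2], [4, 3]])

# Each entry is the result of one of the indexing forms shown above.
results = {
    "x[...]": x[...],
    "x[None]": x[None],
    "x[True]": x[True],
}

# True means the result is a view onto x's buffer; False means a copy.
shares = {label: np.shares_memory(x, r) for label, r in results.items()}

for label, r in results.items():
    print(label, r.shape, shares[label])
```

The first two entries should report shared memory (basic indexing, views), while x[True] should not (advanced indexing, copy).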

The only applicable reference in the indexing page that I can find is:

https://numpy.org/doc/stable/reference/arrays.indexing.html#detailed-notes

the nonzero equivalence for Boolean arrays does not hold for zero dimensional boolean arrays.

I wouldn't be surprised if this behavior was a leftover from some past implementation. Due to a history of merging several numeric packages, there are some rough edges. Some of those have been deprecated, or are in the process of being deprecated.

A scalar boolean index is a zero dimensional boolean array:

In [178]: np.array(True).shape
Out[178]: ()

We can add the new dimension elsewhere:

In [181]: x[:,True].shape
Out[181]: (2, 1, 2)
In [183]: x[...,False].shape
Out[183]: (2, 2, 0)

You keep saying "mask", but it doesn't sound like you really want a masking operation at all, even an "identity" mask. A mask array would typically be a boolean array of the same shape as the original array, and indexing with the mask would produce a 1D array with the items selected by the mask. Even an all-true mask would produce a flattened copy of the array it was applied to. It wouldn't be an identity operation. It's possible to do weirder things with masks, but not an identity operation.
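A quick sketch of the all-true case, reusing the arr from the question (full_mask and selected are illustrative names):

```python
import numpy as np

arr = np.arange(16).reshape(2, 4, 2)

# A boolean mask of the same shape as arr, with every entry True.
full_mask = np.ones(arr.shape, dtype=bool)

selected = arr[full_mask]

print(selected.shape)                    # 1-D, one entry per True in the mask
print(np.shares_memory(arr, selected))   # False: boolean indexing copies
```

Every element is selected, but the result is a flat copy rather than the original array.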

If you want an indexer that outputs an array equivalent to the original, the typical, most general way to do that would be ... (a literal ellipsis):

arr[...]

Unlike :, this also works for 0-dimensional arrays. Note that this produces a view, not a copy. There is no indexer that would produce a copy and work properly for all input dimensions.
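To tie this back to the "default value for a mask" use case, here is a minimal sketch. The helper apply_index is hypothetical (not a NumPy function); it simply defaults its index argument to Ellipsis:

```python
import numpy as np

def apply_index(arr, index=Ellipsis):
    """Hypothetical helper: index arr, defaulting to the identity indexer."""
    return arr[index]

arr = np.arange(16).reshape(2, 4, 2)
scalar_arr = np.array(5)  # 0-dimensional

same = apply_index(arr)
print(same.shape, np.shares_memory(same, arr))  # same shape, and a view

# Ellipsis also works on a 0-dimensional array...
print(apply_index(scalar_arr).shape)

# ...whereas a bare slice does not.
try:
    scalar_arr[:]
    colon_failed = False
except IndexError:
    colon_failed = True
print(colon_failed)
```

Callers that pass a real index get ordinary indexing; callers that pass nothing get a view of the whole array.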


arr[True] works the way it does primarily out of a desire to have 0-dimensional arrays follow the same boolean indexing rules as positive-dimensional arrays. As mentioned above, if you index an n-dimensional array with an n-dimensional mask, the result is a 1-dimensional array. If you index a 0-dimensional array with a 0-dimensional mask, the result is again a 1-dimensional array:

In [1]: import numpy

In [2]: x = numpy.array([[1, 2], [3, 4]])

In [3]: x[x % 2 == 0]
Out[3]: array([2, 4])

In [4]: y = numpy.array([1, 2, 3, 4])

In [5]: y[y % 2 == 0]
Out[5]: array([2, 4])

In [6]: z = numpy.array(5) # 0-dimensional!

In [7]: z[z % 2 == 0]
Out[7]: array([], dtype=int64)

In [8]: z[z % 2 == 1]
Out[8]: array([5])

Indexing a 0-dimensional array with a 0-dimensional mask increases the dimensionality by 1. Generalized to higher dimensions, indexing an n-dimensional array with a 0-dimensional mask produces an (n+1)-dimensional array. If the mask is True, the extra dimension has length 1; if the mask is False, the extra dimension has length 0, and the output has no elements. This generalized behavior is rarely useful, but it's what fits best with the (rarely useful) rules for applying a positive-dimension mask to an array with mismatching dimensions.
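A short sketch of that rule on a 0-dimensional and a 3-dimensional array (variable names are illustrative):

```python
import numpy as np

z = np.array(5)                        # 0-dimensional
arr = np.arange(16).reshape(2, 4, 2)   # 3-dimensional

# The 0-dimensional mask adds one leading axis: length 1 for True, 0 for False.
shapes = {
    "z[True]": z[True].shape,
    "z[False]": z[False].shape,
    "arr[True]": arr[True].shape,
    "arr[False]": arr[False].shape,
}
for label, shape in shapes.items():
    print(label, shape)

# Dropping the added axis of arr[True] recovers the original values.
print(np.array_equal(arr[True][0], arr))
```

Passing np.array(True) instead of the Python literal True should behave the same way, since both are treated as a 0-dimensional boolean index.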
