
Subsetting a 2D numpy array

I have looked through the documentation and other questions here, but it seems I have not yet got the hang of subsetting numpy arrays.

I have a numpy array, and for the sake of argument, let it be defined as follows:

import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
#        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
#        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
#        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
#        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
#        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to choose the rows and columns of a specified by the vectors n1 and n2. As an example:

n1 = range(5)
n2 = range(5)

But when I use:

b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])

Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:

b = a[n1,:]
b = b[:,n2]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [30, 31, 32, 33, 34],
#        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the result can be "regularly strided", and therefore whether or not a copy needs to be made. A slice always describes a regularly strided region, so it can be returned as a "view" without copying. An arbitrary sequence cannot, so it has to be treated differently and always produces a copy.
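As a minimal sketch of the view-versus-copy distinction, np.shares_memory can be used to check whether a result still refers to the original buffer. The variable names below are illustrative only and are not part of the question.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
idx = np.arange(5)

view = a[:5, :5]                        # slice: regularly strided, so a view
copy = a[idx[:, None], idx[None, :]]    # fancy indexing: a new array

shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)
print(shares_view, shares_copy)         # True False

# Writing through the view changes a; writing to the copy does not.
view[0, 0] = -1
copy[1, 1] = -2
print(a[0, 0], a[1, 1])                 # -1 11
```

In practice this means that modifying a sliced block in place modifies the original array, whereas a block obtained by fancy indexing is independent of it.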

In your case:

import numpy as np

a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)

# Not what you want
b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]

# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]

Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

print("Fancy indexing:")
print(a[n1, n2])

print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])

However, if the sequences you're indexing with match the dimensionality of the array you're indexing (2D, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy uses the indices like a mask: the index arrays are broadcast against each other, and every resulting (row, column) pair is selected.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently from a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])

To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]]

is treated exactly like:

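# The row vector [[1, 2, 3]] repeats down the rows,
# and the column vector [[1],[2],[3]] repeats across the columns.
i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]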

In other words, whether an index is given as a row vector or a column vector is shorthand for how its values should repeat when the indices are combined.
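The repetition described here is ordinary numpy broadcasting. A minimal sketch, assuming the same 10x10 array a as above (the names rows and cols are illustrative), uses np.broadcast_arrays to expand the row and column vectors explicitly and checks that the result is unchanged:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = np.array([[1, 2, 3]])        # shape (1, 3): a row vector
cols = np.array([[1], [2], [3]])    # shape (3, 1): a column vector

# Expand both index arrays to their common shape (3, 3).
i, j = np.broadcast_arrays(rows, cols)

compact = a[rows, cols]     # numpy broadcasts the indices implicitly
expanded = a[i, j]          # same lookup with the indices written out

print(i.shape, j.shape)                      # (3, 3) (3, 3)
print(np.array_equal(compact, expanded))     # True
```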


np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
       [2],
       [3]]), array([[1, 2, 3]]))

Similarly (passing sparse=True would make the result match ix_ above):

In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]]),
 array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])]
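To make the relationship concrete, here is a small sketch comparing np.ix_ with the sparse and dense forms of np.meshgrid. The index lists n1 and n2 are arbitrary examples, and the meshgrid results are wrapped in tuple() because a plain list of arrays is not interpreted as a multidimensional index by recent numpy versions.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
n1, n2 = [1, 2, 3], [4, 5]

open_ix = np.ix_(n1, n2)
open_mesh = tuple(np.meshgrid(n1, n2, sparse=True, indexing='ij'))
dense_mesh = tuple(np.meshgrid(n1, n2, indexing='ij'))

# The sparse ("open") forms keep one axis of length 1 per index array.
sparse_shapes = [idx.shape for idx in open_mesh]    # [(3, 1), (1, 2)]
dense_shapes = [idx.shape for idx in dense_mesh]    # [(3, 2), (3, 2)]

# All three index tuples select the same 3x2 block of a.
same_indices = all(np.array_equal(x, y) for x, y in zip(open_ix, open_mesh))
block = a[open_ix]
print(same_indices)                                 # True
print(np.array_equal(block, a[open_mesh]))          # True
print(np.array_equal(block, a[dense_mesh]))         # True
```

The open forms are cheaper to build because they leave the repetition to broadcasting instead of materialising full index grids.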

Another quick way to build the desired index is to use the np.ix_ function:

>>> a[np.ix_(n1, n2)]
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

This provides a convenient way to construct an open mesh from sequences of indices.

You could use np.meshgrid to give the n1, n2 arrays the proper shape to perform the desired indexing:

In [104]: a[tuple(np.meshgrid(n1, n2, sparse=True, indexing='ij'))]
Out[104]: 
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

Or, without meshgrid:

In [117]: a[np.array(n1)[:,np.newaxis], np.array(n2)[np.newaxis,:]]
Out[117]: 
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

There is a similar example, with an explanation of how this integer array indexing works, in the docs.

See also the Cookbook recipe "Picking out rows and columns".

A nice trick I've managed to pull off (for lazy people only) is filter + transpose + filter.

a = np.arange(100).reshape(10,10)
subsetA = [1,3,5,7]
a[subsetA].T[subsetA]

array([[11, 31, 51, 71],
       [13, 33, 53, 73],
       [15, 35, 55, 75],
       [17, 37, 57, 77]])
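One thing to be aware of: the chained form returns the block transposed relative to a[np.ix_(...)], which is why the output above reads 11, 31, 51, 71 along its first row. A short sketch with different row and column lists (chosen here only for illustration) makes this visible:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
rows = [1, 3]
cols = [2, 4, 6]

block = a[np.ix_(rows, cols)]    # shape (2, 3): rows 1, 3 and columns 2, 4, 6
chained = a[rows].T[cols]        # shape (3, 2): the same values, transposed

print(block.shape, chained.shape)             # (2, 3) (3, 2)
print(np.array_equal(chained, block.T))       # True
print(np.array_equal(chained.T, block))       # True
```

Adding a final .T to the chained expression therefore gives the same orientation as the np.ix_ version.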

It seems that a use case for your particular question would be image manipulation. To the extent that you are using your example to edit numpy arrays arising from images, you can use the Python Imaging Library (PIL).

# Import Pillow:
from PIL import Image

# Load the original image:
img = Image.open("flowers.jpg")

# Crop the image
img2 = img.crop((0, 0, 5, 5))

The img2 object is a PIL Image holding the cropped region, not a numpy array. It can be converted to one with np.asarray(img2).

You can read more about image manipulation in the documentation for the Pillow package (a user-friendly fork of the PIL package).
