
Selecting specific rows and columns from NumPy array

I've been going crazy trying to figure out what stupid thing I'm doing wrong here.

I'm using NumPy, and I have specific row indices and specific column indices that I want to select from. Here's the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print(a[[0, 1, 3], :])
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print(a[[0, 1, 3], 2])
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
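print(a[[0,1,3], [0,2]])
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape
# (recent NumPy versions raise IndexError: shape mismatch: indexing arrays
#  could not be broadcast together with shapes (3,) (2,))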

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I'm expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]
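For reference, here is a small sketch of what NumPy actually does with two integer lists (the column list `[0, 2, 1]` is just an illustrative choice): lists of equal length are paired element by element, not combined into a grid, which is why lists of lengths 3 and 2 cannot be used together. Older NumPy raised `ValueError` here and newer versions raise `IndexError`, so the sketch catches both.

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# Two integer index lists of the SAME length are paired element by element:
# this picks a[0, 0], a[1, 2] and a[3, 1], not a 3x2 grid.
paired = a[[0, 1, 3], [0, 2, 1]]
print(paired)

# With lengths 3 and 2 there is no way to pair them up, hence the error.
try:
    a[[0, 1, 3], [0, 2]]
except (IndexError, ValueError) as exc:
    print(type(exc).__name__, exc)
```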

As Toan suggests, a simple hack would be to select the rows first, and then select the columns from that result.

>>> a[[0,1,3], :]            # Returns the rows you want
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]]  # Selects the columns you want as well
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
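One caveat worth sketching: each fancy-indexing step in this chain returns a new array, so the chained form is fine for reading but cannot be used to assign back into `a`. The snippet below only illustrates that behaviour on the same 5x4 example.

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# Chained fancy indexing works for *reading*...
sub = a[[0, 1, 3], :][:, [0, 2]]
print(sub)

# ...but each step returns a copy, so assigning through the chain only
# modifies a temporary array and leaves `a` untouched.
a[[0, 1, 3], :][:, [0, 2]] = -1
print((a == np.arange(20).reshape((5, 4))).all())
```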

[Edit] The built-in method: np.ix_

I recently discovered that NumPy gives you a built-in one-liner for doing exactly what @Jaime suggested, but without having to use broadcasting syntax (which is not very readable). From the docs:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
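To check the docs example concretely, the array needs at least 6 columns for index 5 to exist, so this sketch assumes a hypothetical 5x6 array built with `np.arange(30)`:

```python
import numpy as np

# Hypothetical 5x6 array so that column index 5 from the docs example exists.
a = np.arange(30).reshape((5, 6))

picked = a[np.ix_([1, 3], [2, 5])]
expected = np.array([[a[1, 2], a[1, 5]],
                     [a[3, 2], a[3, 5]]])
print(picked)
print(np.array_equal(picked, expected))
```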

So you use it like this:

>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

It works by aligning the arrays the way Jaime suggested (the rows become a column vector and the columns a row vector), so that broadcasting happens properly:

>>> np.ix_([0,1,3], [0,2])
(array([[0],
        [1],
        [3]]), array([[0, 2]]))
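A minimal sketch of the same idea by hand: `ix_2d` below is a hypothetical 2-D-only helper, not part of NumPy, that reshapes the row indices to `(n, 1)` and the column indices to `(1, m)`, which is the shape pattern shown above. The real `np.ix_` also accepts boolean masks (one per axis), shown at the end.

```python
import numpy as np


def ix_2d(rows, cols):
    """Hypothetical helper: a 2-D-only imitation of np.ix_ for integer indices.

    Rows become a column vector (shape (n, 1)) and columns a row vector
    (shape (1, m)), so the pair broadcasts to an (n, m) grid of indices.
    """
    r = np.asarray(rows).reshape(-1, 1)
    c = np.asarray(cols).reshape(1, -1)
    return r, c


a = np.arange(20).reshape((5, 4))
mine = a[ix_2d([0, 1, 3], [0, 2])]
builtin = a[np.ix_([0, 1, 3], [0, 2])]
print(np.array_equal(mine, builtin))

# np.ix_ also accepts boolean masks, one per axis.
row_mask = np.array([True, True, False, True, False])
col_mask = np.array([True, False, True, False])
print(a[np.ix_(row_mask, col_mask)])
```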

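Also, as MikeC says in a comment, np.ix_ has an advantage over my first (pre-edit) answer: it selects the rows and columns in a single indexing step, so you can assign to the selection. Strictly speaking it does not return a view (advanced indexing always produces a copy); assignment works because a[np.ix_(...)] = value writes directly into a, whereas the chained a[rows][:, cols] = value only modifies a temporary copy. For example: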

>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a    
array([[-1,  1, -1,  3],
       [-1,  5, -1,  7],
       [ 8,  9, 10, 11],
       [-1, 13, -1, 15],
       [16, 17, 18, 19]])
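A short sketch of the copy-versus-assignment distinction: reading through `np.ix_` gives an independent array (`np.shares_memory` reports no overlap), while assigning in one indexing step, including augmented assignment like `+=`, updates the original.

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
idx = np.ix_([0, 1, 3], [0, 2])

# Reading through np.ix_ is still advanced indexing, so the result is a copy.
sub = a[idx]
print(np.shares_memory(a, sub))
sub[:] = 99        # modifies only the copy
print(a[0, 0])     # still the original value

# Assigning in a single indexing step writes straight into `a`,
# including augmented assignment such as +=.
a[idx] += 100
print(a[0, 0], a[3, 2])
```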

Fancy indexing pairs up the index arrays you give for each dimension element by element, after broadcasting them to a common shape. You are providing 3 indices for the first dimension and only 2 for the second, and shapes (3,) and (2,) cannot be broadcast together, hence the error. You want to do something like this:

>>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
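Rather than typing the two full index arrays by hand, one option (not from the original answer) is `np.meshgrid` with `indexing="ij"`, which builds exactly those `(3, 2)` arrays:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
rows, cols = [0, 1, 3], [0, 2]

# np.meshgrid with indexing="ij" builds the two full (3, 2) index arrays:
# R repeats each row index across a row, C repeats the column list down the rows.
R, C = np.meshgrid(rows, cols, indexing="ij")
print(R)
print(C)
print(a[R, C])
```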

That is of course a pain to write, so you can let broadcasting help you:

>>> a[[[0], [1], [3]], [0, 2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
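To see why the nested-list form works and the original does not, `np.broadcast_shapes` (available in NumPy 1.20 and later) applies the same broadcasting rules to the index shapes; this is only an illustration of those rules:

```python
import numpy as np

# (3, 1) row indices against (2,) column indices broadcast to a 3x2 grid.
print(np.broadcast_shapes((3, 1), (2,)))

# The shapes from the question, (3,) and (2,), cannot be broadcast together.
try:
    np.broadcast_shapes((3,), (2,))
except ValueError as exc:
    print("cannot broadcast:", exc)
```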

This is much simpler if you index with arrays rather than lists, because you can add the extra axis with [:, None]:

>>> row_idx = np.array([0, 1, 3])
>>> col_idx = np.array([0, 2])
>>> a[row_idx[:, None], col_idx]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
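If you want to accept plain lists and still use this form, a sketch with a hypothetical helper `select_rows_cols` (not a NumPy function) could convert the inputs first. Since it is a single indexing expression, the same pattern also works on the left-hand side of an assignment; `np.newaxis` is simply an alias for `None`.

```python
import numpy as np


def select_rows_cols(arr, rows, cols):
    """Hypothetical helper: same result as arr[rows][:, cols], in one step.

    Converting to arrays lets us add an axis with None, turning the row
    indices into shape (n, 1) so they broadcast against the (m,) columns.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    return arr[rows[:, None], cols]


a = np.arange(20).reshape((5, 4))
print(select_rows_cols(a, [0, 1, 3], [0, 2]))

# A single indexing expression can also be assigned to.
row_idx, col_idx = np.array([0, 1, 3]), np.array([0, 2])
a[row_idx[:, np.newaxis], col_idx] = 0
print(a)
```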

Use:

>>> a[[0,1,3]][:,[0,2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Or, since the wanted columns here (0 and 2) happen to be evenly spaced, with a slice:

>>> a[[0,1,3],::2]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
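A quick sketch of when the slice version applies: mixing one index list with a slice selects every combination, so no broadcasting trick is needed, but `::2` only matches columns 0 and 2 because of the 4-column layout. For unevenly spaced columns (for example 0, 1 and 3, chosen here for illustration) no single slice fits, so `np.ix_` is the fallback.

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# An index list combined with a slice selects all row/column combinations.
via_slice = a[[0, 1, 3], ::2]
via_ix = a[np.ix_([0, 1, 3], [0, 2])]
print(np.array_equal(via_slice, via_ix))

# Columns 0, 1 and 3 are not evenly spaced, so use np.ix_ instead.
print(a[np.ix_([0, 1, 3], [0, 1, 3])])
```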

Using np.ix_ is the most convenient way to do it (as others have answered), but it can also be done as follows:

>>> rows = [0, 1, 3]
>>> cols = [0, 2]
>>> (a[rows].T)[cols].T
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
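Another route, not taken from the answers above, is `np.take`, which selects along one axis at a time; applying it to axis 0 and then axis 1 gives the same result as the transpose trick:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
rows = [0, 1, 3]
cols = [0, 2]

# Select the rows along axis 0, then the columns along axis 1.
taken = np.take(np.take(a, rows, axis=0), cols, axis=1)

# The transpose trick from the answer, for comparison.
transposed = (a[rows].T)[cols].T
print(np.array_equal(taken, transposed))
print(taken)
```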

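Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.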

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM