
How do I access the ith column of a NumPy multidimensional array?

Given:

test = numpy.array([[1, 2], [3, 4], [5, 6]])

test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?

To access column 0:

>>> test[:, 0]
array([1, 3, 5])

To access row 0:

>>> test[0, :]
array([1, 2])

This is covered in Section 1.4 (Indexing) of the NumPy reference. It is quick, at least in my experience, and certainly much quicker than accessing each element in a loop.
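One reason column slicing is cheap is that basic slicing returns a view and does not copy the data. A minimal sketch, reusing the question's array (the names `col`, `shares` and `first_row` are only illustrative):

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

i = 0
col = test[:, i]                       # basic slicing: no element data is copied

# The column and the original array use the same underlying buffer
shares = np.shares_memory(col, test)

# Writing through the view therefore changes the original array
col[0] = 100
first_row = test[0].tolist()
```

Because the result is a view, call `.copy()` on it if you need to modify the column without touching `test`.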

>>> test[:,0]
array([1, 3, 5])

This command gives you a row vector (a 1-D array of shape (3,)). If you just want to loop over it, that is fine, but if you want to hstack it with another array of shape 3xN, you will get:

ValueError: all the input arrays must have same number of dimensions

while

>>> test[:,[0]]
array([[1],
       [3],
       [5]])

gives you a column vector (shape (3, 1)), so that you can use it in concatenate or hstack operations.

For example:

>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
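Indexing with a list is not the only way to keep the result two-dimensional. The sketch below shows a few equivalent spellings that I believe behave the same for this array; the variable names are only illustrative:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

a = test[:, [0]]                # list index: shape (3, 1), a copy
b = test[:, 0:1]                # length-1 slice: shape (3, 1), a view
c = test[:, 0][:, np.newaxis]   # add a trailing axis to the 1-D column
d = test[:, 0].reshape(-1, 1)   # reshape the 1-D column to (3, 1)

stacked = np.hstack((test, b))

# column_stack accepts the 1-D column directly and treats it as a column
stacked2 = np.column_stack((test, test[:, 0]))
```

The slice form `test[:, 0:1]` avoids the copy that list indexing makes, which may matter for large arrays.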

And if you want to access more than one column at a time, you could do:

>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
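Note that selecting columns with a list of indices returns a copy, whereas a slice returns a view. A small sketch of the difference, using the same 3x3 array (the names `picked` and `sliced` are only illustrative):

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

picked = test[:, [0, 2]]   # integer-list (advanced) indexing: a copy
sliced = test[:, ::2]      # a step-2 slice selects the same columns: a view

picked[0, 0] = -1          # does not touch test
sliced[0, 0] = 99          # writes through to test
```

When the wanted columns are evenly spaced, the slice form avoids allocating a new array.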

You could also transpose and return a row:

In [4]: test.T[0]
Out[4]: array([1, 3, 5])

Although the question has been answered, let me mention some nuances.

Let's say you are interested in column 1 of the array (the second column, since indexing starts at 0):

arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]])

As you already know from the other answers, to get it in the form of a "row vector" (an array of shape (3,)), you use slicing:

arr_col1_view = arr[:, 1]         # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy()  # creates a copy of column 1 of arr

To check whether an array is a view or a copy of another array, you can do the following:

arr_col1_view.base is arr  # True
arr_col1_copy.base is arr  # False

See ndarray.base.

Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the stride, i.e. the number of bytes to step over to reach the next element, is different for each of them:

arr_col1_view.strides[0]  # 8 bytes (with 4-byte integers, e.g. int32; 16 with int64)
arr_col1_copy.strides[0]  # 4 bytes (8 with int64)

See strides and this answer.
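The exact stride values depend on the element size, which in turn depends on the platform's default integer type. A minimal sketch that fixes the dtype explicitly so the numbers are reproducible (the dtype choice here is my assumption, not part of the original example):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)  # 4 bytes per element

view = arr[:, 1]
copy = arr[:, 1].copy()

arr_strides = arr.strides     # (8, 4): 8 bytes to the next row, 4 to the next column
view_strides = view.strides   # (8,): consecutive column elements are one row apart
copy_strides = copy.strides   # (4,): the copy is stored contiguously

# With 8-byte integers every stride doubles
view64_strides = arr.astype(np.int64)[:, 1].strides
```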

Why is this important? Imagine that you have a very big array A instead of arr:

A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1] 
A_col1_copy = A[:, 1].copy()

and you want to compute the sum of all the elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:

%timeit A_col1_view.sum()  # ~248 µs
%timeit A_col1_copy.sum()  # ~12.8 µs

This is due to the different strides mentioned before:

A_col1_view.strides[0]  # 40000 bytes
A_col1_copy.strides[0]  # 4 bytes

Although it might seem that using column copies is better, this is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are OK with sacrificing memory for speed, then making a copy is the way to go.

If we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:

A = np.asfortranarray(A)   # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0]     # 4 bytes

%timeit A_col1_view.sum()  # ~12.6 µs vs ~248 µs

Now, performing the sum operation (or any other) on a column view is as fast as performing it on a column copy.
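The effect of the memory order can be checked through the strides and the contiguity flags. A small sketch, using a smaller array than above so that it runs quickly (the shape and the names `C` and `F` are my assumptions):

```python
import numpy as np

C = np.zeros((1000, 500), dtype=np.int32)   # row-major ('C') order by default
F = np.asfortranarray(C)                    # column-major ('F') copy

c_col_stride = C[:, 1].strides[0]   # 2000 bytes: 500 elements * 4 bytes per row
f_col_stride = F[:, 1].strides[0]   # 4 bytes: column elements are adjacent

c_col_contiguous = C[:, 1].flags['C_CONTIGUOUS']   # False
f_col_contiguous = F[:, 1].flags['C_CONTIGUOUS']   # True

# The trade-off: rows of the column-major array are now the strided ones
f_row_stride = F[1, :].strides[0]   # 4000 bytes: 1000 elements * 4 bytes per column
```

Column-major order only moves the cost: row access becomes the strided case, so it pays off only when column access dominates.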

Finally, let me note that transposing an array and using row slicing is the same as using column slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.

A[:, 1].strides[0]    # 40000 bytes (for the row-major A)
A.T[1, :].strides[0]  # 40000 bytes
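The swap of shape and strides can be seen directly on a small array. A minimal sketch (the 3x4 int32 array is my own example, not from the answer above):

```python
import numpy as np

A = np.arange(12, dtype=np.int32).reshape(3, 4)

a_shape, a_strides = A.shape, A.strides        # (3, 4) and (16, 4)
t_shape, t_strides = A.T.shape, A.T.strides    # (4, 3) and (4, 16)

# The transpose is a view of the same memory, not a copy
shares = np.shares_memory(A, A.T)

# A row of the transpose is the same data as a column of the original
same_values = np.array_equal(A.T[1, :], A[:, 1])
same_strides = A.T[1, :].strides == A[:, 1].strides
```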

To get several independent columns, just use:

>>> test[:, [0, 2]]

You will get columns 0 and 2.

>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5

Then you can select the 2nd to 4th columns this way:

>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
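The same selection can be written with a negative stop index, which avoids computing the number of columns first. A short sketch on the same 2x5 array (the names `middle` and `explicit` are only illustrative):

```python
import numpy as np

test = np.arange(10).reshape(2, 5)

# -1 as the stop index means "up to, but not including, the last column"
middle = test[:, 1:-1]

# Equivalent to the explicit form that uses the column count
ncol = test.shape[1]
explicit = test[0:, 1:(ncol - 1)]
```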

This array is two-dimensional, so you can select the columns you want by slicing the second axis:

test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b]  # replace a and b with the start and stop column indices (stop is exclusive)
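For a concrete case, and for arrays with more than two dimensions, something like the following should work. This is a sketch with example values for a and b and a hypothetical 3-D array `x`:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

a, b = 0, 1                   # example bounds: only column 0
first = test[:, a:b]          # 2-D result of shape (3, 1)

# With three or more dimensions, decide which axis "column" refers to
x = np.arange(24).reshape(2, 3, 4)
second_axis = x[:, 0]         # index 0 along axis 1, shape (2, 4)
last_axis = x[..., 0]         # index 0 along the last axis, shape (2, 3)
```

The ellipsis stands for "all remaining leading axes", so `x[..., 0]` is the same as `x[:, :, 0]` here.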
