How do I access the ith column of a NumPy multidimensional array?
Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
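As a rough illustration of the comparison with a loop, the sketch below builds the same column once with a slice and once element by element, and checks that the two agree. The helper name `column_by_loop` is made up for this example.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

def column_by_loop(arr, i):
    # Hypothetical helper: collects column i one element at a time.
    return np.array([row[i] for row in arr])

col_slice = test[:, 1]               # column 1 via slicing
col_loop = column_by_loop(test, 1)   # same values, built in a Python loop

print(col_slice)                            # [2 4 6]
print(np.array_equal(col_slice, col_loop))  # True
```

The slice does no per-element Python work, which is why it tends to be the faster option on large arrays.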
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array of shape (3,) (loosely, a row vector). If you just want to loop over it, that's fine, but if you want to hstack it with another array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
       [3],
       [5]])
gives you a column vector (shape (3, 1)), so that you can do a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
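For reference, here is a small sketch of a few indexing forms that keep the column two-dimensional, next to the plain integer index that drops a dimension. The variable names are illustrative only.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]                    # integer index: shape (3,)
col_list = test[:, [0]]                # list index: shape (3, 1)
col_slice = test[:, 0:1]               # length-1 slice: shape (3, 1)
col_newaxis = test[:, 0, np.newaxis]   # integer index plus new axis: shape (3, 1)

print(col_1d.shape, col_list.shape, col_slice.shape, col_newaxis.shape)
# (3,) (3, 1) (3, 1) (3, 1)

# Any of the (3, 1) forms can be stacked next to the 2-D array.
stacked = np.hstack((test, col_slice))
print(stacked.shape)  # (3, 3)
```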
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
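One point worth keeping in mind with this form: indexing with a list of integers returns a new array, whereas a basic slice returns a view. A minimal sketch, using `np.shares_memory` to tell the two apart:

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

picked = test[:, [0, 2]]   # integer-list index: new array (copy)
sliced = test[:, 0:2]      # basic slice: view onto test

print(np.shares_memory(test, picked))  # False
print(np.shares_memory(test, sliced))  # True

# Writing to the copy leaves the original untouched.
picked[0, 0] = 99
print(test[0, 0])  # 0
```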
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in column 1 (the second column, since indexing is zero-based) of the array
arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]])
As you already know from other answers, to get it in the form of a "row vector" (an array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
See ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the number of bytes stepped over when traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (assuming a 4-byte integer dtype such as int32)
arr_col1_copy.strides[0] # 4 bytes
See strides and this answer.
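The view/copy checks above can be put together in one small script. This is a sketch that fixes the dtype to int32 explicitly (an assumption made here so that the stride values are the same on every platform; the default integer dtype varies).

```python
import numpy as np

# dtype fixed to int32 for this example so each element is 4 bytes.
arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

view = arr[:, 1]
copy = arr[:, 1].copy()

print(view.base is arr)            # True
print(copy.base is arr)            # False
print(view.strides, copy.strides)  # (8,) (4,)

# Writing through the view changes arr; the copy is unaffected.
view[0] = 20
print(arr[0, 1])  # 20
print(copy[0])    # 2
```

The view steps over a whole row (2 elements of 4 bytes) to reach the next entry of the column, while the copy stores its elements back to back.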
Why is this important? Imagine that you have a very big array A instead of arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different strides mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, that is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are OK with sacrificing memory for speed, then making a copy is the way to go.
If we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column view is as fast as performing it on a column copy.
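To see the layout change without timing anything, one can compare strides and contiguity flags directly. The sketch below uses a smaller array than the one above (the shape is an arbitrary choice for this example) and also shows the trade-off: in 'F' order the rows become the strided direction.

```python
import numpy as np

A_c = np.zeros((1000, 500), dtype=np.int32)   # row-major (default)
A_f = np.asfortranarray(A_c)                  # column-major version of the same data

print(A_c[:, 1].strides)  # (2000,)  one full row of 500 * 4 bytes per step
print(A_f[:, 1].strides)  # (4,)     neighbouring elements in memory

print(A_c[:, 1].flags['C_CONTIGUOUS'])  # False
print(A_f[:, 1].flags['C_CONTIGUOUS'])  # True

# The cost moves to the rows: a row of the F-ordered array is now strided.
print(A_f[1, :].strides)  # (4000,)
```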
Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array. For the original C-ordered A:
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
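A small sketch of the same point on a tiny array, with the dtype fixed to int32 (an assumption made here so the byte counts are predictable):

```python
import numpy as np

A_small = np.arange(6, dtype=np.int32).reshape(3, 2)

print(A_small.shape, A_small.strides)      # (3, 2) (8, 4)
print(A_small.T.shape, A_small.T.strides)  # (2, 3) (4, 8)

# The transpose is a view on the same buffer, not a rearranged copy.
print(np.shares_memory(A_small, A_small.T))  # True

# Column 1 of the array and row 1 of its transpose are the same data.
print(np.array_equal(A_small[:, 1], A_small.T[1, :]))  # True
print(A_small[:, 1].strides == A_small.T[1, :].strides)  # True
```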
To get several independent columns, just:
>>> test[:,[0,2]]
You will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd to 4th columns this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
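The same selection can also be written with a negative stop index, which avoids looking up the number of columns first. A short sketch comparing the two spellings:

```python
import numpy as np

test = np.arange(10).reshape(2, 5)

ncol = test.shape[1]
middle_explicit = test[:, 1:(ncol - 1)]   # stop computed from the shape
middle_negative = test[:, 1:-1]           # -1 means "up to, not including, the last column"

print(middle_negative)
print(np.array_equal(middle_explicit, middle_negative))  # True
```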
This is simply a two-dimensional array, so you can slice out whichever columns you want:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b