
Extracting specific columns in numpy array

This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another numpy array, but I get invalid syntax errors. Here is the code:

extractedData = data[[:,1],[:,9]]. 

It seems like the above line should suffice, but I guess not. I looked around but couldn't find anything syntax-wise regarding this specific scenario.

I assume you wanted columns 1 and 9?

To select multiple columns at once, use

X = data[:, [1, 9]]

To select one at a time, use

x, y = data[:, 1], data[:, 9]
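
As a quick check of the two forms above, here is a minimal sketch on a made-up 3x10 array (the array contents are an assumption for illustration, not from the question):

```python
import numpy as np

# Hypothetical 3x10 array standing in for `data`
data = np.arange(30).reshape(3, 10)

# A list of column indices keeps the result two-dimensional
X = data[:, [1, 9]]
print(X.shape)            # (3, 2)

# Integer indices give one 1-D array per column
x, y = data[:, 1], data[:, 9]
print(x.shape, y.shape)   # (3,) (3,)
```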

With names (for a structured array, which is one-dimensional, index with the list of field names only):

data[['Column Name1','Column Name2']]

You can get the names from data.dtype.names…
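
For a structured array, selecting fields by name might look like the sketch below. The field names and values are made up for illustration. Because a structured array is one-dimensional, the sketch indexes with the list of names only, without a row slice:

```python
import numpy as np

# Hypothetical structured array with three named fields
data = np.array(
    [(1, 2.0, 3), (4, 5.0, 6)],
    dtype=[('Column Name1', 'i4'),
           ('Column Name2', 'f8'),
           ('Column Name3', 'i4')],
)

# All available field names
print(data.dtype.names)

# Index with a list of field names to keep only those fields
subset = data[['Column Name1', 'Column Name2']]
print(subset.dtype.names)
```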

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

extractedData = data[:,[1,9]]

If you want to extract only some columns:

idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]

If you want to exclude specific columns:

idx_OUT_columns = [1, 9]
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
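
As an alternative to building the index list by hand, the same exclusion can be sketched with `np.delete` or a boolean mask. This swaps in different techniques from the list comprehension above, and the small array is made up for illustration:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # hypothetical 3x4 array
idx_OUT_columns = [1, 3]

# Option 1: np.delete returns a new array without the given columns
kept_a = np.delete(data, idx_OUT_columns, axis=1)

# Option 2: boolean mask over the column axis
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_OUT_columns] = False
kept_b = data[:, mask]

print(kept_a)
# [[ 0  2]
#  [ 4  6]
#  [ 8 10]]
```

Both options return copies, so modifying the result does not change `data`.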

Just:

>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])

The columns need not be in order:

>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])

One thing I would like to point out: if you extract a single column with an integer index (for example data[:, 1] on a plain ndarray), the result is not an Mx1 matrix as you might expect, but a 1-D array containing the elements of the column you extracted.

To convert it to an Mx1 matrix, call reshape(M,1) on the resulting array.
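
A small sketch of the shapes involved, using a made-up ndarray with M = 4 rows. The one-element list form at the end is an addition for comparison, not part of the answer above:

```python
import numpy as np

data = np.arange(12).reshape(4, 3)   # hypothetical array with M = 4 rows

col = data[:, 1]               # integer index drops the column axis
print(col.shape)               # (4,)

col_2d = col.reshape(-1, 1)    # like reshape(M, 1); -1 lets NumPy infer M
print(col_2d.shape)            # (4, 1)

col_list = data[:, [1]]        # a one-element list keeps the column axis
print(col_list.shape)          # (4, 1)
```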

One more thing you should pay attention to when selecting columns from an N-D array using a list like this:

data[:,:,[1,9]]

If you are removing a dimension (by selecting only one row, for example), the resulting array will be (for some reason) permuted. So:

print(data.shape)            # gives (10, 20, 30)
selection = data[1,:,[1,9]]
print(selection.shape)       # gives (2, 20) instead of (20, 2)!!
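
The reordering follows from NumPy's advanced indexing rule: when advanced indices (here the integer 1 and the list [1, 9]) are separated by a slice, the dimension produced by the advanced indices is placed first in the result. A minimal sketch, with two possible ways to get the (20, 2) layout back (the zero-filled array is just a stand-in):

```python
import numpy as np

data = np.zeros((10, 20, 30))

# Integer index and list index separated by a slice:
# the dimension from the advanced indices comes first
selection = data[1, :, [1, 9]]
print(selection.shape)        # (2, 20)

# Indexing in two steps keeps the expected order
two_step = data[1][:, [1, 9]]
print(two_step.shape)         # (20, 2)

# Transposing the first result gives the same layout
print(selection.T.shape)      # (20, 2)
```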

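You can use the following (for a pandas DataFrame; note that .ix was removed in pandas 1.0, where .loc is the label-based replacement):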

extracted_data = data.ix[:,['Column1','Column2']]

I think the solution here no longer works with newer versions. One way to do it with a newer function is:

extracted_data = data[['Column Name1','Column Name2']].to_numpy()

which gives you the desired outcome.

The documentation can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy
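
A minimal sketch of that approach, assuming `data` is a pandas DataFrame (the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical DataFrame standing in for `data`
data = pd.DataFrame({
    'Column Name1': [1, 2],
    'Column Name2': [3, 4],
    'Column Name3': [5, 6],
})

# Select two columns by name, then convert to a NumPy array
extracted_data = data[['Column Name1', 'Column Name2']].to_numpy()
print(type(extracted_data).__name__)   # ndarray
print(extracted_data.shape)            # (2, 2)
```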

I could not edit the chosen answer, so I'm adding an answer to clarify that indexing with an integer seems to return a view (not a copy), while indexing with a list returns a copy:

>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]

>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]

>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
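
The same distinction can be checked without modifying anything, for example with `np.shares_memory`. This is a sketch reusing the names from the session above:

```python
import numpy as np

x = np.zeros(shape=[2, 3])
y = x[:, [0, 1]]   # list index: advanced indexing, returns a copy
z1 = x[:, 0]       # integer index with a slice: basic indexing, returns a view

print(np.shares_memory(x, y))    # False
print(np.shares_memory(x, z1))   # True
print(z1.base is x)              # True
```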

You can also stack the two column slices: extractData = np.column_stack((data[:, 1], data[:, 9]))

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address.
