Adding a New Column to an Empty NumPy Array

I'm trying to add a new column to an empty NumPy array and am running into trouble. I've looked at a lot of other questions, but they don't seem to solve the problem I'm facing, so I decided to ask my own.

I have an empty NumPy array:

array1 = np.array([])

Let's say I have data of shape (100, 100) and want to append each column to array1 one by one. However, if I do, for example:

array1 = np.append(array1, some_data[:, 0])
array1 = np.append(array1, some_data[:, 1])

I don't get a (100, 2) matrix but a (200,) array. So I tried to specify the axis:

array1 = np.append(array1, some_data[:, 0], axis=1)

which produces an AxisError: axis 1 is out of bounds for array of dimension 1.


Next I tried to use np.c_[]:

array1 = np.c_[array1, some_data[:, 0]]

which gives me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.


Is there any way to add columns to a NumPy array sequentially?

Thank you.


EDIT

I learned that my initial question didn't contain enough information for others to help, so I'm adding this update.

My overall objective is to write a program that selects features in a "greedy fashion." Basically, I'm trying to take the design matrix some_data, a (100, 100) matrix of floating point numbers, and fit a linear regression model with an increasing number of features until I find the best set of features.

For example, since I have a total of 100 features, the first round would fit the model on each of the 100 features separately, select the best one and store it, then continue with the remaining 99.

That's what I have in mind, but I got stuck at the very beginning with the problem I mentioned.

You start with a (0,) array and an (n,) shaped one:

In [482]: arr1 = np.array([])
In [483]: arr1.shape
Out[483]: (0,)
In [484]: arr2 = np.array([1,2,3])
In [485]: arr2.shape
Out[485]: (3,)

np.append uses concatenate (but with some funny business when axis is not provided):

In [486]: np.append(arr1, arr2)
Out[486]: array([1., 2., 3.])
In [487]: np.append(arr1, arr2,axis=0)    
Out[487]: array([1., 2., 3.])
In [489]: np.concatenate([arr1, arr2])
Out[489]: array([1., 2., 3.])
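To make that "funny business" concrete: when axis is omitted, np.append flattens both inputs before concatenating, so any 2d shape is lost. A small sketch (the arrays a and b are made up for illustration):

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

flat = np.append(a, b)             # no axis: both inputs are flattened first
stacked = np.append(a, b, axis=0)  # axis given: plain concatenate along rows

print(flat.shape)     # (12,)
print(stacked.shape)  # (4, 3)
```

This is why appending two (100,) columns without an axis yields a (200,) array.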

And trying axis=1:

In [488]: np.append(arr1, arr2,axis=1)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
<ipython-input-488-457b8657453e> in <module>()
----> 1 np.append(arr1, arr2,axis=1)

/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
   4526         values = ravel(values)
   4527         axis = arr.ndim-1
-> 4528     return concatenate((arr, values), axis=axis)

AxisError: axis 1 is out of bounds for array of dimension 1

Look at the whole message: the error occurs in the concatenate step. You can't concatenate 1d arrays along axis=1.

Using np.append or even np.concatenate iteratively is slow (it creates a new array each time) and hard to initialize correctly. It is a poor substitute for the widely used list append-to-empty-list recipe.
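A minimal sketch of the list recipe, assuming some_data is the (100, 100) array from the question (random values stand in for it here): collect the 1d columns in a plain Python list and build the 2d array once at the end.

```python
import numpy as np

some_data = np.random.random((100, 100))  # stand-in for the question's data

columns = []                       # plain Python list, cheap to append to
for j in (0, 1):                   # whichever columns get picked
    columns.append(some_data[:, j])

array1 = np.column_stack(columns)  # one concatenation at the end
print(array1.shape)                # (100, 2)
```

No initial "empty" array is needed, and the data is copied only once.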

np.c_ is also just a cover function for concatenate.

There isn't just one empty array. np.array([[]]) and np.array([[[]]]) also have 0 elements.
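A quick check of the shapes involved may help here. All of the arrays below have zero elements, but they differ in shape and number of dimensions, which is what concatenate cares about (np.zeros((3, 0)) is included as one more example):

```python
import numpy as np

empties = (np.array([]), np.array([[]]), np.array([[[]]]), np.zeros((3, 0)))
for arr in empties:
    print(arr.shape, arr.ndim, arr.size)
# (0,) 1 0
# (1, 0) 2 0
# (1, 1, 0) 3 0
# (3, 0) 2 0
```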

If you want to add a column to an array, you need to start with a 2d array, and the column also needs to be 2d.

Here's an example of a proper concatenation of two 2d arrays:

In [490]: np.concatenate([ np.zeros((3,0),int), np.arange(3)[:,None]], axis=1)
Out[490]: 
array([[0],
       [1],
       [2]])

column_stack is another cover function for concatenate that makes sure the inputs are 2d. But even with that, getting an initial 'empty' array is tricky.

In [492]: np.column_stack([np.zeros(3,int), np.arange(3)])
Out[492]: 
array([[0, 0],
       [0, 1],
       [0, 2]])
In [493]: np.column_stack([np.zeros((3,0),int), np.arange(3)])
Out[493]: 
array([[0],
       [1],
       [2]])
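Applied to the question, a sketch along these lines should work, assuming some_data is the (100, 100) float array described there (random values are used as a stand-in). The starting array has 100 rows and 0 columns, and each column is reshaped to (100, 1) before concatenating:

```python
import numpy as np

some_data = np.random.random((100, 100))    # stand-in for the question's data

array1 = np.zeros((some_data.shape[0], 0))  # 100 rows, 0 columns
for j in range(3):
    col = some_data[:, j][:, None]          # (100,) -> (100, 1)
    array1 = np.concatenate([array1, col], axis=1)

print(array1.shape)  # (100, 3)
```

This still copies the whole array on every iteration, so it is only reasonable for a small number of columns.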

np.c_ is a lot like column_stack, though implemented in a different way:

In [496]: np.c_[np.zeros(3,int), np.arange(3)]
Out[496]: 
array([[0, 0],
       [0, 1],
       [0, 2]])

The basic message is that when using np.concatenate you need to pay attention to dimensions. Its variants allow you to fudge things a bit, but you really need to understand that fudging to get things right, especially when starting from this poorly defined idea of an 'empty' array.
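For the greedy selection described in the question's edit, one option is to avoid growing an array at all: keep a list of selected column indices and index some_data with it. The sketch below is only an illustration under several assumptions that are not in the question: a target vector y, a plain least-squares fit without intercept via np.linalg.lstsq, residual sum of squares as the score, and a fixed max_features as the stopping rule.

```python
import numpy as np

rng = np.random.default_rng(0)
some_data = rng.random((100, 100))  # design matrix, as in the question
y = rng.random(100)                 # hypothetical target (not in the question)

def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

selected = []                       # indices of the chosen columns
remaining = list(range(some_data.shape[1]))
max_features = 5                    # hypothetical stopping rule

while remaining and len(selected) < max_features:
    # score each remaining column together with the ones already chosen
    best = min(remaining, key=lambda j: rss(some_data[:, selected + [j]], y))
    selected.append(best)
    remaining.remove(best)

X_selected = some_data[:, selected]  # build the 2d array once, by indexing
print(X_selected.shape)              # (100, 5)
```

Indexing with a list of column indices always returns a 2d array, so the question of how to initialize an 'empty' array never comes up.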

I usually use the concatenate method and do it like this:

import numpy as np

# Some stuff
alldata = None
# ...
array1 = np.random.random((100, 1))
if alldata is None:
    alldata = array1  # the first column initializes the 2d array
# ...
array2 = np.random.random((100, 1))

alldata = np.concatenate((alldata, array2), axis=1)

In case you are working with vectors:

import numpy as np

alldata = None
# ...
array1 = np.random.random((100,))
if alldata is None:
    alldata = array1[:, np.newaxis]  # (100,) -> (100, 1)
# ...
array2 = np.random.random((100,))

alldata = np.concatenate((alldata, array2[:, np.newaxis]), axis=1)

