
Adding a New Column to an Empty NumPy Array

I'm trying to add a new column to an empty NumPy array and am running into trouble. I've looked at many other questions, but they haven't helped me solve this particular problem, so I decided to ask my own.

I have an empty NumPy array such that:

array1 = np.array([])

Let's say I have data of shape (100, 100), and want to append each column to array1 one by one. However, if I do, for example:

array1 = np.append(array1, some_data[:, 0])
array1 = np.append(array1, some_data[:, 1])

I get a (200,) array instead of a (100, 2) matrix. So I tried to specify the axis:

array1 = np.append(array1, some_data[:, 0], axis=1)

which produces an AxisError: axis 1 is out of bounds for array of dimension 1.


Next I tried to use the np.c_[] method:

array1 = np.c_[array1, some_data[:, 0]]

which gives me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.


Is there any way that I would be able to add columns to the NumPy array sequentially?

Thank you.


EDIT

I learned that my initial question didn't contain enough information for others to offer help, and made this update to make up for the initial mistake.

My overall objective is to write a program that selects features in a "greedy fashion." Basically, I'm trying to take the design matrix some_data, which is a (100, 100) matrix of floating point numbers, and fit a linear regression model with an increasing number of features until I find the best set of features.

For example, since I have a total of 100 features, the first round would fit the model on each of the 100 features individually, select the best one and store it, then continue with the remaining 99.

That's the plan I have in mind, but I got stuck right at the beginning with the problem I mentioned.

You start with a (0,) array and an (n,)-shaped one:

In [482]: arr1 = np.array([])
In [483]: arr1.shape
Out[483]: (0,)
In [484]: arr2 = np.array([1,2,3])
In [485]: arr2.shape
Out[485]: (3,)

np.append is a thin wrapper around concatenate (when axis is not provided, it flattens both inputs first):

In [486]: np.append(arr1, arr2)
Out[486]: array([1., 2., 3.])
In [487]: np.append(arr1, arr2,axis=0)    
Out[487]: array([1., 2., 3.])
In [489]: np.concatenate([arr1, arr2])
Out[489]: array([1., 2., 3.])
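
The flattening behavior is easier to see with 2d inputs. A small sketch (the arrays a and b are made up for illustration):

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

flat = np.append(a, b)          # no axis: both inputs are raveled first
rows = np.append(a, b, axis=0)  # joined along rows
cols = np.append(a, b, axis=1)  # joined along columns

print(flat.shape)  # (12,)
print(rows.shape)  # (4, 3)
print(cols.shape)  # (2, 6)
```

This is why appending two (100,) columns without an axis gives a (200,) array.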

Trying axis=1 fails:

In [488]: np.append(arr1, arr2,axis=1)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
<ipython-input-488-457b8657453e> in <module>()
----> 1 np.append(arr1, arr2,axis=1)

/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
   4526         values = ravel(values)
   4527         axis = arr.ndim-1
-> 4528     return concatenate((arr, values), axis=axis)

AxisError: axis 1 is out of bounds for array of dimension 1

Look at the whole message: the error occurs in the concatenate step. You can't concatenate 1d arrays along axis=1.

Using np.append or even np.concatenate iteratively is slow (it creates a new array each time), and hard to initialize correctly. It is a poor substitute for the widely used list append-to-empty-list recipe.
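
For reference, the list recipe might look something like this. It is only a sketch: some_data here is a small stand-in for the (100, 100) array in the question, and the loop simply copies every column.

```python
import numpy as np

# Stand-in for the question's (100, 100) data
some_data = np.arange(12, dtype=float).reshape(4, 3)

columns = []  # plain Python list; appending to it is cheap
for j in range(some_data.shape[1]):
    columns.append(some_data[:, j])  # each item is a 1d array of shape (4,)

# Build the 2d array once, at the end
result = np.column_stack(columns)
print(result.shape)  # (4, 3)
```

The array is only allocated once, and there is no 'empty' starting array to get wrong.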

np.c_ is also just a cover function for concatenate.

There isn't just one empty array. np.array([[]]) and np.array([[[]]]) also have 0 elements.
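
A quick way to see the difference is to compare shapes (np.zeros((3, 0)) is added here as a fourth kind of 'empty' array):

```python
import numpy as np

empties = [np.array([]), np.array([[]]), np.array([[[]]]), np.zeros((3, 0))]
shapes = [e.shape for e in empties]
sizes = [e.size for e in empties]

print(shapes)  # [(0,), (1, 0), (1, 1, 0), (3, 0)]
print(sizes)   # [0, 0, 0, 0]
```

All four hold zero elements, but only the (3, 0) one has the right number of rows to take a 3-row column along axis=1.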

If you want to add a column to an array, you need to start with a 2d array, and the column also needs to be 2d.

Here's an example of a proper concatenation of two 2d arrays:

In [490]: np.concatenate([ np.zeros((3,0),int), np.arange(3)[:,None]], axis=1)
Out[490]: 
array([[0],
       [1],
       [2]])

column_stack is another cover function for concatenate that makes sure the inputs are 2d. But even with that, getting an initial 'empty' array is tricky: a 1d zeros(3) contributes a column of its own, while a (3, 0) array contributes nothing.

In [492]: np.column_stack([np.zeros(3,int), np.arange(3)])
Out[492]: 
array([[0, 0],
       [0, 1],
       [0, 2]])
In [493]: np.column_stack([np.zeros((3,0),int), np.arange(3)])
Out[493]: 
array([[0],
       [1],
       [2]])

np.c_ is a lot like column_stack, though implemented in a different way:

In [496]: np.c_[np.zeros(3,int), np.arange(3)]
Out[496]: 
array([[0, 0],
       [0, 1],
       [0, 2]])

The basic message is that when using np.concatenate you need to pay attention to dimensions. Its variants allow you to fudge things a bit, but you really need to understand that fudging to get things right, especially when starting from this poorly defined idea of an 'empty' array.
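
For the greedy selection described in the question, it may be simpler to avoid growing an array at all: keep a list of selected column indices and slice the design matrix with it. The following is a rough sketch, not a definitive implementation. The function name greedy_select, the target vector y, the fixed number of features to pick, and the use of np.linalg.lstsq without an intercept are all assumptions made for illustration.

```python
import numpy as np

def greedy_select(X, y, n_features):
    """Forward selection: repeatedly add the column that gives the
    smallest residual sum of squares (hypothetical helper)."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_features):
        best_j, best_err = None, np.inf
        for j in remaining:
            A = X[:, selected + [j]]  # indexing with a list keeps A 2d
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.sum((A @ coef - y) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data: the target depends only on columns 4 and 7
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 4] - 2.0 * X[:, 7]

chosen = greedy_select(X, y, 2)
print(sorted(chosen))
```

Because X[:, selected + [j]] is always 2d, the dimension problems discussed above never come up.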

I usually use np.concatenate and do it like this:

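import numpy as np

alldata = None  # nothing accumulated yet
# ... some stuff ...
array1 = np.random.random((100, 1))
if alldata is None:
    alldata = array1  # the first column initializes the result
# ...
array2 = np.random.random((100, 1))

# Both operands are 2d, so they can be joined along axis=1 -> shape (100, 2)
alldata = np.concatenate((alldata, array2), axis=1)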

If you are working with 1d vectors instead:

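alldata = None  # nothing accumulated yet
# ...
array1 = np.random.random((100,))
if alldata is None:
    alldata = array1[:, np.newaxis]  # (100,) -> (100, 1)
# ...
array2 = np.random.random((100,))

# Promote the vector to a column before joining along axis=1 -> shape (100, 2)
alldata = np.concatenate((alldata, array2[:, np.newaxis]), axis=1)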


 