`np.concatenate` a numpy array with a sparse matrix

My dataset contains numerical and categorical variables, and I split them into two parts:

cont_data = data[cont_variables].values
disc_data = data[disc_variables].values

Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data (giving disc_data_coded), and tried to merge the encoded categorical data with the numerical data:

np.concatenate((cont_data, disc_data_coded), axis=1)

But the following error occurs:

ValueError: all the input arrays must have same number of dimensions

I checked that both inputs are two-dimensional and have the same number of rows:

print(cont_data.shape)        # (24000, 35)
print(disc_data_coded.shape)  # (24000, 26)

Eventually I found that cont_data is a NumPy array, while disc_data_coded is a sparse matrix:

>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>

After I set the sparse parameter of OneHotEncoder to False, everything worked. But how can I merge a NumPy array with a sparse matrix directly, without setting sparse=False?

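Sparse matrices are not subclasses of NumPy arrays, so NumPy functions often don't work on them. Here np.concatenate treats the sparse matrix as a 0-dimensional object array, which is why it complains about the number of dimensions even though the shapes look compatible. Use the scipy.sparse functions instead, such as sparse.vstack and sparse.hstack. But all inputs then have to be sparse.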
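For the column-wise merge in the question, the sparse route might look like the sketch below. The small arrays are hypothetical stand-ins for cont_data and disc_data_coded, and converting the dense block with sparse.csr_matrix is one reasonable way to make every input sparse, not the only one.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's data (much smaller shapes).
cont_data = np.arange(12, dtype=float).reshape(4, 3)    # dense, shape (4, 3)
disc_data_coded = sparse.csr_matrix(np.eye(4)[:, :2])   # sparse, shape (4, 2)

# Make the dense block sparse, then stack the two blocks side by side.
merged = sparse.hstack(
    (sparse.csr_matrix(cont_data), disc_data_coded),
    format="csr",
)

print(type(merged).__name__, merged.shape)
```

The result stays sparse, which only saves memory if the numerical block is itself mostly zeros; a fully populated cont_data is stored with every element explicit.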

Or make the sparse matrix dense first, with .toarray(), and use np.concatenate.
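A minimal sketch of the dense route, again with small hypothetical stand-ins for the question's cont_data and disc_data_coded:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's data (much smaller shapes).
cont_data = np.arange(12, dtype=float).reshape(4, 3)    # dense, shape (4, 3)
disc_data_coded = sparse.csr_matrix(np.eye(4)[:, :2])   # sparse, shape (4, 2)

# Densify the sparse block, then concatenate along the column axis.
merged = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)

print(type(merged).__name__, merged.shape)
```

This materializes every zero of the encoded block, so it is only practical when the dense result fits comfortably in memory.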

Do you want the result to be sparse or dense?

In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)), sparse.csr_matrix(np.ones((3,10)))))
Out[32]: 
<4x10 sparse matrix of type '<class 'numpy.float64'>'
    with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).A,np.ones((3,10))))
Out[33]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
