
List of NumPy vectors to sparse array

I have a list of numpy vectors of the format:

    [array([[-0.36314615,  0.80562619, -0.82777381, ...,  2.00876354,2.08571887, -1.24526026]]), 
     array([[ 0.9766923 , -0.05725135, -0.38505339, ...,  0.12187988,-0.83129255,  0.32003683]]),
     array([[-0.59539878,  2.27166874,  0.39192573, ..., -0.73741573,1.49082653,  1.42466276]])]

Only 3 vectors from the list are shown here; I have hundreds.

The maximum number of elements in one vector is around 10 million.

The arrays in the list have unequal numbers of elements, but the maximum number of elements is fixed. Is it possible to create a sparse matrix from these vectors in Python, so that vectors shorter than the maximum size are padded with zeros?

Try this:

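    from scipy import sparse

    # vectors is the list of arrays from the question
    num_of_vectors = len(vectors)
    max_vector_size = max(v.size for v in vectors)  # or the known fixed maximum
    M = sparse.lil_matrix((num_of_vectors, max_vector_size))

    # Copy each vector into the start of its row; the rest of the row stays zero
    for i, v in enumerate(vectors):
        M[i, :v.size] = v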

Then take a look at this page: http://docs.scipy.org/doc/scipy/reference/sparse.html

The lil_matrix format is good for constructing the matrix, but you'll want to convert it to a different format, such as csr_matrix, before operating on it.
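
As a rough illustration of the build-then-convert pattern described above, here is a minimal sketch. The three short vectors are made-up stand-ins for the real data (each has shape (1, n), as in the question), and the row sum is just one example of an operation done after conversion.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for the real vectors; each has shape (1, n) as in the question.
vectors = [np.array([[1.0, 2.0, 3.0]]),
           np.array([[4.0]]),
           np.array([[5.0, 6.0]])]

max_vector_size = max(v.size for v in vectors)

# Build incrementally in LIL format, which supports cheap row-wise assignment.
M = sparse.lil_matrix((len(vectors), max_vector_size))
for i, v in enumerate(vectors):
    M[i, :v.size] = v.ravel()

# Convert to CSR before doing arithmetic or row slicing.
M_csr = M.tocsr()
row_sums = np.asarray(M_csr.sum(axis=1)).ravel()
```

The conversion is a single call to tocsr(), so the usual pattern is to finish all assignments in LIL first and convert once at the end.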

In this approach you replace the elements whose absolute value is below your threshold with 0 and then create a sparse matrix from each vector. I suggest coo_matrix because it converts quickly to the other formats, whichever suits your purpose. You can then use scipy.sparse.vstack() to stack them into one matrix covering every vector in the list. Note that vstack() requires every block to have the same number of columns, as is the case in this example:

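    import scipy.sparse as ss
    import numpy as np

    # Five example vectors of equal length
    old_list = [np.random.random(100000) for i in range(5)]

    # Zero out every element whose magnitude is below the threshold (in place)
    threshold = 0.01
    for a in old_list:
        a[np.absolute(a) < threshold] = 0

    # Each 1-D array becomes a 1 x n sparse row; stack the rows into one matrix
    sparse_rows = [ss.coo_matrix(a) for a in old_list]
    m = ss.vstack(sparse_rows)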
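
The vectors in the question have unequal lengths, so the rows would need a common width before stacking. One possible way to do that, sketched here with made-up data, is to build each row from its kept values and their column indices and to pass an explicit shape of (1, max_len). The names vectors, max_len and keep are illustrative and not part of the answer above.

```python
import numpy as np
import scipy.sparse as ss

# Made-up vectors of unequal length, standing in for the real data.
vectors = [np.array([0.5, 0.001, -0.3]),
           np.array([0.2]),
           np.array([-0.004, 0.9])]

threshold = 0.01
max_len = max(v.size for v in vectors)

rows = []
for v in vectors:
    v = v.ravel()
    # Column positions of the elements that survive the threshold.
    keep = np.flatnonzero(np.abs(v) >= threshold)
    # All kept elements belong to row 0 of this 1 x max_len block.
    row_idx = np.zeros(keep.size, dtype=int)
    rows.append(ss.coo_matrix((v[keep], (row_idx, keep)), shape=(1, max_len)))

# Every block now has max_len columns, so they can be stacked.
m = ss.vstack(rows).tocsr()
```

Because each block is given the same explicit width, the shorter vectors end up with implicit zeros on the right without any dense padding.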

A little convoluted, but I would probably do it like this:

    >>> import numpy as np
    >>> import scipy.sparse as sps
    >>> a = [np.arange(5), np.arange(7), np.arange(3)]
    >>> lens = [len(j) for j in a]
    >>> cols = np.concatenate([np.arange(j) for j in lens])
    >>> rows = np.concatenate([np.repeat(j, len_) for j, len_ in enumerate(lens)])
    >>> data = np.concatenate(a)
    >>> b = sps.coo_matrix((data,(rows, cols)))
    >>> b.toarray()
    array([[0, 1, 2, 3, 4, 0, 0],
           [0, 1, 2, 3, 4, 5, 6],
           [0, 1, 2, 0, 0, 0, 0]])
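
The same index-based construction could be wrapped in a small helper. The sketch below is one possible version, not part of the answer above: the function name vectors_to_sparse and its n_cols parameter are hypothetical. It flattens each input so that (1, n) arrays like those in the question also work, and it accepts an explicit column count for the case where the fixed maximum is known but no vector in the list reaches it.

```python
import numpy as np
import scipy.sparse as sps

def vectors_to_sparse(vectors, n_cols=None):
    """Stack vectors of unequal length into one sparse matrix, one vector per row.

    Hypothetical helper: n_cols defaults to the longest vector's length.
    """
    flat = [np.ravel(v) for v in vectors]      # accept (n,) and (1, n) inputs
    lens = np.array([f.size for f in flat])
    if n_cols is None:
        n_cols = int(lens.max())
    # Row index repeated once per element; column index counts up within each row.
    rows = np.repeat(np.arange(len(flat)), lens)
    cols = np.concatenate([np.arange(n) for n in lens])
    data = np.concatenate(flat)
    return sps.coo_matrix((data, (rows, cols)),
                          shape=(len(flat), n_cols)).tocsr()

a = [np.arange(5), np.arange(7), np.arange(3)]
b = vectors_to_sparse(a, n_cols=10)
```

Passing shape explicitly avoids relying on coo_matrix to infer the width from the largest column index, which would be too small if the longest possible vector is missing from the list.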

