
List of numpy vectors to sparse array

I have a list of numpy vectors in the following format:

    [array([[-0.36314615,  0.80562619, -0.82777381, ...,  2.00876354,2.08571887, -1.24526026]]), 
     array([[ 0.9766923 , -0.05725135, -0.38505339, ...,  0.12187988,-0.83129255,  0.32003683]]),
     array([[-0.59539878,  2.27166874,  0.39192573, ..., -0.73741573,1.49082653,  1.42466276]])]

Only 3 vectors of the list are shown here; I have hundreds.

The maximum number of elements in one vector is around 10 million.

The arrays in the list have unequal numbers of elements, but the maximum number of elements is fixed. Is it possible to create a sparse matrix from these vectors in Python, such that vectors shorter than the maximum size are padded with zeros in place of the missing elements?

Try this:

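    from scipy import sparse

    # `vectors` is the list of numpy arrays from the question
    num_of_vectors = len(vectors)
    max_vector_size = max(v.size for v in vectors)
    M = sparse.lil_matrix((num_of_vectors, max_vector_size))

    for i, v in enumerate(vectors):
        # fill the first v.size columns of row i; the rest stay zero
        M[i, :v.size] = v.ravel()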

Then take a look at this page: http://docs.scipy.org/doc/scipy/reference/sparse.html

The lil_matrix format is good for constructing the matrix, but you'll want to convert it to a different format such as csr_matrix before operating on it.
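As a rough illustration of that last point, the sketch below builds a small lil_matrix, converts it with tocsr(), and only then does arithmetic on it. The toy sizes and values are assumptions chosen for the example, not data from the question.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for the real data: 3 rows, at most 7 columns (assumed sizes).
M = sparse.lil_matrix((3, 7))
M[0, :5] = np.arange(1, 6)
M[1, :7] = np.arange(1, 8)
M[2, :3] = np.arange(1, 4)

# Convert once, after construction is finished.
M_csr = M.tocsr()

# Row sums and a matrix-vector product, both computed on the CSR matrix.
row_sums = np.asarray(M_csr.sum(axis=1)).ravel()
product = M_csr @ np.ones(7)
```

Converting once at the end keeps the incremental row assignments in LIL, which supports them cheaply, and leaves the arithmetic to CSR.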

In this approach you replace the elements below your threshold with 0 and then create a sparse matrix from each vector. I suggest coo_matrix, since it is the fastest to convert to the other types, depending on your purpose. You can then scipy.sparse.vstack() them to build a matrix that accounts for all elements in the list. Note that vstack() requires every row to have the same number of columns, as is the case in this example:

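    import numpy as np
    import scipy.sparse as ss

    # example data: 5 dense vectors of equal length
    old_list = [np.random.random(100000) for i in range(5)]

    threshold = 0.01
    for a in old_list:
        # zero out, in place, the entries whose magnitude is below the threshold
        a[np.absolute(a) < threshold] = 0
    # each 1-D array becomes a 1 x n sparse row
    sparse_rows = [ss.coo_matrix(a) for a in old_list]
    m = ss.vstack(sparse_rows)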
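For vectors of unequal length, as in the question, each row would need the same width before stacking. One possible way to do that, sketched here under assumptions, is to give every row an explicit shape. The helper name row_to_sparse and the sample vectors are hypothetical and only serve the illustration.

```python
import numpy as np
import scipy.sparse as ss

def row_to_sparse(a, n_cols, threshold=0.0):
    """Hypothetical helper: turn one dense vector into a 1 x n_cols COO row."""
    a = np.asarray(a).ravel()  # also accepts (1, n) arrays as in the question
    # keep only the non-zero entries whose magnitude reaches the threshold
    keep = np.flatnonzero((np.absolute(a) >= threshold) & (a != 0))
    rows = np.zeros(keep.size, dtype=int)
    return ss.coo_matrix((a[keep], (rows, keep)), shape=(1, n_cols))

# Assumed sample data with unequal lengths.
vectors = [np.array([0.5, 0.001, -0.3]),
           np.array([[0.2, 0.0, 0.0, 0.9, -0.005]])]
n_cols = max(v.size for v in vectors)

m = ss.vstack([row_to_sparse(v, n_cols, threshold=0.01) for v in vectors]).tocsr()
```

Because every row is created with shape (1, n_cols), vstack() accepts them even though the input vectors differ in length.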

A little convoluted, but I would probably do it like this:

    >>> import numpy as np
    >>> import scipy.sparse as sps
    >>> a = [np.arange(5), np.arange(7), np.arange(3)]
    >>> lens = [len(j) for j in a]
    >>> cols = np.concatenate([np.arange(j) for j in lens])
    >>> rows = np.concatenate([np.repeat(j, len_) for j, len_ in enumerate(lens)])
    >>> data = np.concatenate(a)
    >>> b = sps.coo_matrix((data, (rows, cols)))
    >>> b.toarray()
    array([[0, 1, 2, 3, 4, 0, 0],
           [0, 1, 2, 3, 4, 5, 6],
           [0, 1, 2, 0, 0, 0, 0]])
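In the session above, coo_matrix infers the matrix shape from the largest row and column index it sees. Since the question states that the maximum size is fixed, it may be safer to pass the shape explicitly. The following is a minimal sketch of the same idea wrapped in a function; the name ragged_to_csr and the n_cols parameter are assumptions made for this example.

```python
import numpy as np
import scipy.sparse as sps

def ragged_to_csr(vectors, n_cols=None):
    """Hypothetical helper: pack vectors of unequal length into one CSR matrix."""
    flat = [np.asarray(v).ravel() for v in vectors]  # accepts (1, n) arrays too
    lens = np.array([v.size for v in flat])
    if n_cols is None:
        n_cols = int(lens.max())
    # row index i repeated once per element of vector i
    rows = np.repeat(np.arange(len(flat)), lens)
    # column indices 0..len-1 for each vector
    cols = np.concatenate([np.arange(n) for n in lens])
    data = np.concatenate(flat)
    coo = sps.coo_matrix((data, (rows, cols)), shape=(len(flat), n_cols))
    return coo.tocsr()

# Same toy input as above, but with a fixed width of 10 columns.
b = ragged_to_csr([np.arange(5), np.arange(7), np.arange(3)], n_cols=10)
```

With n_cols given, every matrix built this way has the same width regardless of which vectors happen to be in the list, and the result is already in CSR form for later arithmetic.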
