
Initialize a numpy sparse matrix efficiently

I have an array with m rows whose entries are themselves arrays of column indices, each index bounded by a large number n. E.g.:

 Y = [[1,34,203,2032],...,[2984]]

Now I want an efficient way to initialize a sparse matrix X with dimensions (m, n) and values determined by Y: X[i,j] = 1 if j is in Y[i], and 0 otherwise.

Your data are already close to CSR format, so I suggest using that:

import numpy as np
from scipy import sparse
from itertools import chain

# create an example    
m, n = 20, 10
X = np.random.random((m, n)) < 0.1
Y = [list(np.where(y)[0]) for y in X]

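# construct the sparse matrix
# indptr: cumulative row lengths, so row i owns indices[indptr[i]:indptr[i+1]]
indptr = np.fromiter(chain((0,), map(len, Y)), int, len(Y) + 1).cumsum()
# indices: all column indices, concatenated row by row
indices = np.fromiter(chain.from_iterable(Y), int, indptr[-1])
data = np.ones_like(indices)
S = sparse.csr_matrix((data, indices, indptr), shape=(m, n))
# The shape can be omitted, but the number of columns is then inferred as
# max(indices) + 1, which is smaller than n if the last columns are empty:
# S = sparse.csr_matrix((data, indices, indptr))

# check
assert np.array_equal(S.toarray(), X)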
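
To make the roles of indptr and indices easier to see, here is a minimal sketch of the same construction on a toy input small enough to check by hand. The helper name rows_to_csr and the toy Y are made up for illustration, and the sketch assumes each row lists a column index at most once.

```python
import numpy as np
from scipy import sparse
from itertools import chain


def rows_to_csr(Y, n):
    """Build a (len(Y), n) CSR matrix with ones at the listed column indices.

    Hypothetical helper for illustration; assumes no repeated index within a row.
    """
    # Cumulative row lengths: row i owns indices[indptr[i]:indptr[i + 1]].
    indptr = np.fromiter(chain((0,), map(len, Y)), dtype=np.int64,
                         count=len(Y) + 1).cumsum()
    # All column indices, concatenated row by row.
    indices = np.fromiter(chain.from_iterable(Y), dtype=np.int64,
                          count=int(indptr[-1]))
    # One stored value (1) per listed index.
    data = np.ones(len(indices), dtype=np.int8)
    return sparse.csr_matrix((data, indices, indptr), shape=(len(Y), n))


# Toy input: row 0 has ones in columns 1 and 3, row 1 is empty,
# row 2 has ones in columns 0, 2 and 4.
Y = [[1, 3], [], [0, 2, 4]]
S = rows_to_csr(Y, n=5)
dense = S.toarray()
```

Here indptr comes out as [0, 2, 2, 5]: the repeated 2 is how CSR encodes the empty middle row. Passing the shape explicitly keeps the column count at n even when the trailing columns have no entries.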
