Sparse matrix: how to get nonzero indices for each row
I have a SciPy CSR matrix and I want to get the column indices of the nonzero elements in each row. My approach is:
import scipy.sparse as sp
N = 100
d = 0.1
M = sp.rand(N, N, d, format='csr')
indM = [row.nonzero()[1] for row in M]
indM is what I need: it has the same number of rows as M and looks like this:
[array([ 6, 7, 11, ..., 79, 85, 86]),
array([12, 20, 25, ..., 84, 93, 95]),
...
array([ 7, 24, 32, 40, 50, 51, 57, 71, 74, 96]),
array([ 1, 4, 9, ..., 71, 95, 96])]
The problem is that with big matrices this approach is slow. Is there any way to avoid the list comprehension or otherwise speed this up?
Thank you.
You can simply use the indices and indptr attributes directly:
import numpy
import scipy.sparse
N = 5
d = 0.3
M = scipy.sparse.rand(N, N, d, format='csr')
M.toarray()
# array([[ 0. , 0. , 0. , 0. , 0. ],
# [ 0. , 0. , 0. , 0. , 0.30404632],
# [ 0.63503713, 0. , 0. , 0. , 0. ],
# [ 0.68865311, 0.81492098, 0. , 0. , 0. ],
# [ 0.08984168, 0.87730292, 0. , 0. , 0.18609702]])
M.indices
# array([4, 0, 0, 1, 0, 1, 4], dtype=int32)
M.indptr
# array([0, 0, 1, 2, 4, 7], dtype=int32)
numpy.split(M.indices, M.indptr)[1:-1]
# [array([], dtype=int32),
# array([4], dtype=int32),
# array([0], dtype=int32),
# array([0, 1], dtype=int32),
# array([0, 1, 4], dtype=int32)]
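In CSR format, the column indices of row i are stored in M.indices[M.indptr[i]:M.indptr[i+1]], so numpy.split only needs the interior boundaries M.indptr[1:-1]; this avoids producing the empty leading and trailing pieces that the [1:-1] slice trims above. Below is a minimal sketch; the helper name row_column_indices is hypothetical. It assumes you want the same result as row.nonzero()[1], which skips explicitly stored zeros, hence the eliminate_zeros call. sum_duplicates merges duplicate entries and sorts the indices within each row.

```python
import numpy as np
import scipy.sparse as sp


def row_column_indices(M):
    """Return one array of column indices per row of M (hypothetical helper)."""
    # Work on a CSR copy so the clean-up below does not modify the caller's matrix.
    M = sp.csr_matrix(M, copy=True)
    # Merge duplicate entries and sort the column indices within each row.
    M.sum_duplicates()
    # Drop explicitly stored zeros, which row.nonzero() would also skip.
    M.eliminate_zeros()
    # Row i occupies M.indices[M.indptr[i]:M.indptr[i + 1]]; splitting at the
    # interior boundaries indptr[1:-1] yields exactly one piece per row.
    return np.split(M.indices, M.indptr[1:-1])


# Usage: a 3x3 matrix with an explicitly stored zero at (0, 2) and an empty row 1.
data = np.array([1.0, 0.0, 2.0])
indices = np.array([0, 2, 1])
indptr = np.array([0, 2, 2, 3])
A = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
print(row_column_indices(A))  # column indices [0], [] and [1]
```

The result is still a Python list with one array per row, but each piece is a view into the indices array of the cleaned copy rather than a new sparse row object, which is where the original list comprehension spends its time.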