Sparse matrix: how to get the nonzero indices of each row

Question

I have a SciPy CSR matrix and I want to get the column indices of the nonzero elements in each row. My approach is:

import scipy.sparse as sp
N = 100
d = 0.1
M = sp.rand(N, N, d, format='csr')

indM = [row.nonzero()[1] for row in M]

indM is what I need: it has the same number of rows as M and looks like this:

[array([ 6,  7, 11, ..., 79, 85, 86]),
 array([12, 20, 25, ..., 84, 93, 95]),
...
 array([ 7, 24, 32, 40, 50, 51, 57, 71, 74, 96]),
 array([ 1,  4,  9, ..., 71, 95, 96])]

The problem is that this approach is slow for large matrices. Is there a way to avoid the list comprehension, or otherwise speed this up?

Thank you.

Answer 1

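You can use the indices and indptr attributes directly. In CSR format, indices holds the column index of every stored entry, row after row, and row i owns the slice indices[indptr[i]:indptr[i+1]]: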

import numpy
import scipy.sparse

N = 5
d = 0.3
M = scipy.sparse.rand(N, N, d, format='csr')  # random matrix: your values will differ
M.toarray()
# array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.30404632],
#        [ 0.63503713,  0.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.68865311,  0.81492098,  0.        ,  0.        ,  0.        ],
#        [ 0.08984168,  0.87730292,  0.        ,  0.        ,  0.18609702]])

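M.indices
# array([4, 0, 0, 1, 0, 1, 4], dtype=int32)
M.indptr
# array([0, 0, 1, 2, 4, 7], dtype=int32)

# Splitting at every indptr value yields an empty piece before the first row
# and after the last row, so [1:-1] keeps exactly one array per row.
numpy.split(M.indices, M.indptr)[1:-1]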
# [array([], dtype=int32),
#  array([4], dtype=int32),
#  array([0], dtype=int32),
#  array([0, 1], dtype=int32),
#  array([0, 1, 4], dtype=int32)]
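The one-liner returns the raw stored column indices, which can differ from row.nonzero()[1] in two edge cases: explicitly stored zeros (present in indices, but skipped by nonzero()) and hand-built CSR matrices whose per-row indices are not in ascending column order. A minimal sketch, assuming you want clean, sorted output, wraps the split in a helper that tidies a copy of the matrix first; row_column_indices is a hypothetical name, not a SciPy function.

```python
import numpy as np
import scipy.sparse as sp


def row_column_indices(M):
    """Return a list with one array of column indices per row of M.

    Hypothetical helper name; it wraps the indices/indptr split from the answer.
    """
    # Work on a CSR copy so the caller's matrix is left untouched.
    M = sp.csr_matrix(M, copy=True)
    # Drop explicitly stored zeros: they appear in M.indices,
    # but row.nonzero() would skip them.
    M.eliminate_zeros()
    # Put the column indices of each row in ascending order.
    M.sort_indices()
    # One split point per indptr entry; [1:-1] drops the empty
    # pieces before the first row and after the last row.
    return np.split(M.indices, M.indptr)[1:-1]


# A 3x4 matrix built directly from (data, indices, indptr):
# row 0 stores an explicit zero at column 2, and row 2 stores
# its columns out of order (3 before 1).
data = np.array([1.0, 0.0, 2.0, 3.0, 4.0])
indices = np.array([0, 2, 3, 3, 1])
indptr = np.array([0, 2, 3, 5])
A = sp.csr_matrix((data, indices, indptr), shape=(3, 4))

print(np.split(A.indices, A.indptr)[1:-1])  # raw stored indices
print(row_column_indices(A))                # explicit zero removed, rows sorted
```

On a random matrix like the one in the question, the helper should give the same column sets as [row.nonzero()[1] for row in M], while slicing one indices array instead of building a new sparse row object on every iteration. The copy costs extra memory proportional to the number of stored entries; if you build the matrix yourself and know it has no explicit zeros and sorted indices, the bare one-liner is enough.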

Answer 1: 7 votes, accepted, 2017-06-14 05:40:08
