
List non-zero elements from a sparse matrix in Python

How can I list all non-zero elements of a csr_matrix in simple, one-line (and fast!) code?

I'm using this code:

edges_list = list([tuple(row) for row in np.transpose(A.nonzero())])
weight_list = [A[e] for e in edges_list]

but it takes quite a long time to execute.

For a CSR matrix in canonical form, access the data array directly:

A.data

but be aware that a matrix not in canonical form may contain explicit zeros or duplicate entries in its representation, which need special handling. For example:

# Merge duplicates and remove explicit zeros. Both operations modify A.
# We sum duplicates first because they might sum to zero - for example,
# if a 5 and a -5 are in the same spot, we have to sum them to 0 and then remove the 0.
A.sum_duplicates()
A.eliminate_zeros()

# Now A.data holds each non-zero value exactly once
values = A.data
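To see why the order of the two calls matters, here is a minimal sketch that builds a deliberately non-canonical CSR matrix straight from its (data, indices, indptr) arrays, with a 5 and a -5 stored at the same position. The specific arrays are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 stores column 0 twice (5 and -5); row 1 stores 3 at column 1.
data = np.array([5, -5, 3])
indices = np.array([0, 0, 1])
indptr = np.array([0, 2, 3])
A = csr_matrix((data, indices, indptr), shape=(2, 2))

print(A.data.tolist())          # [5, -5, 3]: raw storage, duplicates included
print(A.has_canonical_format)   # False: column 0 appears twice in row 0

A.sum_duplicates()              # 5 + -5 collapse into one stored explicit zero
print(A.data.tolist())          # [0, 3]

A.eliminate_zeros()             # the explicit zero is then dropped
print(A.data.tolist())          # [3]
```

Calling eliminate_zeros() first would have kept both the 5 and the -5, since neither is zero on its own.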

You can use A.nonzero() to index into A directly:

In [19]: A = np.random.randint(0, 3, (3, 3))

In [20]: A
Out[20]: 
array([[2, 1, 1],
       [1, 2, 2],
       [0, 1, 0]])

In [21]: A[A.nonzero()]
Out[21]: array([2, 1, 1, 1, 2, 2, 1])

The result is the same as with your approach:

In [22]: edges_list = list([tuple(row) for row in np.transpose(A.nonzero())])

In [23]: [A[e] for e in edges_list]
Out[23]: [2, 1, 1, 1, 2, 2, 1]

And it is considerably faster (more so as the matrix gets bigger):

In [25]: %timeit [A[e] for e in list([tuple(row) for row in np.transpose(A.nonzero())])]
10000 loops, best of 3: 48 µs per loop

In [26]: %timeit A[A.nonzero()]
100000 loops, best of 3: 10.7 µs per loop
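The question builds edges_list as well as the weights. As a sketch (using the 3x3 array from the session above, fixed instead of random so the output is reproducible), the two index arrays returned by nonzero() give both lists without a per-element loop:

```python
import numpy as np

# Fixed version of the 3x3 example array.
A = np.array([[2, 1, 1],
              [1, 2, 2],
              [0, 1, 0]])

# nonzero() returns one array of row indices and one of column indices,
# in row-major (C) order.
rows, cols = A.nonzero()

# Weights: one fancy-indexing call instead of one A[e] lookup per edge.
weight_list = A[rows, cols].tolist()

# Edges: pair the index arrays up; tolist() converts to plain Python ints.
edges_list = list(zip(rows.tolist(), cols.tolist()))

print(edges_list)   # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 1)]
print(weight_list)  # [2, 1, 1, 1, 2, 2, 1]
```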

Indexing with nonzero() also works with a SciPy csr_matrix, although there are better methods for those, as shown in other answers:

In [30]: M = scipy.sparse.csr_matrix(A)

In [31]: M[M.nonzero()]
Out[31]: matrix([[2, 1, 1, 1, 2, 2, 1]], dtype=int32)
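Note that on a csr_matrix the result is a 1 x nnz np.matrix rather than a flat array. A minimal sketch, reusing the 3x3 example, of flattening it and comparing it with M.data; the comparison assumes M is in canonical form with no explicit zeros, as it is when built from a dense array.

```python
import numpy as np
import scipy.sparse

A = np.array([[2, 1, 1],
              [1, 2, 2],
              [0, 1, 0]])
M = scipy.sparse.csr_matrix(A)

# For a csr_matrix, fancy indexing returns a 2-D np.matrix of shape (1, nnz).
picked = M[M.nonzero()]

# Flatten it into an ordinary 1-D ndarray.
values = np.asarray(picked).ravel()
print(values.tolist())  # [2, 1, 1, 1, 2, 2, 1]

# In canonical form with no explicit zeros, M.data already holds the same
# values in the same row-major order, with no indexing needed.
print(np.array_equal(values, M.data))  # True
```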

Just use A.data:

In [16]: from scipy.sparse import csr_matrix

In [17]: A = csr_matrix([[1,0,0],[0,2,0]])

In [18]: A.data
Out[18]: array([1, 2])

If the sparse matrix has been modified, or just to be safe, call A.eliminate_zeros() first:

In [19]: A[0,0] = 0

In [20]: A.data
Out[20]: array([0, 2])

In [21]: A.eliminate_zeros()

In [22]: A.data
Out[22]: array([2])
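Because sum_duplicates() and eliminate_zeros() modify A in place, you may prefer to canonicalize a copy instead. The helper below, nonzero_values, is a hypothetical name for illustration, not a SciPy function; it trades one copy of the matrix for leaving the caller's storage untouched.

```python
from scipy.sparse import csr_matrix

def nonzero_values(A):
    """Hypothetical helper: return the non-zero values of a CSR matrix as a
    1-D array, canonicalizing a copy so the caller's matrix is unchanged."""
    B = A.copy()
    B.sum_duplicates()   # merge repeated (row, col) entries first...
    B.eliminate_zeros()  # ...then drop explicit zeros, including merged ones
    return B.data

A = csr_matrix([[1, 0, 0], [0, 2, 0]])
A[0, 0] = 0  # leaves an explicit zero in A.data, as in the session above

print(nonzero_values(A).tolist())  # [2]
print(A.data.tolist())             # [0, 2]: the original storage is untouched
```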

You could use scipy.sparse.find like this:

>>> from scipy.sparse import csr_matrix, find
>>> A = csr_matrix([[7.0, 8.0, 0],[0, 0, 9.0]])
>>> find(A)
(array([0, 0, 1], dtype=int32), array([0, 1, 2], dtype=int32), array([ 7.,  8.,  9.]))

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.find.html
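Since find returns the row indices, column indices and values as three parallel arrays, it maps directly onto the two lists from the question. A minimal sketch reusing the matrix above; the names edges_list and weight_list follow the question.

```python
from scipy.sparse import csr_matrix, find

A = csr_matrix([[7.0, 8.0, 0], [0, 0, 9.0]])

# find() returns (row indices, column indices, values) of the non-zero entries.
rows, cols, vals = find(A)

# Rebuild the question's two lists without indexing A once per edge.
edges_list = list(zip(rows.tolist(), cols.tolist()))
weight_list = vals.tolist()

print(edges_list)   # [(0, 0), (0, 1), (1, 2)]
print(weight_list)  # [7.0, 8.0, 9.0]
```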
