scipy.sparse: find the top 3 values in a csc_matrix

I have a very large sparse matrix and want to retrieve the row and column indices of its 10 largest values. I have created a small sample matrix below to simulate this case. Any idea how to get the top 3 in the following example?

import numpy as np
from scipy.sparse import csc_matrix

a = np.matrix([[7,2,0],[0,0,6],[1,0,4]])
m = csc_matrix(a)
print(m)
  (0, 0)    7
  (2, 0)    1
  (0, 1)    2
  (1, 2)    6
  (2, 2)    4

Expected:

  (0, 0)    7
  (1, 2)    6
  (2, 2)    4

Does this help?

If you just want the values:

n = 3
np.partition(np.asarray(a), a.size - n, axis=None)[-n:]

Output:

array([4, 6, 7])

If you need the positions:

n = 3
[np.where(a == x) for x in np.partition(np.asarray(a), 
                                        a.size - n, 
                                        axis=None)[-n:]]

Output:

[(array([2], dtype=int64), array([2], dtype=int64)),
 (array([1], dtype=int64), array([2], dtype=int64)),
 (array([0], dtype=int64), array([0], dtype=int64))]
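
The position lookup above scans the dense array once per value and returns every match if a value is repeated. A minimal sketch of a single-pass alternative, assuming a dense NumPy array fits in memory, uses np.argpartition to get flat indices and np.unravel_index to turn them into (row, col) pairs. The names flat and top are only illustrative.

```python
import numpy as np

a = np.array([[7, 2, 0], [0, 0, 6], [1, 0, 4]])
n = 3

# Flat indices of the n largest entries; argpartition leaves them unordered.
flat = np.argpartition(a, a.size - n, axis=None)[-n:]
# Order those n indices by value, largest first.
flat = flat[np.argsort(a.ravel()[flat])[::-1]]

# Convert flat indices back to (row, col) coordinates.
rows, cols = np.unravel_index(flat, a.shape)
top = list(zip(rows.tolist(), cols.tolist(), a[rows, cols].tolist()))
print(top)
```

This still works on the dense array, so it is only suitable when the matrix is small enough to densify.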
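
The session below assumes numpy has been imported as np and that "from scipy import sparse" has been run:

In [32]: a = np.array([[7,2,0],[0,0,6],[1,0,4]])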
In [33]: M = sparse.coo_matrix(a)
In [34]: M
Out[34]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in COOrdinate format>
In [35]: print(M)
  (0, 0)    7
  (0, 1)    2
  (1, 2)    6
  (2, 0)    1
  (2, 2)    4
In [36]: M.data
Out[36]: array([7, 2, 6, 1, 4])
In [37]: idx = np.argsort(M.data)
In [38]: idx
Out[38]: array([3, 1, 4, 2, 0])
In [39]: idx = idx[-3:]
In [40]: M.data[idx]
Out[40]: array([4, 6, 7])
In [41]: M1 = sparse.coo_matrix((M.data[idx], (M.row[idx], M.col[idx])), M.shape)
In [42]: M1
Out[42]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements in COOrdinate format>
In [43]: M1.toarray()
Out[43]: 
array([[7, 0, 0],
       [0, 0, 6],
       [0, 0, 4]])
In [44]: print(M1)
  (2, 2)    4
  (1, 2)    6
  (0, 0)    7

I'm using the coo format because it's easier to get the row/col values given the data idx. For csr/csc, the indices array lines up with data, but the indptr values will be harder to recreate.
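
If converting to coo is undesirable, the column of each selected entry can also be recovered from indptr without rebuilding it. The sketch below is one possible way to do that, not the answer's own method. It assumes that only stored values should be considered (implicit zeros are ignored, which matters if the largest stored values are negative). The helper name top_n_csc is hypothetical.

```python
import numpy as np
from scipy.sparse import csc_matrix

def top_n_csc(m, n):
    """Return (row, col, value) for the n largest stored values, largest first."""
    data = m.data
    k = min(n, data.size)
    if k == 0:
        return []
    # Positions (within data) of the k largest stored values, then sorted descending.
    idx = np.argpartition(data, data.size - k)[-k:]
    idx = idx[np.argsort(data[idx])[::-1]]
    # In CSC, indices holds the row of each stored value.
    rows = m.indices[idx]
    # Stored entry p lies in column j when indptr[j] <= p < indptr[j + 1].
    cols = np.searchsorted(m.indptr, idx, side="right") - 1
    return list(zip(rows.tolist(), cols.tolist(), data[idx].tolist()))

m = csc_matrix(np.array([[7, 2, 0], [0, 0, 6], [1, 0, 4]]))
result = top_n_csc(m, 3)
print(result)
```

Using argpartition instead of a full argsort keeps the selection roughly linear in the number of stored values, which may help when the matrix is very large and n is small.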
