
How to estimate the rank of a huge, sparse csr_matrix using Python (and probably SciPy)?

I have a huge, sparse matrix of type scipy.sparse.csr.csr_matrix, and I need to estimate its rank. I found estimate_rank (in scipy.linalg.interpolative) on scipy.org, which seems perfect for this job, but it doesn't support csr_matrix.

from scipy.sparse import load_npz
from scipy.linalg.interpolative import estimate_rank

X = load_npz("https://drive.google.com/uc?export=download&id=1SSR6JWEqG4DXRU9qo78682D9pGJF3Wr0")
print("Rank:", estimate_rank(X, eps=100))

TypeError: invalid input type (must be array or LinearOperator)

The sparse matrix has over 50K rows and nearly 40K columns, so converting it to a NumPy array first seems pointless. What should I do to make it work?
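A side note on the loading step: load_npz expects a local path or a file-like object, so the URL in the snippet above presumably stands in for a file that has already been downloaded. A minimal sketch of fetching the bytes first is shown below. The helper names load_npz_from_bytes and load_npz_from_url are hypothetical, and the offline round trip at the end only demonstrates the in-memory path; it does not contact the URL from the question.

```python
import io
import urllib.request

from scipy import sparse


def load_npz_from_bytes(payload: bytes):
    """Load a SciPy sparse matrix from the raw bytes of an .npz file."""
    return sparse.load_npz(io.BytesIO(payload))


def load_npz_from_url(url: str):
    """Hypothetical helper: download an .npz file and load it in memory."""
    with urllib.request.urlopen(url) as response:
        return load_npz_from_bytes(response.read())


# Offline round trip: save a tiny matrix to a buffer, then load it back.
buffer = io.BytesIO()
sparse.save_npz(buffer, sparse.identity(3, format="csr"))
restored = load_npz_from_bytes(buffer.getvalue())
print(restored.shape)
```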


The following doesn't work either.

from scipy.sparse import load_npz, linalg
from scipy.linalg.interpolative import estimate_rank

X = load_npz("https://drive.google.com/uc?export=download&id=1SSR6JWEqG4DXRU9qo78682D9pGJF3Wr0")
print("Rank:", estimate_rank(linalg.aslinearoperator(X), eps=100))

ValueError                                Traceback (most recent call last)
in ()
      3
      4 print(type(X))
----> 5 print("Rank of the Document-Term Matrix:", estimate_rank(aslinearoperator(X), eps=1))

1 frames
/usr/local/lib/python3.6/dist-packages/scipy/linalg/_interpolative_backend.py in idd_findrank(eps, m, n, matvect)
    659     :rtype: int
    660     """
--> 661     k, ra, ier = _id.idd_findrank(eps, m, n, matvect)
    662     if ier:
    663         raise _RETCODE_ERROR

ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (-1216667648,)
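One possible reading of the negative dimension in that message is a signed 32-bit integer overflow. The arithmetic below shows that -1216667648 is exactly what 2 * 39232**2 becomes when wrapped to 32 bits, and 39232 is consistent with "nearly 40K columns". This is an inference from the number alone: the column count 39232 and the idea that a work array of about 2·n² elements was requested are assumptions, not something stated in the question or confirmed against the SciPy source.

```python
def wrap_int32(value: int) -> int:
    """Interpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


n_cols = 39232                 # hypothetical column count, chosen to match the message
requested = 2 * n_cols**2      # assumed work-array size, not taken from SciPy's code

print(requested)               # 3078299648, larger than 2**31 - 1 = 2147483647
print(wrap_int32(requested))   # -1216667648, the value in the ValueError
```

If this reading is right, the failure depends on the matrix dimensions rather than on the value of eps.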

I have used sparse but haven't used estimate_rank. Still, I can read errors and docs.

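In [23]: from scipy import sparse
In [24]: from scipy.sparse import linalg
In [25]: M = sparse.random(100,100,.2, 'csr')
# `inter` below is presumably scipy.linalg.interpolative; its import is not shown in the session:
# import scipy.linalg.interpolative as inter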

In [36]: inter.estimate_rank(M,.001)                                                                   
---------------------------------------------------------------------------
...
TypeError: invalid input type (must be array or LinearOperator)

Testing the array option:

In [37]: inter.estimate_rank(M.A,.1)                                                                   
Out[37]: 100
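The array option works on this 100x100 example, but a quick estimate shows why it is unattractive for the matrix in the question. The shape below is the approximate one given there (over 50K rows, nearly 40K columns), and float64 storage is an assumption.

```python
rows, cols = 50_000, 40_000   # approximate shape from the question
bytes_per_value = 8           # assuming float64 entries

dense_bytes = rows * cols * bytes_per_value
print(f"{dense_bytes / 1e9:.1f} GB")  # 16.0 GB
```

So a dense copy would need roughly 16 GB before estimate_rank allocates any work space of its own.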

Testing the LinearOperator option:

In [38]: from scipy.sparse import linalg                                                               
In [39]: L = linalg.aslinearoperator(M)                                                                
In [40]: L                                                                                             
Out[40]: <100x100 MatrixLinearOperator with dtype=float64>
In [41]: inter.estimate_rank(L,.001)                                                                   
Out[41]: 99
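Put together as a script, the LinearOperator route looks roughly like the sketch below. The test matrix (a product of two sparse factors, so its rank is at most 5), the seeds, and the tolerance 1e-8 are all illustrative choices, not values from the question. The dense rank is computed only as a cross-check, which is feasible here because the matrix is small.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import aslinearoperator
from scipy.linalg.interpolative import estimate_rank

# Small sparse test matrix with rank at most 5 (illustrative data).
U = sparse.random(200, 5, density=0.3, format="csr", random_state=0)
V = sparse.random(5, 150, density=0.3, format="csr", random_state=1)
A = (U @ V).tocsr().astype(np.float64)

# Wrap the CSR matrix so estimate_rank only needs matrix-vector products.
op = aslinearoperator(A)
rank_estimate = int(estimate_rank(op, eps=1e-8))

# Cross-check against a dense computation; only practical for small matrices.
dense_rank = int(np.linalg.matrix_rank(A.toarray()))

print("estimate_rank:", rank_estimate)
print("dense matrix_rank:", dense_rank)
```

The estimate comes from a randomized method at relative precision eps, so it need not equal the dense rank exactly. Whether this route scales to the matrix in the question is not established here: the ValueError in the question was raised on exactly this path.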
