
Python kernel dies when performing SVD on a sparse symmetric matrix

I would like to reproduce the SVD method mentioned in a Stanford lecture on my own dataset. The lecture slide is as follows:

[Image: Stanford lecture slide]

My dataset is of the same type: a word co-occurrence matrix M of size

<13840x13840 sparse matrix of type '<type 'numpy.int64'>' 
with 597828 stored elements in Compressed Sparse Column format>

generated and processed from CountVectorizer(). Note that this is a symmetric matrix.

However, when I tried to extract features with SVD, none of the following code worked.

1st try:

scipy.linalg.svd(M)

I tried converting the matrix from sparse csr with both todense() and toarray(). My computer ran for quite a few minutes and then reported that the kernel had stopped. I also played around with different parameter settings.

2nd try:

scipy.sparse.linalg.svds(M)

I also tried changing the matrix dtype from int64 to float64; however, the kernel died after 30 seconds or so.

Could anyone suggest a way to perform SVD on this matrix?

Thank you so much.

It seems that the matrix is too demanding for the available memory. You have several options:

  1. Perform an adaptive SVD,
  2. Use modred,
  3. Use the SVD from dask.

The latter two should work out of the box. All of these options load only as much data as fits in memory.
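To put a rough number on the memory pressure: a dense float64 copy of a 13840x13840 matrix takes about 1.5 GB, and a full dense SVD also has to hold U and Vt of the same size, so at least about 4.6 GB before any LAPACK workspace. The sketch below shows that estimate, plus a truncated SVD with scipy.sparse.linalg.svds that never densifies the matrix. Note that this is a different technique from the three options listed above, and it is only a sketch: the helper names (dense_svd_bytes, truncated_svd), the choice of k, and the small random stand-in matrix are all assumptions made for illustration. Why svds crashed in the question is not clear from the post, so this may or may not avoid the same failure.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def dense_svd_bytes(n, itemsize=8):
    """Rough lower bound for a full dense SVD of an n x n float64 matrix:
    the input plus U and Vt (three n x n arrays), ignoring LAPACK workspace."""
    return 3 * n * n * itemsize


def truncated_svd(M, k=100):
    """Hypothetical helper: k largest singular triplets of a sparse matrix M,
    sorted by descending singular value, without converting M to dense."""
    M = sp.csc_matrix(M).astype(np.float64)  # svds needs a floating-point dtype
    k = min(k, min(M.shape) - 1)             # svds requires k < min(M.shape)
    U, s, Vt = svds(M, k=k)
    order = np.argsort(s)[::-1]              # do not rely on the returned order
    return U[:, order], s[order], Vt[order, :]


# Memory estimate for the matrix size given in the question.
print(dense_svd_bytes(13840) / 1e9, "GB")

# Stand-in data (an assumption, not the asker's matrix): a small random
# symmetric sparse matrix with count-like entries.
A = sp.random(300, 300, density=0.02, format="csr", random_state=0)
A.data = np.ceil(A.data * 10)
M_small = A + A.T

U, s, Vt = truncated_svd(M_small, k=10)
print(U.shape, s.shape, Vt.shape)
```

With the real matrix, the call would be truncated_svd(M, k=...) with k set to the number of dimensions wanted for the word features. Only k columns of U and k rows of Vt are stored, so the result stays small even though M is 13840x13840.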
