How to transform an integer value sparse matrix to 0/1 value sparse matrix, Python

I have a sparse matrix produced by the sklearn bag-of-words vectorizer. It is a csr_matrix whose elements are the word frequencies in each document. What I need now is a 0/1 matrix in which 1 means the word occurs in the document; I don't care about the actual frequency. Setting the background aside, the problem is this: I have a sparse matrix,

2 3 4 0 0 0
0 0 0 0 0 8
0 0 0 2 0 0
0 0 0 0 0 0

I want all the nonzero elements to be 1,

1 1 1 0 0 0
0 0 0 0 0 1
0 0 0 1 0 0
0 0 0 0 0 0

How can I achieve this? I assume that calling todense() and then looping is not a good choice, since the sparse matrix is large. Is there a better way?

Try csr_matrix.sign(). It should be exactly what you need (although I didn't try it myself).
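As a quick check of this suggestion, here is a minimal sketch that applies sign() to the example matrix from the question. The variable names sp_m and sp_unit are only illustrative, and the matrix is built from a dense array purely for the demonstration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example matrix from the question, built densely only for illustration.
sp_m = csr_matrix(np.array([
    [2, 3, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]))

# sign() works element-wise on the stored values and returns a new sparse matrix,
# so no dense copy of the large matrix is needed.
sp_unit = sp_m.sign()

print(sp_unit.toarray())
```

For non-negative counts this should give the 0/1 matrix shown in the question while keeping the result sparse.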

I think you could just create a new matrix from the non-zero indices (see the scipy.sparse.csr_matrix reference). Assuming your sparse matrix is named sp_m and csr_matrix is imported from scipy.sparse:

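rows, cols = sp_m.nonzero()  # nonzero() skips explicitly stored zeros, so count these rather than sp_m.data
sp_unit = csr_matrix(([1] * len(rows), (rows, cols)), shape=sp_m.shape)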
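The index-based construction can be wrapped in a small function. The sketch below uses a hypothetical helper named to_binary (not part of scipy) and sizes the array of ones from the indices returned by nonzero() rather than from len(sp_m.data), since the two can differ when the matrix stores explicit zeros. The test matrix with a stored zero is made up for the demonstration.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_binary(sp_m):
    """Hypothetical helper: CSR matrix with 1 wherever sp_m has a nonzero value."""
    rows, cols = sp_m.nonzero()                # positions of the truly nonzero entries
    ones = np.ones(len(rows), dtype=np.int8)   # one value per nonzero position
    return csr_matrix((ones, (rows, cols)), shape=sp_m.shape)

# Made-up 2x3 matrix in (data, indices, indptr) form with an explicitly
# stored zero at position (0, 1).
sp_m = csr_matrix(
    (np.array([2, 0, 8]), np.array([0, 1, 2]), np.array([0, 2, 3])),
    shape=(2, 3),
)

sp_unit = to_binary(sp_m)
print(sp_unit.toarray())
```

A matrix coming straight from a bag-of-words vectorizer normally has no stored zeros, so this distinction matters mainly for matrices that were modified in place afterwards.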

OR

As another user pointed out, you could use the sign function; however, I think you will need to square it if you have negative values:

sp_unit = sp_m.sign().multiply(sp_m.sign())
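To illustrate why squaring matters, here is a small sketch on a made-up matrix with negative entries (word counts themselves are never negative, so this only applies to other kinds of data). It shows that sign() alone leaves -1 values and that the element-wise product turns them into 1.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up matrix containing negative values.
sp_m = csr_matrix(np.array([
    [2, -3, 0],
    [0, 0, -8],
]))

signs = sp_m.sign()               # stored entries become -1 or +1
sp_unit = signs.multiply(signs)   # element-wise square: (-1) * (-1) == 1

print(signs.toarray())
print(sp_unit.toarray())
```

As far as I know, abs(sp_m.sign()) gives the same result, since scipy sparse matrices support the built-in abs().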
