How to transform an integer-valued sparse matrix into a 0/1-valued sparse matrix in Python
I have a sparse matrix from the sklearn bag-of-words vectorizer. It's a csr_matrix, and its elements represent word frequencies in a document. What I need now is a 0/1 matrix where 1 means the word occurs in the document; I don't care about the actual frequency.
Setting the background problem aside, it looks like this. I have a sparse matrix:
2 3 4 0 0 0
0 0 0 0 0 8
0 0 0 2 0 0
0 0 0 0 0 0
I want all the nonzero elements to be 1:
1 1 1 0 0 0
0 0 0 0 0 1
0 0 0 1 0 0
0 0 0 0 0 0
How can I achieve this? I assume that calling todense() and then looping is not a good choice, since the sparse matrix is large. Is there a better way?
Try csr_matrix.sign(). It should be exactly what you need (although I haven't tried it myself).
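A minimal sketch of this suggestion, applied to the example matrix from the question. The variable names sp_m and sp_unit are chosen here for illustration. Note that sign is a method, so it has to be called with parentheses.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example matrix from the question
sp_m = csr_matrix(np.array([
    [2, 3, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]))

# sign() works on the stored values only, so the result stays sparse
# and no dense copy of the matrix is created
sp_unit = sp_m.sign()

print(sp_unit.toarray())
```

This is enough when all stored values are positive, as with word counts. Negative values would map to -1 rather than 1.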
I think you could just create a new matrix from the non-zero indices (see the scipy.sparse.csr_matrix reference). Assuming your sparse matrix is named sp_m:
rows, cols = sp_m.nonzero()  # coordinates of the truly nonzero entries
sp_unit = csr_matrix(([1] * len(rows), (rows, cols)), shape=sp_m.shape)
OR
As another user pointed out, you could use the sign method; however, I think you will need to square the result if you have negative values:
sp_unit = sp_m.sign().multiply(sp_m.sign())
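To see how the two approaches behave with negative values, here is a small sketch. The matrix below is a made-up example (it is not from the question), and the names unit_a and unit_b are only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical matrix containing negative entries
sp_m = csr_matrix(np.array([
    [2, -3, 0],
    [0, 0, -8],
    [0, 0, 0],
]))

# Approach 1: rebuild the matrix from the coordinates of its nonzero entries
rows, cols = sp_m.nonzero()
unit_a = csr_matrix((np.ones(len(rows), dtype=int), (rows, cols)),
                    shape=sp_m.shape)

# Approach 2: sign() gives -1/0/1; the element-wise square turns -1 into 1
s = sp_m.sign()
unit_b = s.multiply(s)

print(unit_a.toarray())
print(unit_b.toarray())
```

Both results should hold a 1 wherever the original matrix is nonzero, and neither approach converts the matrix to a dense array along the way.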