
SciPy: converting 1/0 sparse matrix to 0/1 sparse matrix

What is the fastest way to convert a 1/0 sparse matrix to a 0/1 sparse matrix without using the todense() method?

Example:

The source matrix looks like:

matrix([[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
        [1, 1, 0, 0, 1, 1, 1, 1, 0, 1],
        [0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1, 1, 0, 1, 0, 0],
        [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]])

The result matrix is:

matrix([[0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],
        [1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
        [0, 1, 1, 0, 0, 0, 1, 0, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]])

The source matrix is too large, so I can only use a sparse representation of the matrices.

DSM is correct. There are many representations of sparse matrices, but if you use the dictionary format, then you need 3 numbers to represent one element (row, col, value). Thus you need 3*np memory (np is the number of nonzeros). If you use a dense format, then you need n*m memory. Therefore, the sparse representation is useful only when 3*np < n*m, that is, np/(n*m) < 1/3 in this case, which means the fraction of nonzero entries is less than 1/3.

On the other hand, if you flip your 1s and 0s, then the fraction of nonzero entries becomes one minus the original fraction. Thus, if the original matrix is sparse, there is no way that your flipped matrix is sparse.
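The storage argument above can be checked with a few lines of arithmetic. This is only a sketch: the helper names (`triplet_cost`, `dense_cost`, `flipped_nnz`) and the 10,000 x 10,000 matrix with 1% nonzeros are hypothetical, and the cost model simply counts stored numbers, as in the reasoning above.

```python
def triplet_cost(nnz):
    """Numbers stored by a (row, col, value) format: 3 per nonzero."""
    return 3 * nnz


def dense_cost(n_rows, n_cols):
    """Numbers stored by a dense format: one per entry."""
    return n_rows * n_cols


def flipped_nnz(n_rows, n_cols, nnz):
    """Nonzero count after swapping every 0 and 1."""
    return n_rows * n_cols - nnz


# Hypothetical sizes, chosen only for illustration.
n_rows, n_cols, nnz = 10_000, 10_000, 1_000_000
total = dense_cost(n_rows, n_cols)
flipped = flipped_nnz(n_rows, n_cols, nnz)

print("fraction of nonzeros before flip:", nnz / total)
print("fraction of nonzeros after flip: ", flipped / total)
print("sparse pays off before flip:", triplet_cost(nnz) < total)
print("sparse pays off after flip: ", triplet_cost(flipped) < total)
```

With these numbers the flipped matrix would need about three times the storage of a dense array if it were kept in triplet form.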

If you only need 1s and 0s in your matrix, then I would recommend writing your own compressed representation of the sparse matrix. For example, you can read your matrix from the top-left, row-wise, and if there are any consecutive 1s or 0s, then you can write something like 1 3 0 2 1 0 1 4, which means "three consecutive 1s, two consecutive 0s, 1, 0, four consecutive 1s". Depending on how you use your matrix, my suggestion may be useless, but it is worth thinking about.
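A minimal sketch of this run-length idea follows. It is a variant of the format described above, not the exact one: every run is stored as an explicit (value, count) pair, including runs of length one, so decoding is unambiguous. The function names are hypothetical. With this layout, flipping 1s and 0s only touches the stored values and never the counts.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse a flat 0/1 sequence into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(bits)]


def rle_flip(runs):
    """Swap 0s and 1s without expanding the runs."""
    return [(1 - value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Three 1s, two 0s, a 1, a 0, four 1s, as in the example above.
row = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
runs = rle_encode(row)
print(runs)
print(rle_decode(rle_flip(runs)))
```

The cost of a flip is proportional to the number of runs rather than the number of entries, so this only helps when the matrix has long stretches of identical values.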

Sorry to spam, but on second thought, if the matrix only has 1s and 0s, then you can use one int32 number to represent 32 elements (the matrix needs to be stored densely). Then flipping 1s and 0s is just bit manipulation and shouldn't be hard. This reduces the size of the matrix to 1/32 of the int32-per-element size, and the operation should be roughly 32 times faster.
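The bit-packing idea can be sketched in plain Python as below. The helper names and the choice to put the first element in the lowest bit are assumptions made for illustration. Python integers are not fixed-width, so this shows the layout and the XOR flip rather than the speedup; a real implementation would keep the words in a fixed-width array type.

```python
WORD_BITS = 32
WORD_MASK = 0xFFFFFFFF


def pack_bits(bits):
    """Pack a flat 0/1 list into 32-bit words, first element in the lowest bit."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for offset, bit in enumerate(bits[start:start + WORD_BITS]):
            word |= bit << offset
        words.append(word)
    return words


def flip_words(words, n_bits):
    """Flip every stored bit with one XOR per word, then clear the padding."""
    flipped = [word ^ WORD_MASK for word in words]
    tail = n_bits % WORD_BITS
    if tail and flipped:
        flipped[-1] &= (1 << tail) - 1  # padding bits past n_bits stay 0
    return flipped


def unpack_bits(words, n_bits):
    """Recover the flat 0/1 list from the packed words."""
    return [(words[i // WORD_BITS] >> (i % WORD_BITS)) & 1 for i in range(n_bits)]


# The 5 x 10 source matrix from the question, flattened row by row.
source = [
    [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
]
flat = [bit for row in source for bit in row]
words = pack_bits(flat)
result = unpack_bits(flip_words(words, len(flat)), len(flat))
# Print the flipped matrix one row at a time.
for r in range(len(source)):
    print(result[r * 10:(r + 1) * 10])
```

The 50 entries fit in two words, and the flip is two XOR operations plus one mask for the unused high bits of the last word.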
