
How to delete small elements in a sparse matrix in Python's SciPy?

I have a question that is quite similar to Sean Law's example, which you can find here: https://seanlaw.github.io/2019/02/27/set-values-in-sparse-matrix/

In my case, I want to delete all the elements of a sparse CSR matrix whose absolute value is smaller than some epsilon.

First I tried something like

x[abs(x) < 3] = 0

but SciPy's warning about inefficiency led me to Sean Law's explanation in the link above. I then tried adapting his example code, but could not find a solution to my problem.

Here is the code, with some negative entries added. As written, it also removes all negative entries, since they are smaller than 3. I experimented with np.abs() and with adding a second logical condition, but have not succeeded so far.

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0.1, -2, 0, 3], 
                         [0, -4, -1, 5, 0]]))


nonzero_mask = np.array(x[x.nonzero()] < 3)[0]
rows = x.nonzero()[0][nonzero_mask]
cols = x.nonzero()[1][nonzero_mask]

x[rows, cols] = 0
print(x.todense())

gives

[[0. 0. 0. 0. 3.]
 [0. 0. 0. 5. 0.]]

But what I want is

[[0. 0. 0. 0. 3.]
 [0. -4. 0. 5. 0.]]

Any help is greatly appreciated; I feel like I am missing something very basic. Thank you in advance!

In [286]: from scipy import sparse
In [287]: x = sparse.csr_matrix(np.array([[1, 0.1, -2, 0, 3],
     ...:                                 [0, -4, -1, 5, 0]]))

Your test on x selects the 0 values as well, hence the efficiency warning. Instead, apply the test to just the stored (nonzero) values in the data attribute:

In [288]: x.data                                                                
Out[288]: array([ 1. ,  0.1, -2. ,  3. , -4. , -1. ,  5. ])
In [289]: mask = np.abs(x.data)<3                                               
In [290]: mask                                                                  
Out[290]: array([ True,  True,  True, False, False,  True, False])
In [291]: x.data[mask]=0                                                        
In [292]: x.A                                                                   
Out[292]: 
array([[ 0.,  0.,  0.,  0.,  3.],
       [ 0., -4.,  0.,  5.,  0.]])

This doesn't actually remove the elements from the matrix, but there is a method for that cleanup:

In [293]: x                                                                     
Out[293]: 
<2x5 sparse matrix of type '<class 'numpy.float64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [294]: x.eliminate_zeros()                                                   
In [295]: x                                                                     
Out[295]: 
<2x5 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>
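
For reuse, the two steps above (zero the small stored values, then call eliminate_zeros()) can be wrapped in a small helper. This is only a sketch: the name drop_small and its eps parameter are my own and not part of SciPy, and I am assuming you want the input matrix left unmodified, so the helper works on a copy.

```python
import numpy as np
from scipy.sparse import csr_matrix


def drop_small(matrix, eps):
    """Return a CSR copy of `matrix` without entries where |value| < eps.

    Hypothetical helper for illustration; not a SciPy function.
    """
    result = matrix.tocsr(copy=True)             # leave the caller's matrix untouched
    result.data[np.abs(result.data) < eps] = 0   # only stored values are tested
    result.eliminate_zeros()                     # drop the explicit zeros from storage
    return result


x = csr_matrix(np.array([[1, 0.1, -2, 0, 3],
                         [0, -4, -1, 5, 0]]))
y = drop_small(x, 3)
print(y.toarray())
print(y.nnz)
```

If you don't need the original, you can skip the copy and modify x.data directly, as in the session above.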

Wrapping x[x.nonzero()] in np.abs() solves the problem:

>>> nonzero_mask = np.array(np.abs(x[x.nonzero()]) < 3)[0]
... 
>>> print(x.todense())                                                                                 
[[ 0.  0.  0.  0.  3.]
 [ 0. -4.  0.  5.  0.]]
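
For completeness, here is how the whole index-based version might look when written out. This is a sketch under my own assumptions: I take the elided lines to be the rest of the question's code unchanged, I use np.asarray(...).ravel() in place of np.array(...)[0] to flatten the indexed values, and I add an eliminate_zeros() call at the end so the zeroed entries are also dropped from storage.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0.1, -2, 0, 3],
                         [0, -4, -1, 5, 0]]))

rows, cols = x.nonzero()                      # coordinates of the nonzero entries
values = np.asarray(x[rows, cols]).ravel()    # flatten the indexed values to 1-D
small = np.abs(values) < 3                    # compare magnitudes, not signed values

x[rows[small], cols[small]] = 0               # overwrite existing entries only
x.eliminate_zeros()                           # remove the zeros from storage
print(x.toarray())
```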
