
Set rows of scipy.sparse matrix that meet certain condition to zeros

I wonder what the best way is to replace rows that do not satisfy a certain condition with zeros in sparse matrices. For example (I use plain arrays for illustration):

I want to replace every row whose sum is greater than 10 with a row of zeros:

import numpy as np
from scipy import sparse

a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10 
              [0,1,0,1,2]])

I want to replace a[2] and a[4] with zeros, so my output should look like this:

array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])

This is fairly straightforward for dense matrices:

row_sum = a.sum(axis=1)
to_zero = row_sum > 10   # boolean mask of the rows to zero out
a[to_zero] = 0

However, when I try the same on a sparse matrix:

s = sparse.csr_matrix(a)
s[to_zero, :] = np.zeros(a.shape[1])

I get this error:

    raise NotImplementedError("Fancy indexing in assignment not "
NotImplementedError: Fancy indexing in assignment not supported for csr matrices.

Hence, I need a different solution for sparse matrices. I came up with this:

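def zero_out_unfit_rows(s_mat, limit_row_sum):
    # sum(axis=1) returns an (N, 1) result; flatten it to a 1-D array
    row_sum = np.asarray(s_mat.sum(axis=1)).ravel()
    # 1 for rows to keep, 0 for rows to zero out
    to_keep = (row_sum <= limit_row_sum).astype('int8')
    temp_diag = get_sparse_diag_mat(to_keep)
    # pre-multiplying by the diagonal matrix scales row i by to_keep[i]
    return temp_diag @ s_mat

def get_sparse_diag_mat(my_diag):
    # build an N x N sparse matrix with my_diag on the main diagonal
    N = len(my_diag)
    my_diags = my_diag[np.newaxis, :]
    return sparse.dia_matrix((my_diags, [0]), shape=(N,N))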

This relies on the fact that if we set the diagonal entries at indices 2 and 4 of the identity matrix to zero, then pre-multiplying by that matrix sets rows 2 and 4 of the result to zero.

However, I feel that there is a better, more scipynic, solution. Is there one?

Not sure if it is very scithonic, but a lot of operations on sparse matrices are better done by accessing the guts (the underlying data, indices and indptr arrays) directly. For your case, I personally would do:

import numpy as np
import scipy.sparse as sps

a = np.array([[0,0,0,1,1],
              [1,2,0,0,0],
              [6,7,4,1,0],  # sum > 10
              [0,1,1,0,1],
              [7,3,2,2,8],  # sum > 10 
              [0,1,0,1,2]])
sps_a = sps.csr_matrix(a)

# get sum of each row (assumes no row is empty):
row_sum = np.add.reduceat(sps_a.data, sps_a.indptr[:-1])

# set values to zero
row_mask = row_sum > 10
nnz_per_row = np.diff(sps_a.indptr)
sps_a.data[np.repeat(row_mask, nnz_per_row)] = 0
# ask scipy.sparse to remove the zeroed entries
sps_a.eliminate_zeros()

>>> sps_a.toarray()
array([[0, 0, 0, 1, 1],
       [1, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 1, 2]])
>>> sps_a.nnz # it does remove the entries, not simply set them to zero
10
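If this is needed in more than one place, the same idea can be wrapped in a small helper. The sketch below is only one possible packaging: the function name zero_rows_where is made up for illustration, it works on a copy instead of modifying the input, and it takes the row mask as an argument. The row sums are computed with the matrix's own sum(axis=1) rather than np.add.reduceat, because reduceat does not return 0 for a row with no stored entries (it returns the element at that offset instead, or raises if the offset is past the end of data).

```python
import numpy as np
import scipy.sparse as sps


def zero_rows_where(mat, row_mask):
    """Return a CSR copy of `mat` with the rows flagged in `row_mask` emptied.

    Hypothetical helper for illustration; not part of scipy.
    """
    out = sps.csr_matrix(mat, copy=True)
    row_mask = np.asarray(row_mask, dtype=bool)
    # Number of stored entries in each row.
    nnz_per_row = np.diff(out.indptr)
    # Expand the per-row mask to one flag per stored entry, then zero those entries.
    out.data[np.repeat(row_mask, nnz_per_row)] = 0
    # Remove the zeroed entries from the sparse structure.
    out.eliminate_zeros()
    return out


a = np.array([[0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0],   # empty row
              [6, 7, 4, 1, 0],   # sum > 10
              [0, 1, 1, 0, 1]])
s = sps.csr_matrix(a)

# Row sums that are also correct for empty rows.
row_sum = np.asarray(s.sum(axis=1)).ravel()

result = zero_rows_where(s, row_sum > 10)
print(result.toarray())
```

Because the helper copies its input, s keeps all of its stored entries; drop the copy if in-place modification is acceptable and memory matters.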
