SciPy sparse matrix (COO, CSR): Clear row

To create a SciPy sparse matrix, I have arrays of row and column indices I and J along with a data array V. I use them to construct a matrix in COO format and then convert it to CSR:

from scipy import sparse

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
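One property of this construction is relevant to the wiping step: if the same (row, column) pair appears more than once in the COO input, the duplicates are summed on conversion to CSR. So if a wiped row's diagonal entry were left in I, J, V, an appended 1.0 would be added to it rather than replace it. A small illustration with made-up data:

```python
import numpy as np
from scipy import sparse

# Made-up example data: the pair (0, 0) appears twice
I = np.array([0, 0, 1])
J = np.array([0, 0, 1])
V = np.array([2.0, 1.0, 5.0])

matrix = sparse.coo_matrix((V, (I, J)), shape=(2, 2))
matrix = matrix.tocsr()  # the two (0, 0) entries are summed here

print(matrix.toarray())
# [[3. 0.]
#  [0. 5.]]
```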

I have a set of row indices whose rows should be wiped so that their only entry is a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, delete those entries, and append the diagonal entries:

import numpy

def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    # `x in a` scans the list a for every x, so this is O(len(lst) * len(a))
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something

indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo

That works alright, but find tends to take a while.

Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)

If you intend to clear multiple rows at once, this

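from scipy import sparse

def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)

    # "Delete" the rows: zero their stored values in place
    # (the entries stay in the sparsity structure as explicit zeros)
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0

    # Set the diagonal: keep the existing values, put 1.0 on the wiped rows
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)

    return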

is by far the fastest method. It doesn't actually remove the entries; it sets them to zero (they remain stored as explicit zeros) and then resets the diagonal.
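The per-row Python loop above can also be replaced by a single boolean mask over the stored entries. The sketch below is a hypothetical variant (the name wipe_rows_csr_masked is made up), assuming a square CSR matrix with float data. Unlike the original, it writes only the diagonal of the wiped rows instead of rewriting the whole diagonal with setdiag.

```python
import numpy as np
from scipy import sparse

def wipe_rows_csr_masked(matrix, rows):
    """Hypothetical helper: zero every stored entry in `rows`, then put 1.0
    on their diagonal. Works in place, like _wipe_rows_csr."""
    assert sparse.issparse(matrix) and matrix.format == "csr"
    rows = np.asarray(rows)

    # One flag per row: True for rows that should be wiped
    row_mask = np.zeros(matrix.shape[0], dtype=bool)
    row_mask[rows] = True

    # Expand to one flag per stored entry (row i owns indptr[i+1] - indptr[i] entries)
    entry_mask = np.repeat(row_mask, np.diff(matrix.indptr))

    # Zero those entries through a view of data; they stay stored as explicit zeros
    nnz = matrix.indptr[-1]
    matrix.data[:nnz][entry_mask] = 0.0

    # 1.0 on the diagonal of the wiped rows only; if a diagonal entry is not
    # stored yet, SciPy inserts it (and may emit a SparseEfficiencyWarning)
    matrix[rows, rows] = 1.0
    return matrix

A = sparse.csr_matrix(np.arange(16, dtype=float).reshape(4, 4))
wipe_rows_csr_masked(A, [1, 2])
print(A.toarray())
```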

If the entries actually have to be removed, some array manipulation is necessary. This can be quite costly, but if speed is not an issue, this

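def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)

    start, stop = A.indptr[i], A.indptr[i+1]
    n = stop - start  # number of stored entries in row i

    assert n > 0  # an empty row would need an insertion, not handled here

    # Length of data/indices once row i shrinks to a single entry.
    # (An explicit length is used because the slice [:-n+1] is empty when n == 1.)
    nnz = A.indptr[-1]
    new_len = nnz - n + 1

    # Shift everything after row i to the left, keeping one slot for the diagonal
    A.data[start+1:new_len] = A.data[stop:nnz]
    A.data[start] = 1.0
    A.data = A.data[:new_len]

    A.indices[start+1:new_len] = A.indices[stop:nnz]
    A.indices[start] = i
    A.indices = A.indices[:new_len]

    # Rows after i now start n - 1 positions earlier
    A.indptr[i+1:] -= n - 1

    return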

replaces row i of the matrix A with a single 1.0 on the diagonal.
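When many rows need their entries actually removed, calling a per-row routine like _wipe_row_csr repeatedly shifts the tail of data and indices once per row. A hedged alternative sketch (the helper name remove_rows_set_diag_csr is hypothetical) drops the entries of all wiped rows in one pass with a boolean mask and rebuilds the matrix from the remaining triplets. It assumes a square matrix and returns a new matrix instead of working in place.

```python
import numpy as np
from scipy import sparse

def remove_rows_set_diag_csr(A, rows):
    """Hypothetical helper: remove all stored entries of `rows` from the
    square CSR matrix A and store a single 1.0 on their diagonal instead.
    Returns a new CSR matrix; A is left unchanged."""
    n_rows = A.shape[0]
    nnz = A.indptr[-1]
    counts = np.diff(A.indptr)                 # stored entries per row

    wipe = np.zeros(n_rows, dtype=bool)
    wipe[rows] = True
    wiped = np.flatnonzero(wipe)               # sorted, duplicate-free row ids

    keep = ~np.repeat(wipe, counts)            # one flag per stored entry
    entry_rows = np.repeat(np.arange(n_rows), counts)  # row id of each entry

    new_rows = np.concatenate([entry_rows[keep], wiped])
    new_cols = np.concatenate([A.indices[:nnz][keep], wiped])
    new_data = np.concatenate([A.data[:nnz][keep],
                               np.ones(len(wiped), dtype=A.dtype)])

    # Build the CSR structure from the COO-style (data, (row, col)) triplets
    return sparse.csr_matrix((new_data, (new_rows, new_cols)), shape=A.shape)

A = sparse.csr_matrix(np.arange(16, dtype=float).reshape(4, 4))
B = remove_rows_set_diag_csr(A, [1, 2])
print(B.toarray(), B.nnz)
```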

np.in1d (np.isin in newer NumPy versions) should be a faster way of finding the indices:

In [322]: I   # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)

In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]

In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]

In [325]: ind1=np.in1d(I,[1,2])

In [326]: ind1
Out[326]: 
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)

In [327]: np.where(ind1)   # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)

In [328]: I[~ind1]  # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
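Put together, the question's find / delete / extend steps reduce to a few vectorized calls. A minimal sketch, assuming I, J, V are array-like and the matrix is n×n; it uses np.isin rather than np.in1d, and the function name is made up:

```python
import numpy as np
from scipy import sparse

def build_csr_with_wiped_rows(I, J, V, wipe_rows, n):
    """Hypothetical helper: build an n x n CSR matrix from COO triplets,
    dropping every entry in `wipe_rows` and putting 1.0 on their diagonal."""
    I = np.asarray(I)
    J = np.asarray(J)
    V = np.asarray(V, dtype=float)
    # Deduplicate so no (i, i) pair is appended twice (duplicates would be summed)
    wipe_rows = np.unique(wipe_rows)

    keep = ~np.isin(I, wipe_rows)     # True for entries outside the wiped rows

    I2 = np.concatenate([I[keep], wipe_rows])
    J2 = np.concatenate([J[keep], wipe_rows])
    V2 = np.concatenate([V[keep], np.ones(len(wipe_rows))])

    return sparse.coo_matrix((V2, (I2, J2)), shape=(n, n)).tocsr()

# Small 4x4 test matrix; its COO arrays play the role of I, J, V
M = sparse.coo_matrix(np.arange(16).reshape(4, 4))
B = build_csr_with_wiped_rows(M.row, M.col, M.data, [1, 2], 4)
print(B.toarray())
```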

Directly manipulating the COO inputs like this is often a good approach. Another is to take advantage of CSR's math capabilities: you should be able to construct a diagonal matrix that zeros out the selected rows, and then add the ones back in with a second diagonal matrix.

Here's what I have in mind:

In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))

In [362]: (d1*M+d2).A
Out[362]: 
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])

In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
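The mask-based construction in the last three lines generalizes to any list of rows. A sketch of it wrapped in a function (the name wipe_rows_by_diag is hypothetical), assuming a square matrix:

```python
import numpy as np
from scipy import sparse

def wipe_rows_by_diag(M, rows):
    """Hypothetical helper: zero `rows` of the square matrix M by left-multiplying
    with a 0/1 diagonal matrix, then add 1.0 on the diagonal of those rows."""
    n = M.shape[0]
    keep = np.ones(n, dtype=bool)
    keep[rows] = False

    d1 = sparse.diags(keep.astype(float))     # 1 on rows to keep, 0 on wiped rows
    d2 = sparse.diags((~keep).astype(float))  # 1 on the diagonal of wiped rows

    return (d1 @ M + d2).tocsr()

M = sparse.coo_matrix(np.arange(16).reshape(4, 4))
print(wipe_rows_by_diag(M, [1, 2]).toarray())
```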

Doing this with the LIL format looks easy (here wipe = [1, 2]):

In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]

In [596]: Ml.A
Out[596]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)

It's essentially what you are doing with the CSR format, but with LIL it's easy to replace each row's lists with the appropriate [1] and [i] lists. However, the conversion time (tolil, and back again) can hurt the overall run time.
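The same LIL replacement as a small function, written with a plain loop so that each element of rows and data is assigned one Python list at a time. A sketch, assuming a square matrix; the function name wipe_rows_lil is hypothetical:

```python
import numpy as np
from scipy import sparse

def wipe_rows_lil(M, rows):
    """Hypothetical helper: convert to LIL, replace each wiped row's column
    list with [i] and its value list with [1], then convert back to CSR."""
    L = M.tolil()                 # one conversion cost is paid here...
    one = L.dtype.type(1)         # 1 in the matrix's own dtype
    for i in rows:
        L.rows[i] = [i]           # column indices stored in row i
        L.data[i] = [one]         # matching values
    return L.tocsr()              # ...and another one here

M = sparse.coo_matrix(np.arange(16).reshape(4, 4))
print(wipe_rows_lil(M, [1, 2]).toarray())
```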
