[英]SciPy sparse matrix (COO,CSR): Clear row
For creating a scipy sparse matrix , I have an array or row and column indices I
and J
along with a data array V
. 为了创建稀疏矩阵 ,我需要一个数组或行和列索引I
和J
以及数据数组V
I use those to construct a matrix in COO format and then convert it to CSR , 我用它们来构造COO格式的矩阵,然后将其转换为CSR ,
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
I have a set of row indices for which the only entry should be a 1.0
on the diagonal. 我有一组行索引,唯一的输入应该是对角线上的1.0
。 So far, I go through I
, find all indices that need wiping, and do just that: 到目前为止,我经历了I
,找到了需要擦除的所有索引,然后执行以下操作:
def find(lst, a):
# From <http://stackoverflow.com/a/16685428/353337>
return [i for i, x in enumerate(lst) if x in a]
# wipe_rows = [1, 55, 32, ...] # something something
indices = find(I, wipe_rows) # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()
# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))
# construct matrix via coo
That works alright, but find
tends to take a while. 没问题,但是find
通常需要一段时间。
Any hints on how to speed this up? 关于如何加快速度的任何提示? (Perhaps wiping the rows in COO or CSR format is a better idea.) (也许以COO或CSR格式擦除行是一个更好的主意。)
If you intend to clear multiple rows at once, this 如果您打算一次清除多行,这
def _wipe_rows_csr(matrix, rows):
assert isinstance(matrix, sparse.csr_matrix)
# delete rows
for i in rows:
matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0
# Set the diagonal
d = matrix.diagonal()
d[rows] = 1.0
matrix.setdiag(d)
return
is by far the fastest method. 是迄今为止最快的方法。 It doesn't really remove the lines, but sets all entries to zeros, then fiddles with the diagonal. 它并没有真正删除线条,而是将所有条目设置为零,然后用对角线摆弄。
If the entries are actually to be removed, one has to do some array manipulation. 如果实际上要删除这些条目,则必须进行一些数组操作。 This can be quite costly, but if speed is no issue: This 这可能会非常昂贵,但是如果速度没问题:
def _wipe_row_csr(A, i):
'''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
'''
assert isinstance(A, sparse.csr_matrix)
n = A.indptr[i+1] - A.indptr[i]
assert n > 0
A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
A.data[A.indptr[i]] = 1.0
A.data = A.data[:-n+1]
A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
A.indices[A.indptr[i]] = i
A.indices = A.indices[:-n+1]
A.indptr[i+1:] -= n-1
return
replaces a given row i
of the matrix matrix
by the entry 1.0
on the diagonal. 用对角线上的项1.0
替换矩阵matrix
的给定行i
。
np.in1d
should be a faster way of finding the indices
: np.in1d
应该是查找indices
的更快方法:
In [322]: I # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)
In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]
In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]
In [325]: ind1=np.in1d(I,[1,2])
In [326]: ind1
Out[326]:
array([False, False, True, True, True, True, True, True, False,
False, False], dtype=bool)
In [327]: np.where(ind1) # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)
In [328]: I[~ind1] # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
Direct manipulation of the coo
inputs like this often a good way. 像这样直接操作coo
输入通常是一个好方法。 But another is to take advantage of the csr
math abilities. 但是另一个是利用csr
数学能力。 You should be able to construct a diagonal matrix that zeros out the correct rows, and then adds the ones back in. 您应该能够构造一个对角矩阵,将正确的行清零,然后再将其重新添加。
Here's what I have in mind: 这就是我的想法:
In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))
In [362]: (d1*M+d2).A
Out[362]:
array([[ 0., 1., 2., 3.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 12., 13., 14., 15.]])
In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
Doing this with lil
format looks easy: 使用lil
格式执行此操作看起来很容易:
In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]
In [596]: Ml.A
Out[596]:
array([[ 0, 1, 2, 3],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0],
[12, 13, 14, 15]], dtype=int32)
It's sort of what you are doing with csr
format, but it's easy to replace each row list with the appropriate [1] and [i] list. 这有点像您对csr
格式所做的事情,但是很容易用适当的[1]和[i]列表替换每个行列表。 But conversion times ( tolil
etc) can hurt run times. 但是转换时间( tolil
等)可能会损害运行时间。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.