
Is there an efficient way of concatenating scipy.sparse matrices?

I'm working with some rather large sparse matrices (from 5000x5000 to 20000x20000) and need an efficient, flexible way to concatenate matrices in order to construct a stochastic matrix from separate parts.

Right now I'm using the following way to concatenate four matrices, but it's horribly inefficient. Is there a better way to do this that doesn't involve converting to a dense matrix?

rmat[0:m1.shape[0],0:m1.shape[1]] = m1
rmat[m1.shape[0]:rmat.shape[0],m1.shape[1]:rmat.shape[1]] = m2
rmat[0:m1.shape[0],m1.shape[1]:rmat.shape[1]] = bridge
rmat[m1.shape[0]:rmat.shape[0],0:m1.shape[1]] = bridge.transpose()

The sparse library now has hstack and vstack, for concatenating matrices horizontally and vertically respectively.
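For example, the four-block layout from the question can be assembled like this (a minimal sketch; the block names follow the question, but the sizes and contents are illustrative):

```python
from scipy import sparse

# Small stand-in blocks, mirroring m1, m2, and bridge in the question.
m1 = sparse.random(3, 3, density=0.5, format="csr", random_state=0)
m2 = sparse.random(2, 2, density=0.5, format="csr", random_state=1)
bridge = sparse.random(3, 2, density=0.5, format="csr", random_state=2)

# Assemble [[m1, bridge], [bridge.T, m2]] without ever going dense.
top = sparse.hstack([m1, bridge])
bottom = sparse.hstack([bridge.transpose(), m2])
rmat = sparse.vstack([top, bottom]).tocsr()
```

scipy.sparse.bmat([[m1, bridge], [bridge.transpose(), m2]]) builds the same block matrix in a single call.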

Okay, I found the answer. Using scipy.sparse.coo_matrix is much, much faster than using lil_matrix. I converted the matrices to COO (painless and fast) and then just concatenated the data, rows, and columns after adding the right offsets.

data = np.concatenate((m1S.data, bridgeS.data, bridgeTS.data, m2S.data))
rows = np.concatenate((m1S.row, bridgeS.row, bridgeTS.row + m1S.shape[0], m2S.row + m1S.shape[0]))
cols = np.concatenate((m1S.col, bridgeS.col + m1S.shape[1], bridgeTS.col, m2S.col + m1S.shape[1]))

rmat = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(m1S.shape[0] + m2S.shape[0], m1S.shape[1] + m2S.shape[1]))
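As a self-contained sketch of this COO assembly (the block sizes are illustrative assumptions, and np.concatenate stands in for the long-removed scipy.concatenate alias):

```python
import numpy as np
from scipy import sparse

# Stand-in COO blocks for m1S, m2S, and the bridge in the answer.
m1S = sparse.random(3, 3, density=0.5, format="coo", random_state=0)
m2S = sparse.random(2, 2, density=0.5, format="coo", random_state=1)
bridgeS = sparse.random(3, 2, density=0.5, format="coo", random_state=2)
bridgeTS = bridgeS.transpose().tocoo()

# Concatenate the raw COO triplets, offsetting the row/column
# indices so each block lands in the right quadrant of the result.
data = np.concatenate((m1S.data, bridgeS.data, bridgeTS.data, m2S.data))
rows = np.concatenate((m1S.row, bridgeS.row,
                       bridgeTS.row + m1S.shape[0], m2S.row + m1S.shape[0]))
cols = np.concatenate((m1S.col, bridgeS.col + m1S.shape[1],
                       bridgeTS.col, m2S.col + m1S.shape[1]))

rmat = sparse.coo_matrix((data, (rows, cols)),
                         shape=(m1S.shape[0] + m2S.shape[0],
                                m1S.shape[1] + m2S.shape[1]))
```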

Amos's answer is no longer necessary. SciPy now does something similar to this internally if the input matrices are in CSR or CSC format and the desired output format is either None or the same format as the inputs. It's efficient to vertically stack matrices in CSR format, or to horizontally stack matrices in CSC format, using scipy.sparse.vstack or scipy.sparse.hstack respectively.

Using hstack, vstack, or concatenate is dramatically slower than concatenating the inner data objects themselves. The reason is that hstack/vstack convert the sparse matrix to COO format, which can be very slow when the matrix is very large and not already in COO format. Here is the code for concatenating CSC matrices; a similar method can be used for CSR matrices:

import numpy as np
from scipy.sparse import csc_matrix

def concatenate_csc_matrices_by_columns(matrix1, matrix2):
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Shift matrix2's column pointers by matrix1's nnz, then splice.
    new_ind_ptr = matrix2.indptr + len(matrix1.data)
    new_ind_ptr = new_ind_ptr[1:]
    new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr))

    return csc_matrix((new_data, new_indices, new_ind_ptr))
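A quick sanity check of this helper against scipy.sparse.hstack (the function is repeated here so the sketch is self-contained; an explicit shape argument is added so empty trailing rows survive shape inference, and the test matrices are arbitrary):

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix

def concatenate_csc_matrices_by_columns(matrix1, matrix2):
    # Splice the raw CSC arrays: values, row indices, and column
    # pointers (matrix2's pointers shifted by matrix1's nnz count).
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    new_ind_ptr = matrix2.indptr + len(matrix1.data)
    new_ind_ptr = new_ind_ptr[1:]
    new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr))
    # Pass the shape explicitly so empty trailing rows are preserved.
    return csc_matrix((new_data, new_indices, new_ind_ptr),
                      shape=(matrix1.shape[0],
                             matrix1.shape[1] + matrix2.shape[1]))

a = sparse.random(4, 3, density=0.5, format="csc", random_state=0)
b = sparse.random(4, 2, density=0.5, format="csc", random_state=1)
c = concatenate_csc_matrices_by_columns(a, b)
```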

