Optimizing a Scipy sparse matrix

Question

I have a sparse matrix, and I currently enumerate over its rows and perform some calculations based on the information in each row. Each row is completely independent of the others. However, for large matrices this code is extremely slow (it takes about 2 hours), and I can't convert the matrix to a dense one either (I'm limited to 8 GB of RAM).

import scipy.sparse
import numpy as np

def process_row(a, b):
    """
    a - contains the row indices for a sparse matrix
    b - contains the column indices for a sparse matrix

    Returns a new vector of length(a)
    """

    return

def assess(mat):
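    """
    Call process_row once per row of mat, pairing row index i
    with every column index, and collect the results in a list.
    """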
    mat_csr = mat.tocsr()
    nrows, ncols = mat_csr.shape
    a = np.arange(ncols, dtype=np.int32)
    b = np.empty(ncols, dtype=np.int32)
    result = []

    for i, row in enumerate(mat_csr):
        # Process one row at a time
        b.fill(i)
        result.append(process_row(b, a))

    return result

if __name__ == '__main__':
    row  = np.array([8,2,7,4])
    col  = np.array([1,3,2,1])
    data = np.array([1,1,1,1])

    mat = scipy.sparse.coo_matrix((data, (row, col)))
    print(assess(mat))

I'd like to know whether there is a better design that would make this run much faster. Essentially, the process_row function takes (row, col) index pairs (from a, b), does some math using another sparse matrix, and returns a result. I can't change this function, but it can process arbitrary row/col pairs and is not restricted to pairs that all come from the same row.

Answer 1

Your problem looks similar to this other recent SO question:

Calculate the euclidean distance in scipy csr matrix

In my answer I sketched a way of iterating over the rows of a sparse matrix. I think it is faster to convert the matrix to lil format and construct the dense rows directly from its sublists. This avoids the overhead of creating a new sparse matrix for each row. But I haven't done timing tests.
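A minimal sketch of the lil approach described above. It relies on the `rows` and `data` attributes of scipy's lil format, which hold per-row lists of column indices and values. The helper name `iter_dense_rows` is made up for illustration, and like the approach itself it has not been timed.

```python
import numpy as np
import scipy.sparse


def iter_dense_rows(mat):
    """Yield (i, dense_row) for each row, built from the LIL sublists."""
    mat_lil = mat.tolil()
    ncols = mat_lil.shape[1]
    for i, (cols, vals) in enumerate(zip(mat_lil.rows, mat_lil.data)):
        dense_row = np.zeros(ncols, dtype=mat_lil.dtype)
        if cols:
            # cols: column indices of the nonzeros, vals: the matching values
            dense_row[cols] = vals
        yield i, dense_row


# Same small matrix as in the question
row = np.array([8, 2, 7, 4])
col = np.array([1, 3, 2, 1])
data = np.array([1, 1, 1, 1])
mat = scipy.sparse.coo_matrix((data, (row, col)))

dense_rows = dict(iter_dense_rows(mat))
```

Only one dense row exists at a time inside the generator, so memory stays proportional to the number of columns; the `dict(...)` call here is just for inspecting the small example.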

https://stackoverflow.com/a/36559702/901925

Maybe this applies to your case.
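Since the question says process_row accepts arbitrary (row, col) pairs, another direction is to hand it several rows per call instead of one. The sketch below is only an illustration under that assumption: `index_pair_batches` and `assess_batched` are hypothetical names, the `process_row` shown is a stand-in for the real function, and whether this is faster depends on how the real function scales with the number of pairs.

```python
import numpy as np


def index_pair_batches(nrows, ncols, rows_per_batch):
    """Yield (row_idx, col_idx) arrays covering rows_per_batch full rows at a time."""
    cols = np.arange(ncols, dtype=np.int32)
    for start in range(0, nrows, rows_per_batch):
        stop = min(start + rows_per_batch, nrows)
        # Each row index repeated ncols times, paired with every column index
        row_idx = np.repeat(np.arange(start, stop, dtype=np.int32), ncols)
        col_idx = np.tile(cols, stop - start)
        yield row_idx, col_idx


def process_row(a, b):
    # Stand-in for the real function (hypothetical): one value per index pair
    return a * 10 + b


def assess_batched(shape, rows_per_batch=1024):
    nrows, ncols = shape
    parts = [process_row(r, c)
             for r, c in index_pair_batches(nrows, ncols, rows_per_batch)]
    # Reshape so that result[i] corresponds to row i, as in the original loop
    return np.concatenate(parts).reshape(nrows, ncols)


result = assess_batched((9, 4), rows_per_batch=4)
```

As in the original loop, the collected results hold nrows x ncols values, so for very large matrices each batch may need to be reduced or written out instead of kept in memory. `rows_per_batch` trades the number of calls against the size of the index arrays passed per call.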

Answer 1: posted 2016-04-12 19:20:58