
Matrix (scipy sparse) - Matrix (dense; numpy array) multiplication efficiency

I am a researcher working on geophysical inversion, which can require solving the linear system: Au = rhs. Here A is often a sparse matrix, but rhs and u can be either dense matrices or vectors. To perform gradient-based inversion, we need sensitivity computations, which require a number of matrix-matrix and matrix-vector multiplications. Recently I found some odd behaviour in matrix (sparse) - matrix (dense) multiplication, and below is an example:

import numpy as np
import scipy.sparse as sp
n = int(1e6)
m = int(100)
e = np.ones(n)
A = sp.spdiags(np.vstack((e, e, e)), np.array([-1, 0, 1]), n, n)
A = A.tocsr()
u = np.random.randn(n,m)

%timeit rhs = A*u[:,0]
#10 loops, best of 3: 22 ms per loop    
%timeit rhs = A*u[:,:10]
#10 loops, best of 3: 98.4 ms per loop
%timeit rhs = A*u
#1 loop, best of 3: 570 ms per loop

I was expecting an almost linear increase in computation time as I increase the number of columns of the dense matrix u multiplied by the sparse matrix A (e.g. the second case, A*u[:,:10], should take about 220 ms, and the final one, A*u, about 2.2 s). However, it is much faster than I expected. Conversely, matrix-vector multiplication is much slower than matrix-matrix multiplication. Can someone explain why? Further, is there an effective way to boost matrix-vector multiplication to a similar level of efficiency as matrix-matrix multiplication?

If you look at the source code, you can see that csr_matvec (which implements matrix-vector multiplication) is implemented as a straightforward sum loop in C code, while csr_matvecs (which implements matrix-matrix multiplication) is implemented as a call to the axpy BLAS routine. Depending on which BLAS library your installation is linked to, such a call can be far more efficient than the straightforward C implementation used for matrix-vector multiplication. That's likely why you're seeing matrix-vector multiplication be so slow.
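For reference, a minimal sketch of what the axpy operation computes (y ← a·x + y in a single vectorized BLAS call), using the double-precision daxpy wrapper that scipy exposes:

```python
import numpy as np
from scipy.linalg.blas import daxpy

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# daxpy computes a*x + y in one fused BLAS call; this is the kind of
# primitive the matrix-matrix kernel leans on, one call per stored entry row.
result = daxpy(x, y, a=2.0)
print(result)  # 2*x + y
```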

Changing scipy so that it also calls BLAS in the matrix-vector case could be a useful contribution to the package.
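In the meantime, one workaround worth trying (a sketch only; whether it helps depends on your scipy build and the BLAS it links against): keep the right-hand operand two-dimensional, as an n×1 column, so the product is routed through the matrix-matrix code path, then flatten the result back to a vector:

```python
import numpy as np
import scipy.sparse as sp

n = int(1e5)
e = np.ones(n)
A = sp.spdiags(np.vstack((e, e, e)), np.array([-1, 0, 1]), n, n).tocsr()
v = np.random.randn(n)

direct = A * v                         # 1-D operand: matrix-vector kernel
as_column = (A * v[:, None]).ravel()   # 2-D n x 1 operand: matrix-matrix kernel

# Both code paths compute the same sums, so the results agree numerically.
print(np.allclose(direct, as_column))
```

Timing both variants with %timeit on your own machine is the only reliable way to tell whether the detour through the matrix-matrix path pays off for your BLAS.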
