Performance of scipy.weave.inline

Question

I am a Python novice trying to learn a bit about this fantastic programming language. I have tried using scipy.weave.inline to speed up some computation. As a learning exercise, I implemented a matrix multiplication with scipy.weave.inline. I have not included any error handling; I am just trying it out to understand it better. The code is as follows:

import numpy
import scipy.weave
def cmatmul(A,B):
    R = numpy.zeros((A.shape[0],B.shape[1]))
    M = R.shape[0]
    N = R.shape[1]
    K = A.shape[1]

    code = \
    """
    for (int i=0; i<M; i++)
        for (int j=0; j<N; j++)
            for (int k=0; k<K; k++)
                R(i,j) += A(i,k) * B(k,j);
    """
    scipy.weave.inline(code, ['R','A','B','M','N','K'], \
                       type_converters=scipy.weave.converters.blitz, \
                       compiler='gcc')
    return R

When I compare with numpy.dot, I find that the weave.inline version takes roughly 50 times as long as numpy.dot. I know that numpy is very fast when it can be applied. The difference persists even for large matrices, such as 1000 x 1000.

I have checked both numpy.dot and scipy.weave.inline, and both appear to use one core at 100% while computing. numpy.dot delivers 10.0 GFlops, compared to the theoretical 11.6 GFlops of my laptop (double precision). In single precision I measure twice that performance, as expected. But scipy.weave.inline is far behind: it reaches only about 1/50 of this performance.
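For readers who want to reproduce figures like the ones above, here is a minimal sketch of the arithmetic. It assumes the usual operation count for a dense product (one multiply and one add per inner-loop step, so 2*M*N*K operations). The helper names `matmul_flops`, `gflops`, and `time_call` are hypothetical and not part of numpy or scipy.

```python
from timeit import default_timer


def matmul_flops(m, n, k):
    """Operation count for an (m x k) times (k x n) product.

    Assumes one multiply and one add per inner-loop step.
    """
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Achieved rate in GFlops for a product that took `seconds`."""
    return matmul_flops(m, n, k) / float(seconds) / 1e9


def time_call(func, args=(), repeats=3):
    """Best wall-clock time in seconds over `repeats` calls of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = default_timer()
        func(*args)
        best = min(best, default_timer() - start)
    return best
```

With these helpers, a 1000 x 1000 product costs 2e9 operations, so a rate of 10 GFlops corresponds to about 0.2 seconds, and a version that is 50 times slower takes about 10 seconds, or roughly 0.2 GFlops. Something like `gflops(1000, 1000, 1000, time_call(numpy.dot, (A, B)))` would measure the numpy side, assuming `A` and `B` are 1000 x 1000 arrays.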

Is this difference to be expected? Or what am I doing wrong?

Answer 1

You implemented a naive matrix multiplication algorithm, which scipy.weave compiles to fast machine code.

However, there are non-obvious, more CPU-cache-efficient algorithms for matrix multiplication (which usually split the matrix into blocks and process those), and additional speed can be gained with CPU-specific optimizations. NumPy by default uses an optimized BLAS library for this operation, if you have one installed. These libraries will likely be fast compared to anything you can code up yourself without doing a considerable amount of research.
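To illustrate the block-splitting idea mentioned above, here is a minimal sketch of a blocked (tiled) triple loop. It is written in plain Python on nested lists only to show the loop structure; it will not be fast in Python, and the cache benefit would only show up if the same structure were written in compiled code. The function name `blocked_matmul` and the default block size of 32 are assumptions for illustration, and this is not how any particular BLAS library is implemented.

```python
def blocked_matmul(A, B, block=32):
    """Multiply two matrices given as lists of rows, one tile at a time.

    Assumes len(A[0]) == len(B). The three outer loops walk over tiles,
    the three inner loops do the ordinary multiply-add inside one tile.
    """
    m, k, n = len(A), len(B), len(B[0])
    R = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for k0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Work on one tile: rows i0.., inner indices k0.., columns j0..
                for i in range(i0, min(i0 + block, m)):
                    row_r = R[i]
                    row_a = A[i]
                    for kk in range(k0, min(k0 + block, k)):
                        a = row_a[kk]
                        row_b = B[kk]
                        for j in range(j0, min(j0 + block, n)):
                            row_r[j] += a * row_b[j]
    return R
```

The point of the tiling is that each tile of A, B, and R is small enough to stay in cache while it is reused, whereas the naive loop order streams through whole rows and columns of large matrices. A suitable block size depends on the cache sizes of the machine, so 32 is only a placeholder.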

Solution 1
7 2011-10-23 14:13:34