Performance of scipy.weave.inline

Question

I am a Python novice trying to learn a bit about this fantastic programming language. I have tried using scipy.weave.inline to speed up some computations. Just to learn, I implemented a matrix multiplication with scipy.weave.inline. I have not included any error handling; I am just trying it out to understand it better. The code is as follows:

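import numpy
import scipy.weave

def cmatmul(A, B):
    # Naive triple-loop product R = A * B, compiled to C++ by weave
    R = numpy.zeros((A.shape[0], B.shape[1]))
    M = R.shape[0]   # rows of A
    N = R.shape[1]   # columns of B
    K = A.shape[1]   # inner dimension

    # With the blitz converters, arrays are indexed as A(i,k) in the C++ code
    code = """
    for (int i=0; i<M; i++)
        for (int j=0; j<N; j++)
            for (int k=0; k<K; k++)
                R(i,j) += A(i,k) * B(k,j);
    """
    scipy.weave.inline(code, ['R', 'A', 'B', 'M', 'N', 'K'],
                       type_converters=scipy.weave.converters.blitz,
                       compiler='gcc')
    return R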

Compared with numpy.dot, the weave.inline version takes roughly 50 times as long. I know that numpy is very fast when it can be applied, but the gap persists even for large matrices such as 1000 x 1000.

I have checked both numpy.dot and scipy.weave.inline, and both appear to keep one core at 100% while computing. In double precision, numpy.dot delivers 10.0 GFlops, compared with the theoretical 11.6 GFlops of my laptop; in single precision I measure twice that, as expected. The scipy.weave.inline version, however, reaches only about 1/50 of this performance.
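GFlops figures like these rest on the usual convention that a dense (m x k) by (k x n) product costs 2·m·n·k floating-point operations (one multiply and one add per term). The sketch below shows one way such a number could be measured; the helper names `matmul_flops` and `measure_gflops`, the random test matrices, and the best-of-N timing with `time.perf_counter` are assumptions for illustration, not the asker's actual benchmark.

```python
import time
import numpy as np

def matmul_flops(m, n, k):
    # Hypothetical helper: an (m x k) @ (k x n) product does m*n*k
    # multiplies and m*n*k adds, i.e. 2*m*n*k flops in total.
    return 2 * m * n * k

def measure_gflops(matmul, n=500, repeats=3):
    # Hypothetical helper: time `matmul` on two random n x n float64
    # matrices and report GFlops from the best (shortest) run.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        matmul(A, B)
        best = min(best, time.perf_counter() - t0)
    return matmul_flops(n, n, n) / best / 1e9

print(matmul_flops(1000, 1000, 1000))  # 2000000000
print(f"numpy.dot: {measure_gflops(np.dot):.1f} GFlops")
```

Any function taking two arrays, such as the cmatmul above on a system where scipy.weave is available, could be passed in place of np.dot; the result depends entirely on the machine and BLAS build.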

Is this difference to be expected? Or what am I doing wrong?

Answer 1

You implemented a naive matrix multiplication algorithm, which scipy.weave compiles to fast machine code.

However, there are non-obvious, more cache-efficient algorithms for matrix multiplication (which usually split the matrices into blocks and work on those), and additional speed can be gained with CPU-specific optimizations. By default, NumPy uses an optimized BLAS library for this operation if one is installed. Such libraries will likely be fast compared to anything you can code up yourself without a substantial amount of research.
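To illustrate the blocking idea only, here is a minimal tiled multiplication sketch. The function name `blocked_matmul` and the default tile size of 64 are hypothetical choices; real BLAS libraries tune tile sizes per CPU and use far more elaborate kernels. Each tile of the result accumulates products of matching tiles of A and B, so each step only touches three small tiles that can stay in cache. For brevity the tile product itself is delegated to NumPy's `@`, so this shows the loop structure, not a competitive implementation.

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    # Hypothetical tiled product: C = A @ B computed tile by tile.
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions must agree")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i0 in range(0, m, block):
        i1 = min(i0 + block, m)
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            # Accumulate the (i, j) tile of C over all tiles along k
            for k0 in range(0, k, block):
                k1 = min(k0 + block, k)
                C[i0:i1, j0:j1] += A[i0:i1, k0:k1] @ B[k0:k1, j0:j1]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((150, 100))
B = rng.standard_normal((100, 130))
print(np.allclose(blocked_matmul(A, B, block=32), A @ B))  # True
```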

solution1
7 2011-10-23 14:13:34