Cython either faster or slower than pure Python

Question

I am benchmarking Python performance using several techniques (NumPy, Weave and Cython). Mathematically, the code computes C = AB, where A, B and C are N x N matrices (NOTE: this is a matrix product, not an element-wise multiplication).
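Written out in index notation, assuming zero-based indices running from 0 to N-1 (the same convention the loops in the code use), each entry of the product is:

```latex
C_{ik} = \sum_{j=0}^{N-1} A_{ij} B_{jk}
```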

I have written 5 distinct implementations of the code:

1. Pure Python (loop over 2D Python lists)
2. NumPy (dot product of 2D NumPy arrays)
3. Weave inline (C++ loop over 2D arrays)
4. Cython (loop over 2D Python lists + static typing)
5. Cython-NumPy (loop over 2D NumPy arrays + static typing)

My expectation is that implementations 2 through 5 will be substantially faster than implementation 1. My results, however, indicate otherwise. These are my normalised speed-up results relative to the pure Python implementation:

python_list: 1.00
numpy_array: 330.09
weave_inline: 30.72
cython_list: 2.80
cython_array: 0.14

I am quite happy with the performance of NumPy; however, I am less enthusiastic about Weave's performance, and Cython's performance makes me cry. My entire code is split across two files. Everything is automated: you simply need to run the first file to see all results. Could someone please help me by indicating what I could do to obtain better results?

matmul.py:

import time

import numpy as np
from scipy import weave
from scipy.weave import converters

import pyximport
pyximport.install()
import cython_matmul as cml


def python_list_matmul(A, B):
    C = np.zeros(A.shape, dtype=float).tolist()
    A = A.tolist()
    B = B.tolist()
    for k in xrange(len(A)):
        for i in xrange(len(A)):
            for j in xrange(len(A)):
                C[i][k] += A[i][j] * B[j][k]
    return C


def numpy_array_matmul(A, B):
    return np.dot(A, B)


def weave_inline_matmul(A, B):
    code = """
       int i, j, k;
       for (k = 0; k < N; ++k)
       {
           for (i = 0; i < N; ++i)
           {
               for (j = 0; j < N; ++j)
               {
                   C(i, k) += A(i, j) * B(j, k);
               }
           }
       }
       """

    C = np.zeros(A.shape, dtype=float)
    weave.inline(code, ['A', 'B', 'C', 'N'], type_converters=converters.blitz, compiler='gcc')
    return C


N = 100
A = np.random.rand(N, N)
B = np.random.rand(N, N)

function = []
function.append([python_list_matmul, 'python_list'])
function.append([numpy_array_matmul, 'numpy_array'])
function.append([weave_inline_matmul, 'weave_inline'])
function.append([cml.cython_list_matmul, 'cython_list'])
function.append([cml.cython_array_matmul, 'cython_array'])

t = []
for i in xrange(len(function)):
    t1 = time.time()
    C = function[i][0](A, B)
    t2 = time.time()
    t.append(t2 - t1)
    print function[i][1] + ' \t: ' + '{:10.6f}'.format(t[0] / t[-1])

cython_matmul.pyx:

import numpy as np
cimport numpy as np

import cython
cimport cython

DTYPE = np.float
ctypedef np.float_t DTYPE_t


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_list_matmul(A, B):

    cdef int i, j, k
    cdef int N = len(A)

    A = A.tolist()
    B = B.tolist()
    C = np.zeros([N, N]).tolist()

    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_array_matmul(np.ndarray[DTYPE_t, ndim=2] A, np.ndarray[DTYPE_t, ndim=2] B):

    cdef int i, j, k, N = A.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] C = np.zeros([N, N], dtype=DTYPE)

    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C

Answer 1

Python lists and high-performance math are incompatible; forget about cython_list_matmul.
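One reason lists are a poor fit: a list stores references to separate, boxed float objects, while a NumPy array stores raw C doubles back to back. A rough illustration using only the standard library, with array('d') standing in for a packed float64 buffer (exact object sizes depend on the interpreter build, so treat the numbers as indicative):

```python
import sys
from array import array

values = [0.5, 1.5, 2.5]

# A list holds pointers to individual float objects, each with its own header.
boxed_size = sys.getsizeof(values[0])   # bytes used by one Python float object

# array('d') packs raw C doubles contiguously, much like a float64 NumPy buffer.
packed = array('d', values)
packed_size = packed.itemsize           # bytes per element in the packed buffer

print(boxed_size, packed_size)
```

Static typing of the loop counters cannot remove this overhead, because every A[i][j] on a list still has to fetch and unbox a Python object.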

The only problem with your cython_array_matmul is incorrect indexing. It should be

C[i,k] += A[i,j] * B[j,k]

That is how NumPy arrays are indexed in Python, and it is the syntax Cython optimizes. With this change you should get decent performance.

Cython's annotation feature (cython -a cython_matmul.pyx) is really helpful for spotting optimization problems like this one. You will notice that A[i][j] produces a ton of Python API calls, while A[i,j] produces none.
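The difference comes from how Python evaluates the two expressions: A[i][j] is two separate indexing operations with an intermediate row object in between, whereas A[i, j] hands the tuple (i, j) to the container in a single operation. A minimal sketch in plain Python, using a hypothetical RecordingMatrix class (not part of NumPy or Cython) that records every key it receives:

```python
class RecordingMatrix:
    """Hypothetical 2D container that logs the keys passed to __getitem__."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        if isinstance(key, tuple):
            i, j = key
            return self.rows[i][j]   # both indices arrive together
        return self.rows[key]        # only the row index arrives; a row object is returned


M = RecordingMatrix([[1, 2], [3, 4]])

a = M[1][0]   # M.__getitem__(1) returns a row, which is then indexed with 0
b = M[1, 0]   # M.__getitem__((1, 0)) in one call

print(M.keys)
```

With a typed NumPy buffer, Cython can turn the single tuple form into direct memory access, while the chained form has to build the intermediate row through the Python API on every iteration.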

Also, if you initialize all entries by hand, np.empty is more appropriate than np.zeros.
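Note that np.empty returns uninitialized memory, so it is only safe when every entry is assigned before it is read. The loop in the question accumulates with +=, which reads C first, so it would need restructuring before the swap. A minimal sketch in plain Python of one such restructuring, where lists stand in for the arrays and matmul_assign_once is a hypothetical name:

```python
def matmul_assign_once(A, B):
    """Multiply two square matrices given as lists of lists."""
    N = len(A)
    # Placeholder entries play the role of np.empty's uninitialized memory:
    # they must never be read before being overwritten.
    C = [[None] * N for _ in range(N)]
    for i in range(N):
        for k in range(N):
            s = 0.0                      # local accumulator for entry (i, k)
            for j in range(N):
                s += A[i][j] * B[j][k]
            C[i][k] = s                  # each entry is written exactly once
    return C


result = matmul_assign_once([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

Keeping the running sum in a local variable also avoids repeated read-modify-write access to C inside the innermost loop.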

Answer 1: 11, accepted, 2013-06-10 17:08:48
