Cython either marginally faster or slower than pure Python
I am using several techniques (NumPy, Weave and Cython) to benchmark Python performance. Mathematically, the code computes C = AB, where A, B and C are N x N matrices (NOTE: this is a matrix product, not an element-wise multiplication).
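I have written 5 distinct implementations of the code; in the order they are timed in matmul.py below, they are labelled python_list, numpy_array, weave_inline, cython_list and cython_array.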
My expectation is that implementations 2 through 5 will be substantially faster than implementation 1. My results, however, indicate otherwise. These are my normalised speed-up results relative to the pure Python implementation:
I am quite happy with the performance of NumPy; however, I am less enthusiastic about Weave's performance, and Cython's performance makes me cry. My entire code is split across two files. Everything is automated, and you simply need to run the first file to see all results. Could someone please help me by indicating what I could do to obtain better results?
matmul.py:
import time
import numpy as np
from scipy import weave
from scipy.weave import converters
import pyximport
pyximport.install()
import cython_matmul as cml


def python_list_matmul(A, B):
    C = np.zeros(A.shape, dtype=float).tolist()
    A = A.tolist()
    B = B.tolist()
    for k in xrange(len(A)):
        for i in xrange(len(A)):
            for j in xrange(len(A)):
                C[i][k] += A[i][j] * B[j][k]
    return C


def numpy_array_matmul(A, B):
    return np.dot(A, B)


def weave_inline_matmul(A, B):
    code = """
       int i, j, k;
       for (k = 0; k < N; ++k)
       {
           for (i = 0; i < N; ++i)
           {
               for (j = 0; j < N; ++j)
               {
                   C(i, k) += A(i, j) * B(j, k);
               }
           }
       }
       """
    C = np.zeros(A.shape, dtype=float)
    weave.inline(code, ['A', 'B', 'C', 'N'], type_converters=converters.blitz, compiler='gcc')
    return C


N = 100
A = np.random.rand(N, N)
B = np.random.rand(N, N)

function = []
function.append([python_list_matmul, 'python_list'])
function.append([numpy_array_matmul, 'numpy_array'])
function.append([weave_inline_matmul, 'weave_inline'])
function.append([cml.cython_list_matmul, 'cython_list'])
function.append([cml.cython_array_matmul, 'cython_array'])

t = []
for i in xrange(len(function)):
    t1 = time.time()
    C = function[i][0](A, B)
    t2 = time.time()
    t.append(t2 - t1)
    print function[i][1] + ' \t: ' + '{:10.6f}'.format(t[0] / t[-1])
cython_matmul.pyx:
import numpy as np
cimport numpy as np
import cython
cimport cython

DTYPE = np.float
ctypedef np.float_t DTYPE_t


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_list_matmul(A, B):
    cdef int i, j, k
    cdef int N = len(A)
    A = A.tolist()
    B = B.tolist()
    C = np.zeros([N, N]).tolist()
    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_array_matmul(np.ndarray[DTYPE_t, ndim=2] A, np.ndarray[DTYPE_t, ndim=2] B):
    cdef int i, j, k, N = A.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] C = np.zeros([N, N], dtype=DTYPE)
    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C
Python lists and high-performance math are incompatible; forget about cython_list_matmul.
The only problem with your cython_array_matmul is incorrect indexing. It should be
C[i,k] += A[i,j] * B[j,k]
That's how NumPy arrays are indexed in Python, and that's the syntax Cython optimizes. With this change you should get decent performance.
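To see why the two spellings differ, here is a small plain-NumPy sketch (Python 3 assumed; the array `A` and the indices are illustrative and not taken from the post). `A[i][j]` first builds a temporary row object `A[i]` and then indexes into it, whereas `A[i, j]` is a single indexing operation, which is the form Cython's typed-buffer access is designed to compile down.

```python
import numpy as np

# Illustrative 3 x 4 array; the values are arbitrary.
A = np.arange(12, dtype=float).reshape(3, 4)
i, j = 1, 2

row = A[i]          # chained indexing, step 1: a temporary 1-D view of row i
chained = row[j]    # chained indexing, step 2: index into that temporary
direct = A[i, j]    # tuple indexing: one operation, no temporary row object

same_value = bool(chained == direct)        # both spellings read the same element
row_is_array = isinstance(row, np.ndarray)  # the intermediate is a full ndarray object
```

Both forms return the same value; the difference is only in how much work happens per access, which matters inside a triple loop.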
Cython's annotation feature (cython -a cython_matmul.pyx, which writes an annotated HTML report) is really helpful in spotting optimization problems like this one. You would notice that A[i][j] produces a ton of Python API calls, while A[i,j] produces none.
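Also, if you initialize all entries by hand, np.empty is more appropriate than np.zeros. This only applies when every entry is assigned before it is read; the loops in the question accumulate with +=, which relies on C starting at zero.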
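As a rough illustration of when np.empty is safe, the sketch below (plain Python 3 with NumPy, not Cython; the function name `matmul_with_empty` is hypothetical) sums each entry in a local accumulator and writes it to C exactly once, so the uninitialised contents of C are never read.

```python
import numpy as np


def matmul_with_empty(A, B):
    """Triple-loop matrix product that assigns each C[i, k] exactly once."""
    n = A.shape[0]
    # Uninitialised buffer: safe only because every entry is assigned below.
    C = np.empty((n, n), dtype=float)
    for i in range(n):
        for k in range(n):
            acc = 0.0  # local accumulator replaces the in-place += on C
            for j in range(n):
                acc += A[i, j] * B[j, k]
            C[i, k] = acc
    return C


rng = np.random.RandomState(0)
A = rng.rand(5, 5)
B = rng.rand(5, 5)
C = matmul_with_empty(A, B)
matches_numpy = np.allclose(C, np.dot(A, B))  # compare against the reference product
```

The same restructuring should carry over to the typed Cython function, where the accumulator would be declared as a C double; that part is an assumption and is not shown here.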