Cython not much faster than Python for Basic Sum Calculation

I'm trying to follow an example from the Continuum Analytics blog that benchmarks Python, Cython, and Numba on a sum computed with a for loop. Unfortunately, I'm seeing that Cython is barely faster than Python!

Here's my Python function definition:

def python_sum(y):
    N = len(y)
    x = y[0]
    for i in xrange(1,N):
        x += y[i]
    return x

And now my Cython function:

def cython_sum(int[:] y):
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x

Now I've got a script that pulls the two functions and benchmarks:

import timeit
import numpy as np
import cython_sum
import python_sum

b = np.ones(10000)

timer = timeit.Timer(stmt='python_sum.python_sum(b)', setup='from __main__ import python_sum, b')
print "Python Sum    (ms): %g" % (timer.timeit(1)*1000)

timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython     (ms): %g" % (timer.timeit(1)*1000)

And now my output is:

Python Sum    (ms): 9.44624
Cython     (ms): 8.54868

Based on the graphs in the blog post linked above, I was expecting a 100x to 1000x speed-up, yet all I'm seeing is that Cython is marginally faster than vanilla Python.

Am I doing something wrong here? This seems like a pretty basic question with a simple function definition, and lots of people use Cython with great success, so the error must lie with me. Can anyone shed some light on what I'm doing wrong? Thank you!

I am not sure why you get that result. As a commenter said, your code as-is shouldn't even work, since you'd be passing floats into a function expecting ints. Maybe you left a cython_sum.py file lying around in the same directory?
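
One way to test the stray-file hypothesis is to look at which file the import actually resolved to. The sketch below is written for Python 3 and uses only the standard library; describe_import is a hypothetical helper name, not part of any library.

```python
import importlib
import importlib.machinery


def describe_import(module_name):
    """Return (path, is_extension) for the module that `import` resolves to."""
    module = importlib.import_module(module_name)
    # Built-in modules have no __file__, so fall back to None.
    path = getattr(module, "__file__", None)
    # Compiled extensions end in a platform suffix such as .so or .pyd.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    is_extension = bool(path) and path.endswith(suffixes)
    return path, is_extension


# A pure-Python standard-library module is reported as not compiled.
print(describe_import("json"))
```

Calling describe_import("cython_sum") from the benchmark directory should report a compiled extension. A path ending in .py would mean a plain Python file is being timed instead.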

I did the following. I created a python_sum.py that contained your exact definition of python_sum. Then I slightly changed your Cython code:

cython_sum.pyx:

def cython_sum(long[:] y):    #changed `int` to `long`
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x
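
The change from int to long matters because a typed memoryview only accepts a buffer whose element type matches the declaration, and C int and C long differ in size on many platforms. The sketch below illustrates the idea in Python 3, with the standard-library array module standing in for NumPy arrays (an assumption made to keep the example dependency-free); buffer_format is a hypothetical helper.

```python
from array import array


def buffer_format(buf):
    """Return (format code, item size in bytes) that a buffer exposes."""
    view = memoryview(buf)
    return view.format, view.itemsize


# C double, C int and C long are exposed as three distinct element formats.
doubles = array("d", [1.0] * 4)
ints = array("i", [1] * 4)
longs = array("l", [1] * 4)

for name, buf in [("double", doubles), ("int", ints), ("long", longs)]:
    print(name, buffer_format(buf))
```

A float array such as the default np.ones(10000) exposes double-precision elements, so it cannot be passed where int[:] or long[:] is declared.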

I made a setup file to be able to build the Cython module:

setup.py:

from distutils.core import setup
from Cython.Build import cythonize

setup(
  name = 'Cython sum test',
  ext_modules = cythonize("cython_sum.pyx"),
)

I built the module using python setup.py build_ext --inplace. Next, I ran your test code with some modifications:

test.py:

import timeit
import numpy as np
import cython_sum
import python_sum

# ** added dtype=np.int to create integers **
b = np.ones(10000, dtype=np.int)    

# ** changed .timeit(1) to .timeit(1000) for each one **
timer = timeit.Timer(stmt='python_sum.python_sum(b)', setup='from __main__ import python_sum, b')
print "Python Sum    (ms): %g" % (timer.timeit(1000)*1000)

timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython        (ms): %g" % (timer.timeit(1000)*1000)

And I got the following result:

$ python test.py
Python Sum    (ms): 4111.74
Cython        (ms): 7.06697

Now that is a nice speed-up!
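
Raising the repeat count matters because a single call is easily swamped by timer noise. A minimal Python 3 sketch of a steadier measurement, using only the standard library, is to repeat the timing several times and keep the fastest run. Here loop_sum mirrors python_sum from the question, and best_time_ms is a hypothetical helper.

```python
import timeit


def loop_sum(y):
    """Plain-Python loop sum, mirroring python_sum from the question."""
    x = y[0]
    for i in range(1, len(y)):
        x += y[i]
    return x


def best_time_ms(func, arg, number=100, repeat=5):
    """Time `number` calls of func(arg), `repeat` times; return the best total in ms."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) * 1000


data = [1] * 10000
print("Python loop (ms per 100 calls): %g" % best_time_ms(loop_sum, data))
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs mostly reflect interference from other processes rather than the code under test.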


Additionally, by following the guidelines outlined here, I was able to get an additional (small) speed-up:

cython_fast_sum.pyx:

import numpy as np
cimport numpy as np

DTYPE = np.int
ctypedef np.int_t DTYPE_t

def cython_sum(np.ndarray[DTYPE_t, ndim=1] y):
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x

setup_fast.py:

from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
  name = 'Cython fast sum test',
  ext_modules = cythonize("cython_fast_sum.pyx"),
  include_dirs = [np.get_include()],
)

test.py:

import timeit
import numpy as np
import cython_sum
import cython_fast_sum

b = np.ones(10000, dtype=np.int)

# ** note 100000 runs, not 1000 **
timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython naive  (ms): %g" % (timer.timeit(100000)*1000)

timer = timeit.Timer(stmt='cython_fast_sum.cython_sum(b)', setup='from __main__ import cython_fast_sum, b')
print "Cython fast   (ms): %g" % (timer.timeit(100000)*1000)

Result:

$ python test.py
Cython naive  (ms): 676.437
Cython fast   (ms): 645.797
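
Because the two benchmarks use different run counts, it can help to put the reported timings on a per-call footing. The arithmetic below is a small Python 3 sketch using the figures copied from the outputs above; per_call_us is a hypothetical helper name.

```python
def per_call_us(total_ms, runs):
    """Convert a total time in milliseconds over `runs` calls to microseconds per call."""
    return total_ms * 1000.0 / runs


# Totals reported above: (milliseconds, number of runs).
python_loop = per_call_us(4111.74, 1000)
cython_view = per_call_us(7.06697, 1000)
cython_naive = per_call_us(676.437, 100000)
cython_fast = per_call_us(645.797, 100000)

print("Python loop:        %.1f us per call" % python_loop)
print("Cython memoryview:  %.2f us per call" % cython_view)
print("Speed-up:           %.0fx" % (python_loop / cython_view))
print("Buffer syntax gain: %.1f%%" % (100 * (1 - cython_fast / cython_naive)))
```

On these numbers the memoryview version is roughly 580x faster than the Python loop, while the ndarray buffer syntax shaves off a further 4.5% or so.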
