Poor(er) performance of Cython with NumPy array memoryview compared to C arrays

Question

I encountered a pretty weird result from a benchmark

Those are all different flavors of a bubblesort implementation, and the fastest approach at n=10^4 is converting a Python lists to C arrays internally. In contrast, the yellow line corresponds to code where I am using NumPy arrays with memoryview. I am expected the results to be vice versa. I (and colleagues) repeated the benchmark a couple of times and always got the same results. Maybe someone has an idea of what is going on here ...

在此输入图像描述

The black line in the plot would correspond to the code:

%%cython
cimport cython
from libc.stdlib cimport malloc, free

def cython_bubblesort_clist(a_list):
    """ 
    The Cython implementation of bubble sort with internal
    conversion between Python list objects and C arrays.

    """
    cdef int *c_list
    c_list = <int *>malloc(len(a_list)*cython.sizeof(int))
    cdef int count, i, j # static type declarations
    count = len(a_list)

    # convert Python list to C array
    for i in range(count):
        c_list[i] = a_list[i]

    for i in range(count):
        for j in range(1, count):
            if c_list[j] < c_list[j-1]:
                c_list[j-1], c_list[j] = c_list[j], c_list[j-1]

    # convert C array back to Python list
    for i in range(count):
        a_list[i] = c_list[i]

    free(c_list)
    return a_list

and the pink line to this code:

%%cython
import numpy as np
cimport numpy as np
cimport cython
def cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)

Answer 1

As suggested in the comments above, I added the decorators

%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) 
@cython.wraparound(False)
cpdef cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)

and the results are more what I expected now :)

在此输入图像描述

Answer 2

It is worth making one trivial change to your code to see if it improves things further:

cpdef cython_bubblesort_numpy(long[::1] np_ary):
    # ...

This tells cython that np_ary is a C contiguous array, and the generated code in the nested for loops can be further optimized with this information.

This code won't accept non-contiguous arrays as arguments, but that is fairly trivial to handle by using numpy.ascontiguousarray() .

Poor(er) performance of Cython with NumPy array memoryview compared to C arrays

Question

2 answers

solution1
3

solution2
1 2014-05-15 19:12:16

Poor(er) performance of Cython with NumPy array memoryview compared to C arrays

Question

2 answers

solution1 3

solution2 1 2014-05-15 19:12:16

solution1
3

solution2
1 2014-05-15 19:12:16