I am just starting to learn Cython, so please excuse my ignorance. Can Cython improve on NumPy for simply adding two arrays together? My very bad attempt at adding two arrays a + b to give a new array c is:
import numpy as np
cimport numpy as np

DTYPE = np.int64
ctypedef np.int64_t DTYPE_t

def add_arrays(np.ndarray[DTYPE_t, ndim=2] a, np.ndarray[DTYPE_t, ndim=2] b, np.ndarray[DTYPE_t, ndim=2] c):
    cdef int x = a.shape[0]
    cdef int y = a.shape[1]
    cdef int val_a
    cdef int val_b
    for j in range(x):
        for k in range(y):
            val_a = a[j][k]
            val_b = b[j][k]
            c[j][k] = val_a + val_b
    return c
However this version is 700 times slower than numpy when these arrays are passed:
n = 1000
a = np.ones((n, n), dtype=np.int64)
b = np.ones((n, n), dtype=np.int64)
c = np.zeros((n, n), dtype=np.int64)
I am obviously missing something very big.
The problem is that you are indexing the 2-D array as c[j][k] when you should write c[j, k]. Otherwise Cython builds an intermediate buffer buf = c[j] and then takes buf[k] from it, which causes the slow-down. Use this proper indexing plus the cdef declarations specified by @XavierCombelle.
You can check that this intermediate buffer is causing the slow-down by declaring:

cdef np.ndarray[DTYPE_t, ndim=1] buf

and then, inside the loop:

    buf = c[j]
    buf[k] = val_a + val_b
This declared buffer should give the same speed (or close to it) as:

    c[j, k] = val_a + val_b
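You can see the intermediate object even in plain NumPy: chained indexing like c[j][k] first materializes the whole row c[j] as a 1-D array and only then indexes into it, while tuple indexing c[j, k] goes straight to the element. A small sketch:

```python
import numpy as np

c = np.arange(12).reshape(3, 4)

# chained indexing: c[1] builds a full 1-D row array first, then [2] indexes it
row = c[1]           # this intermediate is itself an ndarray
chained = row[2]

# tuple indexing: a single lookup, no intermediate row object
direct = c[1, 2]

print(chained, direct)  # 6 6 -- same value, but the chained form did extra work
```

In pure NumPy the cost difference is small; inside a tight Cython loop the intermediate row defeats the typed-buffer fast path, which is where the big slow-down comes from.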
I think you are missing

    cdef int j
    cdef int k

so your loop variables are Python objects, not C ones.
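Putting both fixes together (typed loop variables plus tuple indexing), the question's function might look like the following sketch; this is Cython source, so it needs to be compiled with cythonize rather than run directly:

```cython
import numpy as np
cimport numpy as np

DTYPE = np.int64
ctypedef np.int64_t DTYPE_t

def add_arrays(np.ndarray[DTYPE_t, ndim=2] a,
               np.ndarray[DTYPE_t, ndim=2] b,
               np.ndarray[DTYPE_t, ndim=2] c):
    cdef int x = a.shape[0]
    cdef int y = a.shape[1]
    cdef int j, k                      # typed loop variables: the loops compile to C
    for j in range(x):
        for k in range(y):
            c[j, k] = a[j, k] + b[j, k]   # tuple indexing: no intermediate row buffer
    return c
```

With both changes the loop body stays entirely in C, which is what lets Cython approach numpy's speed on this kind of element-wise operation.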
Here are two examples:
The "numpy way"
%%timeit
table1 = np.ones((10,10))
table2 = np.ones((10,10))
result = np.zeros((10,10))
table1 + table2
100000 loops, best of 3: 14.5 µs per loop
The looping over indices way
%%timeit
def add_arrays(ar1, ar2):
    for j in range(len(ar1)):
        for k in range(len(ar2)):
            val_a = ar1[j][k]
            val_b = ar2[j][k]
            result[j][k] = val_a + val_b
    return result
add_arrays(table1, table2)
1000 loops, best of 3: 307 µs per loop
Same computation, about 20 times slower than the numpy version.
With all this, I am aware that I have not completely answered your question, but maybe it gives you a better perspective for your comparisons?
[edit] for 1000x1000 tables, the time difference is more pronounced; I suspect this is due to amortizing the overhead of building the tables.
former code: 100 loops, best of 3: 13.1 ms per loop
latter code: 1 loops, best of 3: 2.78 s per loop
That is a factor of about 200.
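You can reproduce the gap yourself with the standard-library timeit module instead of the IPython %%timeit magic. This is a sketch with n reduced from the question's 1000 so it runs in a few seconds; exact ratios will vary by machine:

```python
import timeit

import numpy as np

n = 300  # smaller than the question's 1000 so the demo finishes quickly
a = np.ones((n, n), dtype=np.int64)
b = np.ones((n, n), dtype=np.int64)

def numpy_add():
    # one vectorized call; the element loop runs in compiled C inside NumPy
    return a + b

def loop_add():
    # pure-Python double loop: one interpreter round-trip per element
    c = np.zeros((n, n), dtype=np.int64)
    for j in range(n):
        for k in range(n):
            c[j, k] = a[j, k] + b[j, k]
    return c

t_numpy = timeit.timeit(numpy_add, number=10) / 10
t_loop = timeit.timeit(loop_add, number=1)
print(f"numpy: {t_numpy:.2e} s  loop: {t_loop:.2e} s  ratio: {t_loop / t_numpy:.0f}x")
```

Both functions produce the same array; only the per-element interpreter overhead differs, which is exactly the overhead that typed Cython code removes.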