Here's my Python code:
from time import time

X = [[0] * 1000] * 100
start = time()
for x in xrange(100):
    for i in xrange(len(X)):
        for j in xrange(len(X[i])):
            X[i][j] += 1
print time() - start
My Cython code is the same:
from time import time

X = [[0] * 1000] * 100
start = time()
for x in xrange(100):
    for i in xrange(len(X)):
        for j in xrange(len(X[i])):
            X[i][j] += 1
print time() - start
Output:
Any other faster way to do the same above in Python or Cython?
Update: Is there any way to create a 2D array X with indexing performance close to a C/C++ int X[][] array?
For now I am considering using the Python C API to do the job.
One more thing: a numpy array doing the same thing is much slower (70 s) than a list, in both pure Python and Cython.
Python:
import numpy as np
from time import time

X = np.zeros((100, 1000), dtype=np.int32)
start = time()
for x in xrange(100):
    for i in xrange(len(X)):
        for j in xrange(len(X[i])):
            X[i][j] += 1
print time() - start
If the code does a lot of element access on a numeric array, which approach is best?
To answer the question in your title: your Cython code beats your Python code because, despite the lack of cdef to declare variables, C code is being generated for the for loops (in addition to lots of extra C code to describe the Python objects). To speed up your Cython code, use cdef to declare the integers i, j and x so they are no longer Python integers, e.g. cdef int i. You can also declare C-type arrays in Cython, which should improve performance further.
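For illustration, a sketch of what the typed version might look like (untested here; the function name is made up, compilation via a setup script or cythonize is omitted, and the typed memoryview over a NumPy buffer stands in for a C-type array):

```cython
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def increment(int reps):
    # Typed memoryview: indexing compiles to C array access.
    cdef int[:, :] X = np.zeros((100, 1000), dtype=np.int32)
    cdef int x, i, j
    for x in range(reps):
        for i in range(X.shape[0]):
            for j in range(X.shape[1]):
                X[i, j] += 1
    return np.asarray(X)
```

With i, j and x declared as cdef int, the loop counters never touch Python objects, which is where most of the remaining overhead lives.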
A quick way to get the same result with NumPy:
X = np.zeros((100, 1000), dtype=np.int32)
X += 10000
If you can help it, you should never use for loops with NumPy arrays; they are completely different from lists in terms of memory layout and per-element access cost.
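To see that the single vectorized statement really is equivalent to the triple loop, here is a small check (array sizes shrunk for brevity; variable names are illustrative):

```python
import numpy as np

rows, cols, reps = 10, 20, 100

# Triple-loop version: every X[i][j] access crosses the
# Python/NumPy boundary, which is why it is slow.
looped = np.zeros((rows, cols), dtype=np.int32)
for _ in range(reps):
    for i in range(rows):
        for j in range(cols):
            looped[i][j] += 1

# Vectorized version: one broadcast addition, executed in C.
vectorized = np.zeros((rows, cols), dtype=np.int32)
vectorized += reps

print(np.array_equal(looped, vectorized))  # True
```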
Any other faster way to do the same above in Python or Cython?
The equivalent, faster code would be:
X = [[100 * 100] * 1000] * 100
In your code, you are creating a 1000-long list of zeroes, then creating a 100-long list of references to that one list. Iterating 100 times over that 100-long list results in each position being incremented 100 * 100 = 10000 times.
>>> len(set(map(id, X)))
1
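The aliasing is easy to demonstrate on a small grid (sizes reduced here for readability):

```python
# Build a 3x4 grid the same way as in the question: one inner list,
# replicated by reference three times.
X = [[0] * 4] * 3

# All three rows are the same list object, so there is only one id.
print(len(set(map(id, X))))  # 1

# Incrementing "one" cell therefore shows up in every row.
X[0][0] += 1
print(X)  # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
```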
If you would like to end up with a list of 100 distinct lists:
base = [100] * 1000
X = [list(base) for _ in xrange(100)]
>>> len(set(map(id, X)))
100
Note that list(base) makes a shallow copy: the references to the objects inside the list are copied, not the objects themselves.
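A small check (again with reduced sizes) that the list-comprehension version produces independent rows, unlike the replicated one:

```python
base = [0] * 4

# list(base) creates a fresh shallow copy for each row, so the
# rows are distinct list objects.
rows = [list(base) for _ in range(3)]

print(len(set(map(id, rows))))  # 3

# Mutating one row no longer leaks into the others.
rows[0][0] += 1
print(rows)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```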
ajcr's answer is probably the fastest and simplest one. You should first explicitly declare the datatypes of your variables in the Cython code. In addition, I would use prange instead of a plain range for the outer loop. This activates OpenMP multithreading, which might speed up your code further, but I really doubt this solution will beat the numpy implementation.