
Why is a for loop on a native Python list faster than a for loop on a numpy array?

I was reading the chapter that introduces numpy in High Performance Python and played with the code on my own computer. I accidentally ran the numpy version with a for loop and found the result surprisingly slow compared to the native Python loop.

A simplified version of the code is as follows: I define a 2D array X filled with 0s and another 2D array Y filled with 1s, and then repeatedly add Y to X, conceptually X += Y.

import time
import numpy as np

grid_shape = (1024, 1024)

def simple_loop_comparison():
    xmax, ymax = grid_shape

    py_grid = [[0]*ymax for x in range(xmax)]
    py_ones = [[1]*ymax for x in range(xmax)]

    np_grid = np.zeros(grid_shape)
    np_ones = np.ones(grid_shape)

    def add_with_loop(grid, add_grid, xmax, ymax):
        for x in range(xmax):
            for y in range(ymax):
                grid[x][y] += add_grid[x][y]

    repeat = 20
    start = time.time()
    for i in range(repeat):
        # native python: loop over 2D array
        add_with_loop(py_grid, py_ones, xmax, ymax)
    print('for loop with native list=', time.time()-start)

    start = time.time()
    for i in range(repeat):
        # numpy: loop over 2D array
        add_with_loop(np_grid, np_ones, xmax, ymax)
    print('for loop with numpy array=', time.time()-start)

    start = time.time()
    for i in range(repeat):
        # vectorized numpy operation
        np_grid += np_ones
    print('numpy vectorization=', time.time()-start)

if __name__ == "__main__":
    simple_loop_comparison()

The result looks like:

# when repeat=10
for loop with native list= 2.545672655105591
for loop with numpy array= 11.622980833053589
numpy vectorization= 0.020279645919799805

# when repeat=20
for loop with native list= 5.195128440856934
for loop with numpy array= 23.241904258728027
numpy vectorization= 0.04613637924194336

I fully expected the numpy vectorized operation to outperform the other two, but I was surprised to see that using a for loop on a numpy array is significantly slower than on a native Python list. My understanding was that the cache should be filled relatively well with a numpy array, so even with a for loop it should outperform a list without vectorization.

Is there something about numpy, or about how the CPU/cache/memory work at a low level, that I don't understand? Thank you very much.

EDIT: changed title

An even simpler case - list comprehension on a list versus an array:

In [119]: x = list(range(1000000))
In [120]: timeit [i for i in x]
47.4 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [121]: arr = np.array(x)
In [122]: timeit [i for i in arr]
131 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

A list has a data buffer that contains pointers to objects elsewhere in memory. So iterating over or indexing a list just requires looking up the pointer and fetching the object:

In [123]: type(x[1000])
Out[123]: int
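
One way to see that a list lookup just hands back the stored object (a quick illustrative check assuming CPython, not part of the original session): indexing the same position twice returns the identical object.

x = list(range(1000000))
print(x[1000] is x[1000])   # True: both lookups dereference the same stored pointer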

An array stores its elements in a data buffer as bytes. Fetching an element requires finding those bytes (fast) and then wrapping them in a numpy object (according to the dtype). Such an object is similar to a 0d single-element array (with many of the same attributes).

In [124]: type(arr[1000])
Out[124]: numpy.int32

This indexing doesn't just fetch the number, it recreates it.
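
A quick way to confirm that (an illustrative check, not from the original session): two reads of the same array element produce two distinct wrapper objects, even though the underlying bytes are identical.

import numpy as np

arr = np.array(range(1000000))
a = arr[1000]
b = arr[1000]
print(type(a))    # a numpy scalar type, e.g. numpy.int64 on most 64-bit builds
print(a is b)     # False: each index call builds a fresh scalar object from the buffer bytes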

I often describe an object-dtype array as an enhanced, or degraded, list. Like a list, it contains pointers to objects elsewhere in memory, but it can't grow by append. We often say it loses many of the benefits of a numeric array. But its iteration speed falls between the other two:

In [125]: arrO = np.array(x, dtype=object)
In [127]: type(arrO[1000])
Out[127]: int
In [128]: timeit [i for i in arrO]
74.5 ms ± 1.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
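
Because the object-dtype array just stores pointers back into Python's object world, indexing it returns the very same int objects the list holds (an illustrative check under CPython, not from the original session):

import numpy as np

x = list(range(1000000))
arrO = np.array(x, dtype=object)
print(arrO[1000] is x[1000])   # True: the object array holds pointers to the same Python ints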

Anyway, as I've found in other SO answers, if you must iterate, stick with lists. And if you start with lists, it's often faster to also stick with lists. As you note, numpy's vectorized speed is fast, but it takes time to create an array, which may cancel out any time savings.

Compare the time it takes to create an array from this list, with the time required to create such an array from scratch (with compiled numpy code):

In [129]: timeit np.array(x)
109 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [130]: timeit np.arange(len(x))
1.77 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
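
The conversion cost runs in the other direction too. If the data already lives in a numeric array but a Python-level loop is unavoidable, a common workaround (a sketch, not from the original answer) is to convert once with tolist() and iterate over the plain Python numbers:

import numpy as np

np_grid = np.ones((1024, 1024))

total = 0.0
for row in np_grid.tolist():   # one-time conversion to nested lists of Python floats
    for value in row:          # iterating plain floats avoids per-element scalar wrapping
        total += value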

There are conversions involved in asking numpy for data pointers, retrieving values at those pointer locations, and then using them to iterate. A Python list has fewer of those steps. Numpy speed gains only show up if it can iterate internally or perform vector/matrix math and then return an answer, or a pointer to an array of answers.
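
Put differently (a sketch restating the point, using an illustrative array rather than the question's grids): the gain only appears when the whole loop stays inside numpy's compiled code, as with sum() versus an element-by-element Python loop.

import numpy as np

arr = np.arange(1_000_000)

fast_total = arr.sum()   # internal, compiled iteration over the raw buffer

slow_total = 0           # Python-level loop: every element is wrapped in a scalar first
for value in arr:
    slow_total += value

assert fast_total == slow_total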
