
What is the reason for the slowness of list(numpy.array)?

It is well known that if a is a numpy array, then a.tolist() is faster than list(a), for example:

>>> import numpy as np
>>> big_np=np.random.randint(1,10**7,(10**7,))

>>> %timeit list(big_np)
1 loop, best of 3: 869 ms per loop
>>> %timeit big_np.tolist()
1 loop, best of 3: 306 ms per loop

That means the naive list(a) version is about a factor of 3 slower than the special function tolist().

However, comparing it to the performance of the built-in array module:

>>> import array
>>> big_arr=array.array('i', big_np)
>>> %timeit list(big_arr)
1 loop, best of 3: 312 ms per loop

we can see that one should probably say that list(a) is slow rather than that tolist() is fast, because array.array is as fast as the special function.

Another observation: the array.array module and tolist benefit from the small-integer pool (i.e. when values are in the range [-5, 256]), but this is not the case for list(a):

##only small integers:
>>> small_np=np.random.randint(1,250, (10**7,))
>>> small_arr=array.array('i', small_np)

>>> %timeit list(small_np)
1 loop, best of 3: 873 ms per loop
>>> %timeit small_np.tolist()
10 loops, best of 3: 188 ms per loop
>>> %timeit list(small_arr)
10 loops, best of 3: 150 ms per loop

As we can see, the already fast versions become about 2 times faster again, but the slow version is as slow as before.
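(As a side note, the small-integer pool mentioned above is a CPython implementation detail: integers in [-5, 256] are pre-allocated and reused. A quick, hedged way to observe it, going through int(...) so that constant folding doesn't interfere:)

>>> int("200") is int("200")   # values in [-5, 256] come from the cache
True
>>> int("300") is int("300")   # larger values are created fresh each time
False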

My question: what slows list(numpy.array) down compared to list(array.array) ?

Edit:

One more observation: with Python 2.7, tolist() takes longer if the integers are bigger (i.e. cannot be held by an int32):

>>> very_big=np.random.randint(1,10**7,(10**7,))+10**17
>>> not_so_big=np.random.randint(1,10**7,(10**7,))+10**9
>>> %timeit very_big.tolist()
1 loop, best of 3: 627 ms per loop
>>> %timeit not_so_big.tolist()
1 loop, best of 3: 302 ms per loop

but it is still faster than the slow list version.

Here is a partial answer explaining your observation regarding the small integer pool:

>>> a = np.arange(10)
>>> type(list(a)[0])
<class 'numpy.int64'>
>>> type(a.tolist()[0])
<class 'int'>

As we can see, tolist tries to create elements of native Python type, whereas the array iterator (which is what the list constructor uses) doesn't bother.

Indeed, the C implementation of tolist (source here) uses PyArray_GETITEM, which is equivalent to the Python-level arr[index].item(), not, as one might assume, arr[index].
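A quick interactive check of that equivalence (the exact scalar class, e.g. numpy.int64 vs numpy.int32, depends on the platform's default integer dtype):

>>> a = np.arange(10)
>>> type(a[0])            # plain indexing yields a numpy scalar
<class 'numpy.int64'>
>>> type(a[0].item())     # .item() converts it to a native Python int
<class 'int'>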

Basically, Paul Panzer's answer explains what happens: in the slow list(...) version the resulting elements of the list are not Python integers but numpy scalars, e.g. numpy.int64. This answer just elaborates a little and connects the dots.

I didn't do any systematic profiling, but whenever I stopped in the debugger, both versions were in the routines that create the integer object, so it is very likely that this is where the lion's share of the execution time is spent and that the remaining overhead doesn't play a big role.

In the list(...) version, the iterator calls array_item, which has special treatment for one-dimensional arrays and calls PyArray_Scalar. This is a quite generic function that doesn't use the machinery of Python's integer creation; it happens to be slower than the Python version, and there is also no integer pool for small values.

The .tolist() version calls recursive_tolist, which eventually uses Python's PyLong_FromLong(long). This shows all the observed behaviors and happens to be faster than the numpy functionality (probably because creating Python integers is not the normal way of using numpy, so not many optimizations were done there).
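The difference can also be made visible without timings. The sketch below assumes the explanation above: list(...) produces fresh numpy scalars (no pooling), while tolist() goes through PyLong_FromLong and therefore reuses the cached small integers:

>>> small = np.array([7, 7])
>>> from_list = list(small)
>>> from_list[0] is from_list[1]      # two distinct numpy.int64 objects
False
>>> from_tolist = small.tolist()
>>> from_tolist[0] is from_tolist[1]  # both are the cached Python int 7
True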

There is a small difference between Python 2 and Python 3: Python 2 has two different integer classes, a more efficient one for numbers up to 32 bits and one for arbitrarily big numbers. Thus for the bigger numbers the most general (and therefore more costly) path must be taken; this can also be observed.
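Illustratively, in a Python 2.7 session the switch-over point is sys.maxint (whose exact value depends on the platform's C long size):

>>> import sys
>>> type(sys.maxint)        # still the efficient int type
<type 'int'>
>>> type(sys.maxint + 1)    # one past the limit: arbitrary-precision long
<type 'long'>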

Constructing a list with list(something) iterates over something and collects the results of the iteration into a new list.
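In other words, list(something) behaves roughly like the following hand-written helper (to_list is just an illustrative name, not an actual API):

def to_list(something):
    result = []
    for item in something:    # the same iteration protocol list() uses
        result.append(item)
    return result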

If list(small_np) is slower than list(small_arr), one could assume that iterating over small_np is slower than iterating over small_arr. Let's verify that:

%timeit for i in small_np: pass   # 1 loop, best of 3: 916 ms per loop
%timeit for i in small_arr: pass  # 1 loop, best of 3: 261 ms per loop

Yep, iterating over a numpy array seems to be slower. This is where I must start speculating. Numpy arrays are flexible: they can have an arbitrary number of dimensions with various strides, whereas array.array objects are always flat. This flexibility most likely comes at a cost, which manifests itself in more complex iteration.
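To illustrate that flexibility (using the big_np and big_arr objects from above; the exact stride and itemsize values depend on dtype and platform):

>>> big_np.ndim, big_np.strides     # numpy carries shape and stride metadata
(1, (8,))
>>> big_np[::2].strides             # even a 1-D view may be non-contiguous
(16,)
>>> big_arr.itemsize                # array.array is always a flat C buffer
4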
