
Fastest way to append nonzero numpy array elements to list

I want to add all nonzero elements from a numpy array arr to a list out_list. Previous research suggests that for numpy arrays, using np.nonzero is most efficient. (My own benchmark below actually suggests it can be slightly improved using np.delete.)

However, in my case I want my output to be a list, because I am combining many arrays for which I don't know the number of nonzero elements (so I can't effectively preallocate a numpy array for them). Hence, I was wondering whether there are some synergies that can be exploited to speed up the process. While my naive list comprehension approach is much slower than the pure numpy approach, I got some promising results combining list comprehension with numba .

Here's what I found so far:

import numpy as np
from numba import njit

n = 60_000  # size of array
nz = 0.3  # fraction of zero elements

arr = (np.random.random_sample(n) - nz).clip(min=0)

# method 1
def add_to_list1(arr, out):
    out.extend(list(arr[np.nonzero(arr)]))

# method 2
def add_to_list2(arr, out):
    out.extend(list(np.delete(arr, arr == 0)))

# method 3
def add_to_list3(arr, out):
    out += [x for x in arr if x != 0]

# method 4 (not sure how to get numba to accept an empty list as argument)
@njit
def add_to_list4(arr):
    return [x for x in arr if x != 0]

out_list = []
%timeit add_to_list1(arr, out_list)

out_list = []
%timeit add_to_list2(arr, out_list)

out_list = []
%timeit add_to_list3(arr, out_list)

_ = add_to_list4(arr)  # call once to compile
out_list = []
%timeit out_list.extend(add_to_list4(arr))

Yielding the following results:

2.51 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.19 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.6 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.63 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Not surprisingly, numba outperforms all other methods. Among the rest, method 2 (using np.delete ) is the best. Am I missing any obvious alternative that exploits the fact that I am converting to a list afterwards? Can you think of anything to further speed up the process?

Edit 1:

Performance of .tolist():

# method 5
def add_to_list5(arr, out):
    out += arr[arr != 0].tolist()

# method 6
def add_to_list6(arr, out):
    out += np.delete(arr, arr == 0).tolist()

# method 7
def add_to_list7(arr, out):
    out += arr[arr.astype(bool)].tolist()

Timings are on par with numba:

1.62 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.65 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.78 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
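A plausible explanation for the gap: .tolist() converts the whole buffer to native Python floats in one C-level pass, while list() iterates element by element and produces numpy scalar objects. A minimal sketch of the difference in element types:

```python
import numpy as np

arr = np.array([0.0, 1.5, 2.5])

via_tolist = arr.tolist()  # native Python floats
via_list = list(arr)       # numpy scalar wrappers (np.float64)

# tolist() yields plain floats; list() yields np.float64 objects
assert type(via_tolist[0]) is float
assert type(via_list[0]) is not float
```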

Edit 2:

Here's some benchmarking using Mad Physicist's suggestion to use np.concatenate to construct a numpy array instead.

# construct numpy array using np.concatenate
out_list = []
t = time.perf_counter()
for i in range(100):
    out_list.append(arr[arr != 0])
result = np.concatenate(out_list)
print(f"Time elapsed: {time.perf_counter() - t:.4f}s")

# compare with best list-based method
out_list = []
t = time.perf_counter()
for i in range(100):
    out_list += arr[arr != 0].tolist()
print(f"Time elapsed: {time.perf_counter() - t:.4f}s")

Concatenating numpy arrays indeed yields another significant speed-up, although it is not directly comparable since the output is a numpy array instead of a list. So which is best will depend on the precise use case.

Time elapsed: 0.0400s
Time elapsed: 0.1430s
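Both variants collect the same values; only the container differs. A quick sanity check (a minimal sketch using a small array and 3 rounds):

```python
import numpy as np

arr = (np.random.random_sample(1_000) - 0.3).clip(min=0)

# numpy-array route: collect chunks, concatenate once at the end
chunks = [arr[arr != 0] for _ in range(3)]
as_array = np.concatenate(chunks)

# list route: extend a flat list round by round
as_list = []
for _ in range(3):
    as_list += arr[arr != 0].tolist()

# same elements, different container type
assert as_array.tolist() == as_list
```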

TL;DR

1/ using arr[arr != 0] is the fastest of all the indexing options

2/ using .tolist() instead of list(.) speeds up things by a factor 1.3 - 1.5

3/ with the gains of 1/ and 2/ combined, the speed is on par with numba

4/ if having a numpy array instead of a list is acceptable, then using np.concatenate yields another gain in speed by a factor of ~3.5 compared to the best alternative

I submit that the method of choice, if you are indeed looking for a list output, is:

def f(arr, out_list):
    out_list += arr[arr != 0].tolist()

It seems to beat all the other methods mentioned so far in the OP's question or in other responses (at the time of this writing).
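For completeness, a small usage sketch of f combining several arrays into one flat list:

```python
import numpy as np

def f(arr, out_list):
    out_list += arr[arr != 0].tolist()

# combine the nonzero elements of several arrays into one list
out_list = []
for a in (np.array([0.0, 1.0]), np.array([2.0, 0.0, 3.0])):
    f(a, out_list)

# out_list is now [1.0, 2.0, 3.0]
```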

If, however, you are looking for a result as a numpy array, then @MadPhysicist's version (slightly modified to use arr[arr != 0] instead of np.nonzero()) is almost 6x faster; see the end of this post.

Side note: I would avoid using %timeit out_list.extend(some_list) : it keeps adding to out_list during the many loops of timeit . Example:

out_list = []
%timeit out_list.extend([1,2,3])

and now:

>>> len(out_list)
243333333  # yikes
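One way to avoid this accumulation is to build a fresh list inside the timed statement itself, e.g. with the timeit module (a sketch; the measured number is illustrative only):

```python
import timeit

# a fresh list is created on every run, so nothing accumulates across loops
elapsed = timeit.timeit("out = []; out.extend([1, 2, 3])", number=1000)
print(f"{elapsed:.6f}s for 1000 runs")
```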

Timings

On 60K items on my machine, I see:

out_list = []

a = %timeit -o out_list + arr[arr != 0].tolist()
b = %timeit -o out_list + arr[np.nonzero(arr)].tolist()
c = %timeit -o out_list + list(arr[np.nonzero(arr)])

Yields:

1.23 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.53 ms ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.29 ms ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And:

>>> c.average / a.average
3.476

>>> b.average / a.average
1.244

For a numpy array result instead

Following @MadPhysicist, you can get some extra boost by not turning the arrays into lists, but using np.concatenate() instead:

def all_nonzero(arr_iter):
    """Return the nonzero elements of all arrays as one np.array."""
    return np.concatenate([a[a != 0] for a in arr_iter])

def all_nonzero_list(arr_iter):
    """Return the nonzero elements of all arrays as one list."""
    out_list = []
    for a in arr_iter:
        out_list += a[a != 0].tolist()
    return out_list

from itertools import repeat

ta = %timeit -o all_nonzero(repeat(arr, 100))
tl = %timeit -o all_nonzero_list(repeat(arr, 100))

Yields:

39.7 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
227 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

and

>>> tl.average / ta.average
5.75

Instead of extending a list by all of the elements of a new array, append the array itself. This will make for much fewer and smaller reallocations. You can also pre-allocate a list of None s up-front or even use an object array, if you have an upper bound on the number of arrays you will process.

When you're done, call np.concatenate on the list.

So instead of this:

L = []
for i in range(10):
    arr = (np.random.random_sample(n) - nz).clip(min=0)
    L.extend(arr[np.nonzero(arr)])
result = np.array(L)

Try this:

L = []
for i in range(10):
    arr = (np.random.random_sample(n) - nz).clip(min=0)
    L.append(arr[np.nonzero(arr)])
result = np.concatenate(L)

Since you're keeping arrays around, the final concatenation will be a series of buffer copies (which is fast), rather than a bunch of Python-to-NumPy type conversions (which won't be). The exact deletion method you choose is of course still up to the result of your benchmark.
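A minimal sketch of that difference: np.concatenate stitches the float64 buffers together directly, whereas a detour through a Python list boxes every element as a Python object first and re-parses it on the way back:

```python
import numpy as np

chunks = [np.array([1.0, 2.0]), np.array([3.0])]

# fast path: raw float64 buffer copies, no per-element boxing
result = np.concatenate(chunks)

# slow path: every element becomes a Python float, then is re-parsed
boxed = np.array([x for c in chunks for x in c.tolist()])

assert result.dtype == np.float64
assert result.tolist() == boxed.tolist()
```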

Also, here's another method to add to your benchmark:

def add_to_list5(arr, out):
    out.extend(list(arr[arr.astype(bool)]))

I don't expect this to be overwhelmingly fast, but it's interesting to see how masking stacks up next to indexing.
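For real-valued data the two selections are equivalent, since bool(x) is True exactly when x != 0 for floats; a quick check (minimal sketch):

```python
import numpy as np

arr = np.array([0.0, 1.5, 0.0, 2.5])

# masking via astype(bool) selects exactly the same elements as != 0
assert (arr[arr.astype(bool)] == arr[arr != 0]).all()
```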
