
Fastest way to count sequence elements satisfying a predicate

I want to count the elements of a throw-away sequence that satisfy a certain property. I am somewhat surprised that a generator expression is not the fastest option:

from random import random
l = [random() for i in range(1000000)]
%timeit len([None for x in l if x < 0.5])
%timeit len([x for x in l if x < 0.5])
%timeit sum(1 for x in l if x < 0.5)
%timeit sum(x < 0.5 for x in l)

Measured timings, in the same order as the statements above:

90.7 ms ± 7.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
97.7 ms ± 7.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
105 ms ± 3.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
178 ms ± 2.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Is there a faster way to do this?

You can use NumPy, if the conversion to a NumPy array itself does not count toward the timing:

import numpy as np
a = np.array(l)
%timeit np.sum(a < 0.5)

1.28 ms ± 48.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
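Outside IPython, the same mask-and-count idea can be wrapped in a small function. The sketch below is only an illustration: the name `count_below` and its `threshold` parameter are made up for this example, and `np.count_nonzero` is used as an alternative to `np.sum` for counting the `True` entries of a boolean mask. No timing is claimed for it here; measure it on your own data.

```python
import numpy as np

def count_below(values, threshold=0.5):
    """Count elements strictly below `threshold` (hypothetical helper)."""
    # asarray avoids a copy when `values` is already a float array.
    a = np.asarray(values, dtype=float)
    # `a < threshold` is a boolean mask; count_nonzero counts its True entries.
    return int(np.count_nonzero(a < threshold))
```

If the data can be generated as an array in the first place (rather than as a list), the conversion step disappears entirely.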

Even when the conversion is included in the timing, it is still significantly faster:

%%timeit 
a = np.array(l)
np.sum(a < 0.5)

27.2 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For comparison, here are the pure Python versions timed on my machine:

from random import random
l = [random() for i in range(1000000)]
%timeit len([None for x in l if x < 0.5])
%timeit len([x for x in l if x < 0.5])
%timeit sum(1 for x in l if x < 0.5)
%timeit sum(x < 0.5 for x in l)

46.4 ms ± 941 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.1 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
59.5 ms ± 811 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
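The `%timeit` magic only works in IPython or Jupyter. To repeat the pure Python comparison from a plain script, something like the following sketch using the standard `timeit` module should work. The `CANDIDATES` dictionary and the `bench` helper are names invented for this example, and the list size and repeat counts are arbitrary choices, so absolute numbers will differ from the ones above.

```python
import timeit
from random import random

# The four pure Python candidates from the question, wrapped as functions.
CANDIDATES = {
    "len([None ...])": lambda l: len([None for x in l if x < 0.5]),
    "len([x ...])": lambda l: len([x for x in l if x < 0.5]),
    "sum(1 ...)": lambda l: sum(1 for x in l if x < 0.5),
    "sum(x < 0.5 ...)": lambda l: sum(x < 0.5 for x in l),
}

def bench(data, number=5):
    """Return {label: best seconds per call} for each candidate (hypothetical helper)."""
    results = {}
    for label, fn in CANDIDATES.items():
        timer = timeit.Timer(lambda: fn(data))
        # Take the best of three repeats to reduce noise from other processes.
        results[label] = min(timer.repeat(repeat=3, number=number)) / number
    return results

if __name__ == "__main__":
    data = [random() for _ in range(100_000)]
    for label, seconds in bench(data).items():
        print(f"{label:20s} {seconds * 1e3:8.2f} ms")
```

All four candidates return the same count for the same input, so the comparison is only about speed.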

