简体   繁体   中英

Python: count number of elements in list for if condition

Given a list of integers, what is the most Pythonic / best way of counting how many elements are within a certain range?

I researched and found 2 ways of doing it:

>>> x = [10, 60, 20, 66, 79, 5]
>>> len([i for i in x if 60 < i < 70])
1

or:

>>> x = [10, 60, 20, 66, 79, 5]
>>> sum(1 for i in x if 60 < i < 70)
1

Which method uses less time/memory (for larger lists) and why? Or maybe another way is better...

In the specific instances you presented

[i for i in x if 60 < i < 70]

actually generates a brand-new list, then takes its len . Conversely,

(1 for i in x if 60 < i < 70)

is a generator expression over which you take a sum .

For large enough relevant items, the second version will be more efficient (esp. in terms of memory).


Timings

x = [65] * 9999999

%%time

len([i for i in x if 60 < i < 70])

CPU times: user 724 ms, sys: 44 ms, total: 768 ms
Wall time: 768 ms
Out[7]:
9999999

%%time

sum(1 for i in x if 60 < i < 70)
CPU times: user 592 ms, sys: 0 ns, total: 592 ms
Wall time: 593 ms

The generator expression is more memory efficient, because you don't have to create an extra list.

Creating a list and getting it's length (the latter being a very fast O(1) operation) seems to be faster than creating a generator and doing n additions for relatively small lists.

In [13]: x = [1]
In [14]: timeit len([i for i in x if 60 < i < 70])
10000000 loops, best of 3: 141 ns per loop
In [15]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 355 ns per loop
In [16]: x = range(10)
In [17]: timeit len([i for i in x if 60 < i < 70])
1000000 loops, best of 3: 564 ns per loop
In [18]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 781 ns per loop
In [19]: x = range(50)
In [20]: timeit len([i for i in x if 60 < i < 70])
100000 loops, best of 3: 2.4 µs per loop
In [21]: timeit sum(1 for i in x if 60 < i < 70)
100000 loops, best of 3: 2.62 µs per loop
In [22]: x = range(1000)
In [23]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 50.9 µs per loop
In [24]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 51.7 µs per loop

I tried with various lists, for example [65]*n and the trend does not change. For example:

In [1]: x = [65]*1000
In [2]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 67.3 µs per loop
In [3]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 82.3 µs per loop

You can easily test this using the timeit module. For your particular example, the first len -based solution appears to be faster:

$ python --version
Python 2.7.10
$ python -m timeit -s "x = [10,60,20,66,79,5]" "len([i for i in x if 60 < i < 70])"
1000000 loops, best of 3: 0.514 usec per loop
$ python -m timeit -s "x = [10,60,20,66,79,5]" "sum(i for i in x if 60 < i < 70)"
1000000 loops, best of 3: 0.693 usec per loop

Even for larger lists -- but with most elements not matching your predicate -- the len version appears to be no slower:

$ python -m timeit -s "x = [66] + [8] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 504 usec per loop
$ python -m timeit -s "x = [66] + [8] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 501 usec per loop

In fact, even if most elements of the given list match (so a big result list is constructed to pass to len ), the len version wins:

$ python -m timeit -s "x = [66] + [65] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 762 usec per loop
$ python -m timeit -s "x = [66] + [65] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 935 usec per loop

However, what seems to be a lot faster is to not have a list in the first place, if possible, but rather hold eg a collections.Counter . Eg for 100000 elements, I get:

$ python -m timeit -s "import collections; x = [66] + [65] * 100000" "len([i for i in x if 60 < i < 70])"
100 loops, best of 3: 8.11 msec per loop
$ python -m timeit -s "import collections; x = [66] + [65] * 100000; d = collections.Counter(x)" "sum(v for k,v in d.items() if 60 < k < 70)"
1000000 loops, best of 3: 0.761 usec per loop

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM