简体   繁体   中英

generator vs list comprehension

In this following code is it better to use list comprehension or a generator?

from itertools import izip
n=2
l=izip(xrange(10**n), xrange(10**n))
print 3 not in [x[0] for x in l]
#or
#print 3 not in (x[0] for x in l)

In these tests if the list is large the generator is faster, if the lists are shorter the list comprehension apparently is faster.
Is this because the comprehension is computer just once?
For large lists: generator faster than listcomp
For small lists: generator slower than listcomp

in against a generator expression will make use of the __iter__() method and iterate the expression until a match is found, making it more efficient in the general case than the list comprehension, which produces the whole list first before scanning the result for a match.

The alternative for your specific example would be to use any() , to make the test more explicit. I find this to be a tad more readable:

any(x[0] == 3 for x in l)

You do have to take into account that in does forward the generator; you cannot use this method if you need to use the generator elsewhere as well.

As for your specific timing tests; your 'short' tests are fatally flawed. The first iteration the izip() generator will be entirely exhausted, making the other 9999 iterations test against an empty generator. You are testing the difference between creating an empty list and an empty generator there, amplifying the creation cost difference.

Moreover, you should use the timeit module to run tests, making sure that the test is repeatable . This means you have to create a new izip() object each iteration too; now the contrast is much larger :

>>> # Python 2, 'short'
...
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.27606701850891113
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
1.7422130107879639
>>> # Python 2, 'long'
...
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.3002200126647949
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
15.624258995056152

and on Python 3:

>>> # Python 3, 'short'
... 
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in (x[0] for x in l)", number=100000)
0.2624585109297186
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in [x[0] for x in l]", number=100000)
1.5555254180217162
>>> # Python 3, 'long'
... 
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in (x[0] for x in l)", number=100000)
0.27222433499991894
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in [x[0] for x in l]", number=100000)
15.76974998600781

In all cases, the generator variant is far faster; you have to shorten the 'short' version to just 8 tuples for the list comprehension to start to win:

>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.2870941162109375
>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
0.28503894805908203

On Python 3, where the implementations of generator expressions and list comprehensions were brought closer, you have to go down to 4 items before the list comprehension wins:

>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in (x[0] for x in l)", number=100000)
0.284480107948184
>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in [x[0] for x in l]", number=100000)
0.23570425796788186

Creating a generator is slower than creating a list, so you have to consider to variables: time for creating the object and time for testing the expression. So to answer your question, if by "better" you mean "faster": it depends on n .

There is a fair bit of overhead creating the generator expression, but eventually you make up for it by not needing to allocate a huge chunk of memory.

Small list comprehensions are faster as they don't have that overhead.

Usually the small cases are close enough, so in that case it's better to choose a generator expression

It is particularly important to conserve memory on a webserver where there might be 100's or 1000's of connections concurrently.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM