
Generator vs list comprehension

In the following code, is it better to use a list comprehension or a generator expression?

from itertools import izip
n=2
l=izip(xrange(10**n), xrange(10**n))
print 3 not in [x[0] for x in l]
#or
#print 3 not in (x[0] for x in l)

In these tests, if the list is large the generator is faster; if the list is short, the list comprehension is apparently faster.
Is this because the comprehension is computed just once?
For large lists: generator faster than listcomp
For small lists: generator slower than listcomp

Using in against a generator expression makes use of the __iter__() method and iterates the expression until a match is found. In the general case this is more efficient than the list comprehension, which produces the whole list first and only then scans the result for a match.
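The short-circuiting can be made visible by counting how many items each variant pulls from its source. This is a minimal sketch in Python 3 (so zip stands in for izip); the counting() helper and the counter lists are hypothetical names introduced only for illustration.

```python
def counting(iterable, counter):
    """Yield items from iterable, recording in counter[0] how many were pulled."""
    for item in iterable:
        counter[0] += 1
        yield item

pairs = list(zip(range(100), range(100)))

# Generator expression: `in` stops pulling items at the first match.
gen_pulled = [0]
found_gen = 3 in (x[0] for x in counting(pairs, gen_pulled))

# List comprehension: the whole list is built before `in` scans it.
list_pulled = [0]
found_list = 3 in [x[0] for x in counting(pairs, list_pulled)]

print(found_gen, gen_pulled[0])    # True 4
print(found_list, list_pulled[0])  # True 100
```

When the value is absent, both variants have to visit every item, so the saving only applies when a match exists.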

The alternative for your specific example would be to use any(), which makes the test more explicit. I find this a tad more readable (prefix it with not to match the original 3 not in test):

any(x[0] == 3 for x in l)

You do have to take into account that in advances the generator; you cannot use this method if you need to use the generator elsewhere as well.
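A small Python 3 sketch of that side effect, assuming the same kind of paired input as in the question (the variable names are illustrative only):

```python
gen = (x[0] for x in zip(range(10), range(10)))

first = 3 in gen    # consumes 0, 1, 2 and 3 while searching
rest = list(gen)    # only the items after the match are left
second = 3 in gen   # the generator is now exhausted

print(first)   # True
print(rest)    # [4, 5, 6, 7, 8, 9]
print(second)  # False
```

If the values are needed again later, materialise them once (for example with list()) and test against that instead.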

As for your specific timing tests: your 'short' tests are fatally flawed. On the first iteration the izip() iterator is entirely exhausted, so the other 9999 iterations test against an empty iterator. You are testing the difference between creating an empty list and an empty generator there, which amplifies the creation cost difference.
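The flaw can be reproduced without any timing at all. The sketch below is Python 3 (zip in place of izip) and assumes the original benchmark built the iterator once and reused it inside the timed loop; it records how many items the list comprehension actually sees on each repetition.

```python
# Flawed setup: one shared iterator reused across repetitions.
shared = zip(range(100), range(100))
shared_lengths = []
for _ in range(3):
    built = [x[0] for x in shared]
    shared_lengths.append(len(built))

# Repeatable setup: a fresh iterator for every repetition.
fresh_lengths = []
for _ in range(3):
    fresh = zip(range(100), range(100))
    built = [x[0] for x in fresh]
    fresh_lengths.append(len(built))

print(shared_lengths)  # [100, 0, 0]
print(fresh_lengths)   # [100, 100, 100]
```

Only the first repetition of the shared version does real work; every later one measures the cost of building an empty list.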

Moreover, you should use the timeit module to run tests, making sure that the test is repeatable. This means you have to create a new izip() object on each iteration too; now the contrast is much larger:

>>> # Python 2, 'short'
...
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.27606701850891113
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
1.7422130107879639
>>> # Python 2, 'long'
...
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.3002200126647949
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
15.624258995056152

and on Python 3:

>>> # Python 3, 'short'
... 
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in (x[0] for x in l)", number=100000)
0.2624585109297186
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in [x[0] for x in l]", number=100000)
1.5555254180217162
>>> # Python 3, 'long'
... 
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in (x[0] for x in l)", number=100000)
0.27222433499991894
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in [x[0] for x in l]", number=100000)
15.76974998600781

In all cases, the generator variant is far faster; you have to shorten the 'short' version to just 8 tuples for the list comprehension to start to win:

>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.2870941162109375
>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
0.28503894805908203

On Python 3, where the implementations of generator expressions and list comprehensions were brought closer, you have to go down to 4 items before the list comprehension wins:

>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in (x[0] for x in l)", number=100000)
0.284480107948184
>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in [x[0] for x in l]", number=100000)
0.23570425796788186

Creating a generator is slower than creating a list, so you have to consider two variables: the time to create the object and the time to test the expression. So to answer your question, if by "better" you mean "faster": it depends on n.
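To see where the crossover falls on a particular machine, both variants can be timed for several values of n. This is a rough Python 3 sketch; compare() is a hypothetical helper, and the absolute numbers will differ between interpreters and hardware, so no expected output is shown.

```python
import timeit

def compare(n, number=10000):
    """Return (genexp_seconds, listcomp_seconds) for a membership test over n pairs."""
    setup = "n = {}".format(n)
    # A fresh zip object is created inside each timed statement,
    # so every run starts from an unconsumed iterator.
    gen_stmt = "l = zip(range(n), range(n)); 3 not in (x[0] for x in l)"
    list_stmt = "l = zip(range(n), range(n)); 3 not in [x[0] for x in l]"
    gen_time = timeit.timeit(gen_stmt, setup=setup, number=number)
    list_time = timeit.timeit(list_stmt, setup=setup, number=number)
    return gen_time, list_time

for n in (2, 4, 8, 100, 1000):
    gen_time, list_time = compare(n)
    print(n, round(gen_time, 4), round(list_time, 4))
```

Because the searched value 3 sits near the start, the generator's cost stays roughly flat as n grows while the list comprehension's cost grows with n.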

There is a fair bit of overhead in creating the generator expression, but eventually you make up for it by not needing to allocate a huge chunk of memory.
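The memory side of that trade-off can be illustrated with sys.getsizeof. This is only a sketch: getsizeof reports the size of the container object itself (not the integers it refers to), and the exact byte counts depend on the Python build.

```python
import sys

n = 10**5
as_list = [x for x in range(n)]   # holds n references at once
as_gen = (x for x in range(n))    # holds only its current state

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)  # does not grow with n

print(list_size > gen_size)  # True
```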

Small list comprehensions are faster as they don't have that overhead.

Usually the small cases are close enough, so in that case it's better to choose a generator expression.

It is particularly important to conserve memory on a web server, where there might be hundreds or thousands of concurrent connections.
