
Why is converting a list to a set faster than converting a generator to a set?

Here is an example:

>>> from timeit import timeit
>>> print(timeit('[y for y in range(100)]', number=100000))
0.7025867114395824
>>> print(timeit('(y for y in range(100))', number=100000))
0.09295392291478244
>>> print(timeit('set([y for y in range(100)])', number=100000))
1.0864544935180334
>>> print(timeit('set((y for y in range(100)))', number=100000))
1.1277489876506621

This is very confusing. The generator takes less time to create (and that is understandable), but why is converting a generator to a set slower than converting a list, when (at least to my knowledge) it should be the opposite?
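To re-run the comparison from the question as a standalone script, here is a minimal sketch using only the standard library timeit module. The helper name compare_set_builders is hypothetical, and set(range(n)) is added purely as a baseline that skips the comprehension entirely. Timings vary by machine and Python version, so no numbers are claimed here.

```python
from timeit import timeit


def compare_set_builders(n=100, number=10_000):
    """Hypothetical helper: time several ways of building the same n-element set."""
    statements = {
        "set(list comprehension)": f"set([y for y in range({n})])",
        "set(generator expression)": f"set((y for y in range({n})))",
        "set(range) baseline": f"set(range({n}))",
    }
    # timeit compiles each statement string and runs it `number` times,
    # returning the total elapsed time in seconds.
    return {label: timeit(stmt, number=number) for label, stmt in statements.items()}


if __name__ == "__main__":
    for label, seconds in compare_set_builders().items():
        print(f"{label:28} {seconds:.4f} s")
```

Running it a few times is worthwhile, since single timeit runs on a busy machine can be noisy.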

First of all, there is no point in timing the creation of a generator expression. Creating a generator doesn't iterate over the contents, so it's very fast. Spot the difference between creating a generator expression over one element and over 10 million:

>>> print(timeit('(y for y in range(1))', number=100000))
0.060932624037377536
>>> print(timeit('(y for y in range(10000000))', number=100000))
0.06168231705669314
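The laziness can also be observed directly. This sketch assumes nothing beyond plain Python 3; tracked is a hypothetical generator function that counts how many values have actually been produced. One subtlety: Python evaluates the outermost iterable of a generator expression immediately (here, the call tracked(range(...))), but because tracked is itself a generator, that call only creates another paused generator and runs none of its body.

```python
calls = 0  # how many values tracked() has produced so far


def tracked(values):
    """Hypothetical helper: yields each value while counting how many were produced."""
    global calls
    for v in values:
        calls += 1
        yield v


gen = (y for y in tracked(range(10_000_000)))  # creating it iterates nothing
print(calls)  # 0: no element has been produced yet
next(gen)
print(calls)  # 1: only the first element was produced
```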

Generators take more time to iterate over than, say, a list object:

>>> from collections import deque
>>> def drain_iterable(it, _deque=deque):
...     _deque(it, maxlen=0)  # consume every item, keep none
...
>>> def produce_generator():
...     return (y for y in range(100))
...
>>> print(timeit('drain_iterable(next(generators))',
...              'from __main__ import drain_iterable, produce_generator;'
...              'generators=iter([produce_generator() for _ in range(100000)])',
...              number=100000))
0.5204695729771629
>>> print(timeit('[y for y in range(100)]', number=100000))
0.3088444779859856

Here I tested iteration over the generator expression by just discarding all elements as fast as possible.
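The draining trick can be tried on its own. It is essentially the "consume" recipe from the itertools documentation: deque(iterable, maxlen=0) pulls every item and stores none. The simplified drain_iterable below drops the _deque default-argument micro-optimization; the generator it drains is illustrative only.

```python
from collections import deque


def drain_iterable(it):
    """Exhaust an iterable as fast as possible, keeping nothing.

    With maxlen=0 the deque discards each item as soon as it arrives, so in
    CPython the consuming loop itself runs in C and the measured time is
    dominated by whatever produces the items (here, the generator).
    """
    deque(it, maxlen=0)


gen = (y * 2 for y in range(100))
drain_iterable(gen)
print(next(gen, "exhausted"))  # exhausted: the generator has no items left
```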

That's because a generator is essentially a function that executes until it yields a value, is paused, is activated again for the next value, and is then paused again. See "What does the "yield" keyword do?" for a good overview. The administration involved in this process takes time. In contrast, a list comprehension doesn't have to spend this time; it does all the looping without re-activating and de-activating a function for every value produced. When set() consumes a generator expression, it pays that resume-and-pause cost once per element, and in the timings above that costs more than first building a list in a tight loop and then iterating over it.
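The pause-and-resume cycle described above is visible through inspect.getgeneratorstate from the standard library. The generator function numbers is a made-up example; the state strings are the inspect module's GEN_* constants.

```python
import inspect


def numbers(n):
    """A generator function: its frame is suspended at each yield."""
    for y in range(n):
        yield y


gen = numbers(3)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: the body has not started
print(next(gen))                       # 0
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at the yield
print(list(gen))                       # [1, 2]: resumed once per remaining value
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: the body has finished
```

Every value after the first requires another resume of the suspended frame, which is the per-item overhead a list comprehension avoids.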

Generators are memory efficient, not execution efficient. They can sometimes save execution time, but usually only because you avoid allocating and deallocating larger blocks of memory.
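A rough illustration of the memory side of the trade-off, using only sys.getsizeof from the standard library. getsizeof measures just the container object (for a list, its array of references), not the integers it refers to, so it understates the list's real footprint. Exact byte counts depend on the Python version and platform, so none are claimed here.

```python
import sys

n = 1_000_000
as_list = [y for y in range(n)]  # materializes all n references up front
as_gen = (y for y in range(n))   # stores only a paused frame, not the items

# Container sizes only; the int objects themselves are not counted.
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

# Building a set from either one produces the same result.
print(set(as_list) == set(as_gen))  # True
```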

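Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.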

粤ICP备18138465号  © 2020-2024 STACKOOM.COM