Differences between generator comprehension expressions

There are, as far as I know, three ways to create a generator through a comprehension [1].

The classical one:

def f1():
    g = (i for i in range(10))

The yield variant:

def f2():
    g = [(yield i) for i in range(10)]

The yield from variant (which raises a SyntaxError except inside a function):

def f3():
    g = [(yield from range(10))]

The three variants lead to different bytecode, which is not really surprising. It would seem logical that the first one is the best, since it's a dedicated, straightforward syntax to create a generator through comprehension. However, it is not the one that produces the shortest bytecode.

Disassembled in Python 3.6

Classical generator comprehension

>>> dis.dis(f1)
4           0 LOAD_CONST               1 (<code object <genexpr> at...>)
            2 LOAD_CONST               2 ('f1.<locals>.<genexpr>')
            4 MAKE_FUNCTION            0
            6 LOAD_GLOBAL              0 (range)
            8 LOAD_CONST               3 (10)
           10 CALL_FUNCTION            1
           12 GET_ITER
           14 CALL_FUNCTION            1
           16 STORE_FAST               0 (g)

5          18 LOAD_FAST                0 (g)
           20 RETURN_VALUE

yield variant

>>> dis.dis(f2)
8           0 LOAD_CONST               1 (<code object <listcomp> at...>)
            2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
            4 MAKE_FUNCTION            0
            6 LOAD_GLOBAL              0 (range)
            8 LOAD_CONST               3 (10)
           10 CALL_FUNCTION            1
           12 GET_ITER
           14 CALL_FUNCTION            1
           16 STORE_FAST               0 (g)

9          18 LOAD_FAST                0 (g)
           20 RETURN_VALUE

yield from variant

>>> dis.dis(f3)
12           0 LOAD_GLOBAL              0 (range)
             2 LOAD_CONST               1 (10)
             4 CALL_FUNCTION            1
             6 GET_YIELD_FROM_ITER
             8 LOAD_CONST               0 (None)
            10 YIELD_FROM
            12 BUILD_LIST               1
            14 STORE_FAST               0 (g)

13          16 LOAD_FAST                0 (g)
            18 RETURN_VALUE
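
For anyone who wants to reproduce these listings, here is a minimal sketch using the standard dis and inspect modules. It assumes the functions end with return g, as the trailing LOAD_FAST / RETURN_VALUE entries suggest, and it leaves out the yield-inside-a-list-comprehension variant because Python 3.8 and later reject it with a SyntaxError. The helper names opnames and nested_code_names are made up for this illustration, and opcode names vary between interpreter versions, so the output will not match the 3.6 listings exactly.

```python
import dis
import inspect

def f1():
    g = (i for i in range(10))
    return g

def f3():
    g = [(yield from range(10))]
    return g

def opnames(func):
    """Opcode names of func's own bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

def nested_code_names(func):
    """Names of code objects stored among func's constants."""
    return [c.co_name for c in func.__code__.co_consts if inspect.iscode(c)]

# f1 carries a separate '<genexpr>' code object; f3 carries none, because
# its brackets are a plain list display and the yield from belongs to f3.
print(nested_code_names(f1))
print(nested_code_names(f3))
print(inspect.isgeneratorfunction(f1), inspect.isgeneratorfunction(f3))
```

Calling dis.dis(f1) and dis.dis(f3) prints the full listings for whichever interpreter is running.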

In addition, a timeit comparison shows that the yield from variant is the fastest (still run with Python 3.6):

>>> timeit(f1)
0.5334039637357152

>>> timeit(f2)
0.5358906506760719

>>> timeit(f3)
0.19329123352712596

f3 is more or less 2.7 times as fast as f1 and f2.

As Leon mentioned in a comment, the efficiency of a generator is best measured by how fast it can be iterated over. So I changed the three functions so that they iterate over the generators and call a dummy function.

def f():
    pass

def fn():
    g = ...
    for _ in g:
        f()

The results are even more blatant:

>>> timeit(f1)
1.6017412817975778

>>> timeit(f2)
1.778684261368946

>>> timeit(f3)
0.1960603619517669

f3 is now about 8.2 times as fast as f1, and about 9.1 times as fast as f2.

Note: The results are more or less the same when the iterable is not range(10) but a static iterable, such as [0, 1, 2, 3, 4, 5]. Therefore, the difference in speed has nothing to do with range being somehow optimized.


So, what are the differences between the three ways? More specifically, what is the difference between the yield from variant and the other two?

Is it normal that the natural construct (elt for elt in it) is slower than the tricky [(yield from it)]? Should I from now on replace the former with the latter in all of my scripts, or are there any drawbacks to using the yield from construct?


Edit

This is all related, so I don't feel like opening a new question, but this is getting even stranger. I tried comparing range(10) and [(yield from range(10))].

def f1():
    for i in range(10):
        print(i)

def f2():
    for i in [(yield from range(10))]:
        print(i)

>>> timeit(f1, number=100000)
26.715589237537195

>>> timeit(f2, number=100000)
0.019948781941049987

So. Now, iterating over [(yield from range(10))] is more than 1,300 times as fast as iterating over a bare range(10)?

How do you explain why iterating over [(yield from range(10))] is so much faster than iterating over range(10)?


[1]: For the sceptical, the three expressions that follow do produce a generator object; try calling type on them.

 g = [(yield i) for i in range(10)] 

This construct accumulates the data that is (or may be) passed back into the generator through its send() method and returns it via the StopIteration exception when the iteration is exhausted [1]:

>>> g = [(yield i) for i in range(3)]
>>> next(g)
0
>>> g.send('abc')
1
>>> g.send(123)
2
>>> g.send(4.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: ['abc', 123, 4.5]
>>> #          ^^^^^^^^^^^^^^^^^

No such thing happens with a plain generator comprehension:

>>> g = (i for i in range(3))
>>> next(g)
0
>>> g.send('abc')
1
>>> g.send(123)
2
>>> g.send(4.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
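
The same check can be scripted, since the value attached to StopIteration is exposed as its value attribute. A minimal sketch (the helper name drain_with_send is made up for this illustration):

```python
def drain_with_send(gen, values):
    """Prime gen with next(), then send each value in turn.

    Returns (yielded_items, stop_value); stop_value is whatever was
    attached to StopIteration, or None if gen was not exhausted.
    """
    yielded = [next(gen)]
    for v in values:
        try:
            yielded.append(gen.send(v))
        except StopIteration as stop:
            return yielded, stop.value
    return yielded, None

yielded, final = drain_with_send((i for i in range(3)), ['abc', 123, 4.5])
print(yielded)  # [0, 1, 2]
print(final)    # None: the sent values were silently discarded
```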

As for the yield from version: in Python 3.5 (which I am using) it doesn't work outside functions, so the illustration is a little different:

>>> def f(): return [(yield from range(3))]
... 
>>> g = f()
>>> next(g)
0
>>> g.send(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in f
AttributeError: 'range_iterator' object has no attribute 'send'

OK, send() doesn't work for a generator yielding from range(), but let's at least see what's at the end of the iteration:

>>> g = f()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None]
>>> #          ^^^^^^
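
The transcript above can be condensed into a short script that catches the StopIteration and inspects its value attribute. This is only a sketch; the names items and result are chosen for the illustration.

```python
def f():
    # A one-element list display, not a comprehension: its single element
    # is the value of the yield from expression, which is None here because
    # a range iterator finishes without a return value.
    return [(yield from range(3))]

g = f()
items = []
try:
    while True:
        items.append(next(g))
except StopIteration as stop:
    result = stop.value

print(items)   # [0, 1, 2]
print(result)  # [None]
```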

[1] Note that even if you don't use the send() method, send(None) is assumed; therefore a generator constructed in this way always uses more memory than a plain generator comprehension (since it has to accumulate the results of the yield expression until the end of the iteration):

>>> g = [(yield i) for i in range(3)]
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None]

UPDATE

Regarding the performance differences between the three variants: yield from beats the other two because it eliminates a level of indirection (which, to the best of my understanding, is one of the two main reasons why yield from was introduced). However, in this particular example yield from itself is superfluous: g = [(yield from range(10))] is actually almost identical to g = range(10).

This is what you should be doing:

g = (i for i in range(10))

It's a generator expression. It's equivalent to

def temp(outer):
    for i in outer:
        yield i
g = temp(range(10))

but if you just wanted an iterable with the elements of range(10), you could have done

g = range(10)

You do not need to wrap any of this in a function.
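
As a quick sanity check of that equivalence, the sketch below compares the three spellings. The variable names are made up for the illustration. One difference worth keeping in mind is that generators are single-use, while a range can be iterated again.

```python
def temp(outer):
    # Explicit generator function standing in for the generator expression.
    for i in outer:
        yield i

genexp = (i for i in range(10))
explicit = temp(range(10))
plain = range(10)

genexp_items = list(genexp)
explicit_items = list(explicit)
plain_items = list(plain)

# A second pass: the generator is already exhausted, the range is not.
genexp_again = list(genexp)
plain_again = list(plain)

print(genexp_items == explicit_items == plain_items)  # True
print(genexp_again)                                   # []
```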

If you're here to learn what code to write, you can stop reading. The rest of this post is a long and technical explanation of why the other code snippets are broken and should not be used, including an explanation of why your timings are broken too.


This:

g = [(yield i) for i in range(10)]

is a broken construct that should have been taken out years ago. Eight years after the problem was originally reported, the process to remove it is finally beginning. Don't do it.

While it's still in the language, on Python 3, it's equivalent to

def temp(outer):
    l = []
    for i in outer:
        l.append((yield i))
    return l
g = temp(range(10))

List comprehensions are supposed to return lists, but because of the yield, this one doesn't. It acts kind of like a generator expression, and it yields the same things as your first snippet, but it builds an unnecessary list and attaches it to the StopIteration raised at the end.

>>> g = [(yield i) for i in range(10)]
>>> [next(g) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None, None, None, None, None, None, None, None]
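
Python 3.8 and later reject a yield inside a list comprehension with a SyntaxError, so on a current interpreter the behaviour can only be reproduced with the explicit generator function. A sketch (the local list is renamed collected for readability):

```python
def temp(outer):
    collected = []
    for i in outer:
        # The value of the yield expression is whatever the caller passed
        # to send(), or None when the caller used next().
        collected.append((yield i))
    return collected

g = temp(range(3))
first = next(g)          # 0
second = g.send('abc')   # 1
third = g.send(123)      # 2
try:
    g.send(4.5)
except StopIteration as stop:
    leftovers = stop.value

print(leftovers)  # ['abc', 123, 4.5]
```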

This is confusing and a waste of memory. Don't do it. (If you want to know where all those Nones are coming from, read PEP 342.)

On Python 2, g = [(yield i) for i in range(10)] does something entirely different. Python 2 doesn't give list comprehensions their own scope (specifically list comprehensions, not dict or set comprehensions), so the yield is executed by whatever function contains this line. On Python 2, this:

def f():
    g = [(yield i) for i in range(10)]

is equivalent to

def f():
    temp = []
    for i in range(10):
        temp.append((yield i))
    g = temp

making f a generator-based coroutine, in the pre-async sense. Again, if your goal was to get a generator, you've wasted a bunch of time building a pointless list.


This:

g = [(yield from range(10))]

is silly, but none of the blame is on Python this time.

There is no comprehension or genexp here at all. The brackets are not a list comprehension; all the work is done by yield from, and then you build a 1-element list containing the (useless) return value of yield from. Your f3:

def f3():
    g = [(yield from range(10))]

when stripped of the unnecessary list-building, simplifies to

def f3():
    yield from range(10)

or, ignoring all the coroutine support stuff that yield from does,

def f3():
    for i in range(10):
        yield i
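
A small sketch of where the two spellings agree and where the coroutine support shows through. The function names delegating and looping are made up for the illustration. Plain iteration gives the same items, but send() behaves differently: the loop version ignores the sent value, while yield from forwards it to the range iterator, which has no send() method.

```python
def delegating():
    yield from range(3)

def looping():
    for i in range(3):
        yield i

same_items = list(delegating()) == list(looping())

g = looping()
next(g)
loop_result = g.send('x')   # value ignored, iteration simply continues

g = delegating()
next(g)
try:
    g.send('x')             # forwarded to the range iterator
    forwarded_failed = False
except AttributeError:
    forwarded_failed = True

print(same_items, loop_result, forwarded_failed)  # True 1 True
```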

Your timings are also broken.

In your first timing, f1 and f2 create generator objects that can be used inside those functions, though f2's generator is weird. f3 doesn't do that; f3 is a generator function. f3's body does not run in your timings, and if it did, its g would behave quite unlike the other functions' gs. A timing that would actually be comparable with f1 and f2 would be

def f4():
    g = f3()
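
For a measurement that exercises the generators rather than just creating them, each one has to be consumed inside the timed function. The following is only a sketch of such a harness. The names f1_consumed and f3_consumed and the repetition count are assumptions for the illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def f3():
    g = [(yield from range(10))]

def f1_consumed():
    # Create the generator expression and run it to exhaustion.
    for _ in (i for i in range(10)):
        pass

def f3_consumed():
    # Calling f3() only builds a generator object; the loop runs its body.
    for _ in f3():
        pass

created_only = timeit.timeit(f3, number=20_000)        # body never runs
genexp_time = timeit.timeit(f1_consumed, number=20_000)
f3_time = timeit.timeit(f3_consumed, number=20_000)

print(created_only, genexp_time, f3_time)
```

Timing f3 on its own, as in the first row, measures nothing but the cost of creating a generator object.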

In your second timing, f2 doesn't actually run, for the same reason f3 was broken in the previous timing.

In your second timing, f2 is not iterating over a generator. Instead, the yield from turns f2 into a generator function itself.

This might not do what you think it does.

def f2():
    for i in [(yield from range(10))]:
        print(i)

Call it:

>>> def f2():
...     for i in [(yield from range(10))]:
...         print(i)
...
>>> f2() #Doesn't print.
<generator object f2 at 0x02C0DF00>
>>> set(f2()) #Prints `None`, because `(yield from range(10))` evaluates to `None`.
None
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Because the yield from is not within a comprehension, it is bound to the f2 function instead of an implicit function, turning f2 into a generator function.
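
This can be confirmed without reading the console by capturing standard output. The sketch below uses range(3) to keep the output short. The variable names are made up for the illustration.

```python
import contextlib
import inspect
import io

def f2():
    for i in [(yield from range(3))]:
        print(i)

is_generator_function = inspect.isgeneratorfunction(f2)

# Calling f2() runs none of its body, so nothing is printed.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    gen = f2()
printed_on_call = buf.getvalue()

# Consuming the generator runs the body: the items come from yield from,
# then the loop over the one-element list [None] prints None once.
with contextlib.redirect_stdout(buf):
    items = list(gen)
printed_on_iteration = buf.getvalue()
```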


I remembered seeing someone point out that it was not actually iterating, but I can't remember where I saw that. I was testing the code myself when I rediscovered this. I didn't find the source when searching through the mailing list post or the bug tracker thread. If someone finds the source, please tell me or add it to the post itself, so it can be credited.
