

Python `yield from`, or return a generator?

I wrote this simple piece of code:

def mymap(func, *seq):
    return (func(*args) for args in zip(*seq))

Should I use a 'return' statement, as above, to return a generator, or a 'yield from' statement like this:

def mymap(func, *seq):
    yield from (func(*args) for args in zip(*seq))

And beyond the technical difference between 'return' and 'yield from', which is the better approach in the general case?

The difference is that your first mymap is just an ordinary function, in this case a factory that returns a generator. Everything inside its body is executed as soon as you call the function.

def gen_factory(func, seq):
    """Generator factory returning a generator."""
    # do stuff ... immediately when factory gets called
    print("build generator & return")
    return (func(*args) for args in seq)

The second mymap is also a factory, but it is also a generator itself, yielding from a sub-generator it builds internally. Because it is a generator itself, execution of its body does not start until the first call to next(generator).

def gen_generator(func, seq):
    """Generator yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("build generator & yield")
    yield from (func(*args) for args in seq)
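One practical consequence of "the body runs at call time" versus "the body runs at the first next()" is when errors surface. The sketch below renames the question's two mymap versions to the hypothetical names mymap_return and mymap_yield_from and passes a non-iterable argument. In the return version, the iterable of the generator expression's first for clause (zip(*seq)) is evaluated immediately, so the TypeError is raised by the call itself; in the yield from version it is deferred until iteration starts. The helpers fails_at_call and fails_at_first_next are also hypothetical, added only for illustration.

```python
def mymap_return(func, *seq):
    # Ordinary function: the generator expression is built right here, and the
    # iterable of its first `for` clause, zip(*seq), is evaluated immediately.
    return (func(*args) for args in zip(*seq))


def mymap_yield_from(func, *seq):
    # Generator function: nothing in the body runs until the first next().
    yield from (func(*args) for args in zip(*seq))


def fails_at_call(factory):
    """Return True if factory(abs, 5) raises TypeError as soon as it is called."""
    try:
        factory(abs, 5)  # 5 is not iterable, so zip(5) raises TypeError
    except TypeError:
        return True
    return False


def fails_at_first_next(factory):
    """Return True if the TypeError only appears on the first next()."""
    gen = factory(abs, 5)  # only meaningful for factories that do not fail here
    try:
        next(gen)
    except TypeError:
        return True
    return False
```

With valid input both versions produce the same values; the difference only shows in when the work (and any error) happens, which can matter if you want argument validation to fail early.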

I think the following example will make it clearer. We define data packages to be processed by functions, and bundle each function with its data into a job that we pass to the generators.

def add(a, b):
    return a + b

def sqrt(a):
    return a ** 0.5

data1 = [*zip(range(1, 5))]  # [(1,), (2,), (3,), (4,)]
data2 = [(2, 1), (3, 1), (4, 1), (5, 1)]

job1 = (sqrt, data1)
job2 = (add, data2)

Now we run the following code in an interactive shell like IPython to see the different behavior. gen_factory prints immediately, while gen_generator prints only after next() is called.

gen_fac = gen_factory(*job1)
# build generator & return <-- printed immediately
next(gen_fac)  # start
# Out: 1.0
[*gen_fac]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

gen_gen = gen_generator(*job1)
next(gen_gen)  # start
# build generator & yield <-- printed with first next()
# Out: 1.0
[*gen_gen]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

To give you a more realistic use case for a construct like gen_generator, we'll extend it a little and turn it into a coroutine by assigning the value of the yield expression to variables, so we can inject jobs into the running generator with send().

Additionally, we create a helper function that runs all tasks in a job and asks for a new one upon completion.

def gen_coroutine():
    """Generator coroutine yielding from sub-generator inside."""
    # do stuff... first time when 'next' gets called
    print("receive job, build generator & yield, loop")
    while True:
        try:
            func, seq = yield "send me work ... or I quit with next next()"
        except TypeError:
            return "no job left"
        else:
            yield from (func(*args) for args in seq)


def do_job(gen, job):
    """Run all tasks in job."""
    print(gen.send(job))
    while True:
        result = next(gen)
        print(result)
        if result == "send me work ... or I quit with next next()":
            break

Now we run gen_coroutine with our helper function do_job and two jobs.

gen_co = gen_coroutine()
next(gen_co)  # start
# receive job, build generator & yield, loop  <-- printed with first next()
# Out: 'send me work ... or I quit with next next()'
do_job(gen_co, job1)  # prints out all results from job
# 1.0
# 1.4142135623730951
# 1.7320508075688772
# 2.0
# send me work ... or I quit with next next()
do_job(gen_co, job2)  # send another job into generator
# 3
# 4
# 5
# 6
# send me work ... or I quit with next next()
next(gen_co)
# Traceback ...
# StopIteration: no job left
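The final "StopIteration: no job left" works because next() sends None into the coroutine, and unpacking None into func, seq raises the TypeError that gen_coroutine catches before returning. A generator's return value travels on StopIteration.value, so it can be read programmatically. The sketch below is a trimmed copy of gen_coroutine (without the print) plus two hypothetical helpers, collect_job and finish, that return results instead of printing them. Unlike do_job, collect_job also copes with an empty job, because it checks the value returned by send() before calling next() again.

```python
PROMPT = "send me work ... or I quit with next next()"


def gen_coroutine():
    """Same coroutine as above, minus the print."""
    while True:
        try:
            func, seq = yield PROMPT
        except TypeError:
            # next() sends None, which cannot be unpacked into (func, seq).
            return "no job left"
        else:
            yield from (func(*args) for args in seq)


def collect_job(gen, job):
    """Hypothetical helper: send a job and return its results as a list."""
    results = []
    result = gen.send(job)  # first result, or PROMPT if the job is empty
    while result != PROMPT:
        results.append(result)
        result = next(gen)
    return results


def finish(gen):
    """Hypothetical helper: resume without a job and return the coroutine's return value."""
    try:
        next(gen)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("coroutine did not finish")
```

Usage mirrors the session above: prime the coroutine with next(gen), call collect_job once per job, then finish(gen) to obtain the "no job left" return value.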

To come back to your question of which version is the better approach in general: in my opinion, something like gen_factory only makes sense if you need the same setup done for multiple generators you are going to create, or if your construction process for generators is complicated enough to justify a factory instead of building individual generators in place with a generator expression.

Note:

The description above of the gen_generator function (the second mymap) states "it is a generator itself". That is a bit vague and technically not quite correct, but it makes it easier to reason about the differences between the functions in this tricky setup, where gen_factory also returns a generator, namely the one built by the generator expression inside.

In fact, any function with a yield inside (not only the ones in this question that contain generator expressions!) just returns a generator object, built from the function body, when it is called.

type(gen_coroutine) # function
gen_co = gen_coroutine(); type(gen_co) # generator

So all the behavior we observed above for gen_generator and gen_coroutine takes place inside the generator objects that these functions (which contain yield) returned when they were called.
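The standard library can make this distinction explicit: inspect.isgeneratorfunction tells you whether a function contains yield, while both kinds of call hand back an object of type types.GeneratorType. The sketch below uses print-free copies of gen_factory and gen_generator; factory_result and generator_result are illustrative names.

```python
import inspect
import types


def gen_factory(func, seq):
    """Ordinary function that returns a generator built by a generator expression."""
    return (func(*args) for args in seq)


def gen_generator(func, seq):
    """Generator function: calling it returns a generator object built from this body."""
    yield from (func(*args) for args in seq)


# Only the second function is a generator function...
factory_is_genfunc = inspect.isgeneratorfunction(gen_factory)      # False
generator_is_genfunc = inspect.isgeneratorfunction(gen_generator)  # True

# ...but calling either one gives you a generator object.
factory_result = gen_factory(abs, [(-1,), (-2,)])
generator_result = gen_generator(abs, [(-1,), (-2,)])
```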

The answer is: return a generator. It's faster:

marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    return f1()

' 'tuple(f2())'
........................................
Mean +- std dev: 72.8 us +- 5.8 us
marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    yield from f1()

' 'tuple(f2())'
........................................
WARNING: the benchmark result may be unstable
* the standard deviation (12.6 us) is 10% of the mean (121 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3.9 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

Mean +- std dev: 121 us +- 13 us
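If pyperf is not installed, a rougher version of the same comparison can be sketched with the standard timeit module. This is not equivalent to the pyperf run above (no CPU affinity, no warm-up control), so treat any numbers it prints as indicative only; results will vary by machine and Python version. One plausible reason for the gap is that with yield from every value passes through the extra, delegating generator frame of f2, whereas return hands the caller f1's generator directly. The names f2_return, f2_yield_from and compare are hypothetical.

```python
import timeit

a = range(1000)


def f1():
    for x in a:
        yield x


def f2_return():
    # Plain function: returns the generator object created by f1() as-is.
    return f1()


def f2_yield_from():
    # Generator function: each value is relayed through this delegating frame.
    yield from f1()


def compare(number=2000):
    """Return (seconds for return version, seconds for yield from version)."""
    t_return = timeit.timeit(lambda: tuple(f2_return()), number=number)
    t_yield = timeit.timeit(lambda: tuple(f2_yield_from()), number=number)
    return t_return, t_yield


if __name__ == "__main__":
    t_return, t_yield = compare()
    print(f"return:     {t_return:.4f} s")
    print(f"yield from: {t_yield:.4f} s")
```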

If you read PEP 380, the main reason for introducing yield from is to let part of one generator's code be moved into another generator, without having to duplicate the code or change the API:

The rationale behind most of the semantics presented above stems from the desire to be able to refactor generator code. It should be possible to take a section of code containing one or more yield expressions, move it into a separate function (using the usual techniques to deal with references to variables in the surrounding scope, etc.), and call the new function using a yield from expression.

Source
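As a small illustration of the refactoring PEP 380 describes, the sketch below moves the middle section of a generator into its own generator and calls it with yield from. PEP 380 also makes the subgenerator's return value the value of the yield from expression, which is how the count comes back. The names original, _middle and refactored are hypothetical.

```python
def original():
    """Generator written in one piece."""
    yield "start"
    count = 0
    for i in range(3):
        yield i
        count += 1
    yield f"end after {count}"


def _middle(n):
    """The loop from original(), moved into a separate generator."""
    count = 0
    for i in range(n):
        yield i
        count += 1
    return count  # becomes the value of the `yield from` expression


def refactored():
    yield "start"
    count = yield from _middle(3)
    yield f"end after {count}"
```

Callers cannot tell the two apart: both yield the same sequence, which is the "without changing the API" point from the PEP.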

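The most important difference (I don't know whether yield from on a generator is optimized) is that the execution context differs between return and yield from: with yield from, the delegating function stays on the call stack and therefore appears in the traceback, while with return it has already finished.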


[ins] In [1]: def generator():
         ...:     yield 1
         ...:     raise Exception
         ...:

[ins] In [2]: def use_generator():
         ...:     return generator()
         ...:

[ins] In [3]: def yield_generator():
         ...:     yield from generator()
         ...:

[ins] In [4]: g = use_generator()

[ins] In [5]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-5-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:

[ins] In [6]: g = yield_generator()

[ins] In [7]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-3-3ab40ecc32f5> in yield_generator()
      1 def yield_generator():
----> 2     yield from generator()
      3

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:
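The same observation can be checked without IPython by inspecting the traceback programmatically with the traceback module. In this sketch, frame_names is a hypothetical helper that advances a generator twice and returns the function names recorded in the resulting traceback; ValueError-free plain Exception is kept to match the session above. With return, use_generator has already returned before the error happens, so it cannot appear; with yield from, yield_generator is still suspended in its yield from and shows up between the caller and generator.

```python
import traceback


def generator():
    yield 1
    raise Exception("boom")


def use_generator():
    # Plain function: hands the generator object straight back to the caller.
    return generator()


def yield_generator():
    # Generator function: stays suspended in `yield from` while generator() runs.
    yield from generator()


def frame_names(gen):
    """Hypothetical helper: function names in the traceback of the second next()."""
    next(gen)  # consumes the value 1
    try:
        next(gen)  # resumes generator(), which raises
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []
```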

Generators use yield; functions use return.

Generators are generally used in for loops to iterate repeatedly over the values a generator provides, but they can also be used in other contexts, e.g. passed to list() to build a list, again from the values the generator provides.

Functions are called to provide a return value: only one value per call.
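A minimal sketch of that contrast, using the hypothetical names countdown and triangular: the generator is consumed by a for loop (which calls next() until StopIteration) and by list(), while the ordinary function hands back a single value per call.

```python
def countdown(n):
    """Generator function: produces several values, one per yield."""
    while n > 0:
        yield n
        n -= 1


def triangular(n):
    """Ordinary function: each call produces exactly one return value."""
    return sum(range(1, n + 1))


seen = []
for value in countdown(3):  # the for loop calls next() until StopIteration
    seen.append(value)

as_list = list(countdown(3))  # list() drains a fresh generator the same way
one_value = triangular(3)     # a single value for a single call
```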

I prefer the version with yield from because it makes exceptions and context managers easier to handle.

Take the example of a generator expression over the lines of a file:

def with_return(some_file):
    with open(some_file, 'rt') as f:
        return (line.strip() for line in f)

for line in with_return('/tmp/some_file.txt'):
    print(line)

The return version raises ValueError: I/O operation on closed file. because the with block closes the file as soon as the return statement leaves it, before the returned generator is ever iterated.

On the other hand, the yield from version works as expected:

def with_yield_from(some_file):
    with open(some_file, 'rt') as f:
        yield from (line.strip() for line in f)


for line in with_yield_from('/tmp/some_file.txt'):
    print(line)
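Here is a self-contained way to reproduce both behaviors, assuming a throwaway temporary file instead of /tmp/some_file.txt; demo is a hypothetical helper. In the yield from version the with block remains active while the generator is suspended, so the file stays open until iteration finishes, or until the generator is closed (gen.close() raises GeneratorExit at the paused yield, which runs the with block's cleanup).

```python
import os
import tempfile


def with_return(some_file):
    with open(some_file, "rt") as f:
        # The generator is created while f is open, but the with block
        # closes f as soon as `return` leaves it.
        return (line.strip() for line in f)


def with_yield_from(some_file):
    with open(some_file, "rt") as f:
        # This frame (and the with block) stays alive while suspended,
        # so f remains open until the lines have been consumed.
        yield from (line.strip() for line in f)


def demo(text="alpha\nbeta\n"):
    """Hypothetical helper: write text to a temp file and read it back both ways."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        try:
            list(with_return(path))
            return_error = None
        except ValueError as exc:
            return_error = str(exc)
        lines = list(with_yield_from(path))
    finally:
        os.remove(path)
    return return_error, lines
```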

It really depends on the situation. yield is mainly suited to cases where you just want to iterate over the produced values and act on them. return is mainly suited to cases where you want to keep all the values your function has produced in memory rather than iterate over them only once. Note that you can iterate over a generator (what a function containing yield returns) only once, so some algorithms are definitely not suited to it.
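The single-pass limitation is easy to see in a short sketch; squares_gen and squares_list are hypothetical names. A second pass over the same generator object yields nothing, while a returned list can be traversed as often as needed.

```python
def squares_gen(n):
    """Generator function: values are produced lazily, one pass only."""
    yield from (i * i for i in range(n))


def squares_list(n):
    """Ordinary function: all values are built and stored in memory."""
    return [i * i for i in range(n)]


gen = squares_gen(4)
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted

lst = squares_list(4)
# A list supports repeated passes, e.g. two aggregates over the same data.
total, largest = sum(lst), max(lst)
```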

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.