
Why is yield required for a Python generator?

After reading answer1 and answer2, the purpose of yield still looks unclear to me.


In the first case, with the function below,

def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i*i

Invoking createGenerator, as below,

myGenerator = createGenerator()

should return an object (like (x*x for x in range(3))) of type collections.abc.Generator, which is-a collections.abc.Iterator and collections.abc.Iterable.

To iterate over the myGenerator object and get the first value (0),

next(myGenerator)

would actually make the for loop of the createGenerator function internally invoke __iter__(myGenerator), retrieve a collections.abc.Iterator type object (say obj), and then invoke __next__(obj) to get the first value (0), followed by the for loop pausing at the yield keyword.
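
For reference, the claims about the types can be checked directly; a minimal sketch reusing the names above:

import collections.abc

myGenerator = createGenerator()
print(isinstance(myGenerator, collections.abc.Generator))  # True
print(isinstance(myGenerator, collections.abc.Iterator))   # True
print(iter(myGenerator) is myGenerator)                    # True: a generator is its own iterator
print(next(myGenerator))                                   # 0: runs the body until the first yield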


If this understanding (above) is correct, then wouldn't the below syntax (second case),

def createGenerator():
    return (x*x for x in range(3))

myGen = createGenerator()  # returns a collections.abc.Generator type object
next(myGen)  # next() must internally invoke __next__(__iter__(myGen)) to provide the first value (0), with no need to pause

suffice to serve the same purpose (above), while being more readable? Aren't both syntaxes memory efficient? If so, when should I use the yield keyword? Is there a case where yield is a must-use?

Try doing this without yield

def func():
    x = 1
    while True:
        y = yield x  # hand x to the caller; resume with the value passed to send(), assigned to y
        x += y


f = func()
next(f)     # returns 1
f.send(3)   # returns 4
f.send(10)  # returns 14

The generator has two important features:

  1. The generator has some state (the value of x). Because of this state, the generator can eventually return any number of results without using huge amounts of memory.

  2. Because of the state and the yield, we can provide the generator with information that it uses to compute its next output. That value is assigned to y when we call send.

I don't think this is possible without yield. That said, I'm pretty sure that anything you can do with a generator function can also be done with a class.

Here's an example of a class that does exactly the same thing (python 2 syntax):

class MyGenerator(object):
    def __init__(self):
        self.x = 1

    def next(self):  # Python 2 iterator protocol method
        return self.x

    def send(self, y):
        self.x += y  # update the stored state with the caller's value
        return self.next()

I didn't implement __iter__ but it's pretty obvious how that should work.
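
For completeness, here is how the same sketch might look in Python 3, with the __iter__ filled in (MyGenerator3 is just a hypothetical name for this variant):

class MyGenerator3:
    def __init__(self):
        self.x = 1

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        return self.x

    def send(self, y):
        self.x += y
        return next(self)

g = MyGenerator3()
print(next(g))    # 1
print(g.send(3))  # 4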

Think of yield as a "lazy return". Note that in your second example the function still returns a generator, since the generator expression is itself lazily evaluated; it is a list comprehension, [x*x for x in range(3)], that would produce a fully evaluated list of values. That may be perfectly acceptable depending on the use case. yield is useful when processing large batches of streamed data, or when dealing with data that is not immediately available (think asynchronous operations).
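
To make the laziness concrete, here is a tiny sketch (the print call is only there to mark when work actually happens):

def squares(nums):
    for n in nums:
        print("computing", n)  # side effect shows when each value is produced
        yield n * n

s = squares(range(3))  # nothing printed yet: no values computed
print(next(s))         # prints "computing 0", then 0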

The generator function and the generator comprehension are basically the same - both produce generator objects:

In [540]: def createGenerator(n):
     ...:     mylist = range(n)
     ...:     for i in mylist:
     ...:         yield i*i
     ...:         
In [541]: g = createGenerator(3)
In [542]: g
Out[542]: <generator object createGenerator at 0xa6b2180c>

In [545]: gl = (i*i for i in range(3))
In [546]: gl
Out[546]: <generator object <genexpr> at 0xa6bbbd7c>

In [547]: list(g)
Out[547]: [0, 1, 4]
In [548]: list(gl)
Out[548]: [0, 1, 4]

Both g and gl have the same attributes, produce the same values, and run out in the same way.

Just as with a list comprehension, there are things you can do in the explicit loop that you can't in the comprehension. But if the comprehension does the job, use it. Generators were added to Python in version 2.2; generator comprehensions are newer, from 2.4 (and probably use the same underlying mechanism).

In Py3, range (or Py2 xrange) produces values one at a time, as opposed to a whole list. It's a range object, not a generator, but works in much the same way. Py3 has extended this to other things, such as dictionary keys and map. Sometimes that's a convenience; other times I forget to wrap them in list().
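
A short illustration of that convenience (and of the forgetting-to-wrap part):

r = range(3)      # a lazy, re-iterable range object
m = map(str, r)   # in Py3, map returns a one-shot iterator, not a list
print(list(m))    # ['0', '1', '2']
print(list(m))    # []  -- m is exhausted; range, by contrast, can be re-iterated
print(list(r))    # [0, 1, 2]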


The yield can be more elaborate, allowing 'feedback' from the caller, e.g.

In [564]: def foo(n):
     ...:     i = 0
     ...:     while i<n:
     ...:         x = yield i*i
     ...:         if x is None:
     ...:             i += 1
     ...:         else:
     ...:             i = x
     ...:             

In [576]: f = foo(3)
In [577]: next(f)
Out[577]: 0
In [578]: f.send(-3)    # reset the counter to -3
Out[578]: 9
In [579]: list(f)
Out[579]: [4, 1, 0, 1, 4]

The way I think of a generator operating is that creation initializes an object with code and initial state. next() runs it up to the yield and returns that value. The next next() lets it run again until it hits a yield, and so on until it hits a StopIteration condition. So it's a function that maintains an internal state and can be called repeatedly via next or for iteration. With send, yield from, and so on, generators can be much more sophisticated.
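
A bare-bones trace of that life cycle (a sketch; the names are arbitrary):

def g():
    yield 'a'
    yield 'b'

it = g()         # creating the generator runs none of its code yet
print(next(it))  # 'a': runs up to the first yield and pauses
print(next(it))  # 'b': resumes and runs to the second yield
next(it)         # raises StopIteration: the body ran off the end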

Normally a function runs until done, and returns. The next call to the function is independent of the first - unless you use globals or error-prone mutable defaults.
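
For example, the classic mutable-default pitfall (a hypothetical counter, just to illustrate):

def counter(_state=[0]):  # _state is a hypothetical name; the default list is shared across calls
    _state[0] += 1
    return _state[0]

print(counter())  # 1
print(counter())  # 2 -- state leaked between supposedly independent calls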


https://www.python.org/dev/peps/pep-0289/ is the PEP for generator expressions, from v2.4.

This PEP introduces generator expressions as a high performance, memory efficient generalization of list comprehensions [1] and generators [2] .

https://www.python.org/dev/peps/pep-0255/ is the PEP for generators, from v2.2.

There is already a good answer about the capability to send data into a generator with yield. Regarding readability considerations only: simple, straightforward transformations can certainly be more readable as generator expressions:

(x + 1 for x in iterable if x%2 == 1)

Certain operations, though, are easier to read and understand using a full generator definition, and certain cases are a headache to fit into a generator expression; try the following:

>>> x = ['arbitrarily', ['nested', ['data'], 'can', [['be'], 'hard'], 'to'], 'reach']
>>> def flatten_list_of_list(lol):
...     for l in lol:
...         if isinstance(l, list):
...             yield from flatten_list_of_list(l)
...         else:
...             yield l
...
>>> list(flatten_list_of_list(x))
['arbitrarily', 'nested', 'data', 'can', 'be', 'hard', 'to', 'reach']

Sure, you might be able to hack up a solution that fits on a single line using lambdas to achieve recursion, but it would be an unreadable mess. Now imagine I had some arbitrarily nested data structure that involved both list and dict, and I had logic to handle both cases... you get the point, I think.
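
A rough sketch of that list-and-dict case (hypothetical, just to make the point concrete):

def flatten(obj):
    if isinstance(obj, dict):      # recurse into dict values
        for v in obj.values():
            yield from flatten(v)
    elif isinstance(obj, list):    # recurse into list items
        for item in obj:
            yield from flatten(item)
    else:
        yield obj

print(list(flatten({'a': ['b', {'c': ['d']}]})))  # ['b', 'd']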
