Is this a bad use of a `yield` statement?

Question

I was taking a look at the code of a coworker and I felt like this was an unnecessary use of the yield statement. It was something like this:

import re
from typing import List

def standardize_text(text: str):
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

I understand the use of yield in preprocess_docs so that I can return a generator, which would be helpful if docs is a large list. But I don't understand the value of the yield in the standardize_text function. To me, a return statement would do the exact same thing.

Is there a reason why that yield would be useful?

Answer 1

"To me, a return statement would do the exact same thing."

Using return instead wouldn't be the same as yield, as explained in ShadowRanger's comment.

With yield, calling the function gives you a generator object:

>>> standardize_text("ABCD")
<generator object standardize_text at 0x10561f740>

Generators can produce more than one result (unlike functions that use return). This generator happens to produce exactly one item, which is a string (the result of re.sub). You can collect the generator's results with list(), for example, or just grab the first result with next():

>>> list(standardize_text("ABCD"))
['XD']

>>> g = standardize_text("ABCD")
>>> next(g)
'XD'
>>> next(g) # raises StopIteration, indicating the generator has finished
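
One practical consequence, sketched below: calling a generator function does not run its body at all. The body only runs once the generator is iterated. The `calls` list is a hypothetical helper added just to make that visible; it is not part of the original code.

```python
import re

calls = []  # hypothetical log, used only to show when the body actually runs


def standardize_text(text: str):
    calls.append(text)  # executed on the first next(), not at call time
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)


g = standardize_text("ABCD")
ran_at_call_time = bool(calls)  # False: nothing has executed yet

first = next(g)                 # runs the body up to the yield
ran_after_next = bool(calls)    # True: the body has now started

try:
    next(g)                     # no more yields, so the generator finishes
    exhausted = False
except StopIteration:
    exhausted = True

print(ran_at_call_time, first, ran_after_next, exhausted)
# prints: False XD True True
```

With return, the re.sub call would happen immediately when standardize_text is called.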

If we change the function to use return :

def standardize_text(text: str):
    pattern = r"ABC" # some regex
    return re.sub(pattern, "X", text)

Then calling the function gives us the single result directly; no list() or next() is needed.

>>> standardize_text("ABCD")
'XD'

"Is there a reason why that yield would be useful?"

In the standardize_text function, no, not really. But your preprocess_docs function actually does make use of returning more than one value with yield: it returns a generator with one result for each of the values in docs. Those results are either generators themselves (in your original code with yield) or strings (if we change standardize_text to use return).

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

# returns a generator because the implementation uses "yield"
>>> preprocess_docs(["ABCD", "AAABC"])
<generator object preprocess_docs at 0x10561f820>

# with standardize_text using "yield re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
... 
<generator object standardize_text at 0x1056cce40>
<generator object standardize_text at 0x1056cceb0>


# with standardize_text using "return re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
... 
XD
AAX
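
If standardize_text had to stay a generator for some reason, preprocess_docs could flatten its output with `yield from` instead of yielding the inner generator object. This is only a sketch of that alternative, not something from the original code; the `Iterator[str]` annotations are my addition.

```python
import re
from typing import Iterator, List


def standardize_text(text: str) -> Iterator[str]:
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)


def preprocess_docs(docs: List[str]) -> Iterator[str]:
    for doc in docs:
        # Delegate to the inner generator so callers receive strings,
        # not nested generator objects.
        yield from standardize_text(doc)


result = list(preprocess_docs(["ABCD", "AAABC"]))
print(result)  # prints: ['XD', 'AAX']
```

Switching standardize_text to return is still the simpler fix when it only ever produces one value.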

Note: Before async/await was added in Python 3.5, some concurrency libraries used yield in the same way that await is now used, for example Twisted's @inlineCallbacks. I don't think this is directly relevant to your question, but I included it for completeness.
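
To give a rough idea of that older style in plain Python (this is not Twisted's API; `run`, `fetch_length`, and the request tuples are hypothetical names made up for the sketch), a driver can resume a generator with `send()`, so each yield acts as a point where the function waits for a result:

```python
def fetch_length(text):
    # Each yield hands a "request" to the driver and waits for its answer,
    # roughly the role "await" plays in modern code.
    upper = yield ("upper", text)
    length = yield ("len", upper)
    return upper, length


def run(gen):
    """Hypothetical driver: answers each request and resumes the generator."""
    handlers = {"upper": str.upper, "len": len}
    try:
        op, arg = next(gen)  # run to the first yield
        while True:
            # Compute the answer and send it back in as the yield's value.
            op, arg = gen.send(handlers[op](arg))
    except StopIteration as stop:
        return stop.value  # the generator's return value


print(run(fetch_length("abcd")))  # prints: ('ABCD', 4)
```

Here the yield is doing real work because the function is paused and resumed several times, which is not the case in standardize_text.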


1 answer

Answer 1: accepted 2021-06-26 23:40:37
