
Defining `fac` with generators. And: Why no stack overflow with generators?

Is there a way we can define the following code (a classic example for recursion) via generators in Python? I am using Python 3.

def fac(n):
    if n==0:
        return 1
    else:
        return n * fac(n-1)

I tried this, no success:

In [1]: def fib(n):
   ...:     if n == 0:
   ...:         yield 1
   ...:     else:
   ...:         n * yield (n-1)
  File "<ipython-input-1-bb0068f2d061>", line 5
    n * yield (n-1)
            ^
SyntaxError: invalid syntax

Classic recursion in Python leads to a stack overflow

This classic example leads to a stack overflow on my machine for an input of n=3000. In the Lisp dialect "Scheme" I'd use tail recursion and avoid the stack overflow. That's not possible in Python, which is why generators come in handy there. But I wonder:
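For reference, a minimal reproduction; note that CPython doesn't literally crash the process but raises a RecursionError once the interpreter's recursion limit (about 1000 frames by default) is exceeded:

```python
import sys

def fac(n):
    if n == 0:
        return 1
    return n * fac(n - 1)

print(sys.getrecursionlimit())  # typically 1000 by default

try:
    fac(3000)  # needs ~3000 nested stack frames
except RecursionError as exc:
    print("RecursionError:", exc)
```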

Why no stack overflow with generators?

Why is there no stack overflow with generators in Python? How do they work internally? Doing some research leads me always to examples showing how generators are used in Python, but not much about the inner workings.

Update 1: yield from my_function(...)

As I tried to explain in the comments section, maybe my example above was a poor choice for making my point. My actual question is about the inner workings of generators used recursively in yield from statements in Python 3.

Below is an (incomplete) example of code that I use to process JSON files generated by Firefox bookmark backups. At several points I use yield from process_json(...) to recursively call the function again via generators.

Exactly in this example, how is stack overflow avoided? Or is it?


# (snip)

FOLDERS_AND_BOOKMARKS = {}
FOLDERS_DATES = {}

def process_json(json_input, folder_path=""):
    global FOLDERS_AND_BOOKMARKS
    # Process the json with a generator
    # (to avoid recursion use generators)
    # https://stackoverflow.com/a/39016088/5115219

    # Is node a dict?
    if isinstance(json_input, dict):
        # we have a dict
        guid = json_input['guid']
        title = json_input['title']
        idx = json_input['index']
        date_added = to_datetime_applescript(json_input['dateAdded'])
        last_modified = to_datetime_applescript(json_input['lastModified'])

        # do we have a container or a bookmark?
        #
        # is there a "uri" in the dict?
        #    if not, we have a container
        if "uri" in json_input.keys():
            uri = json_input['uri']
            # return URL with folder or container (= prev_title)
            # bookmark = [guid, title, idx, uri, date_added, last_modified]
            bookmark = {'title': title,
                        'uri':   uri,
                        'date_added': date_added,
                        'last_modified': last_modified}
            FOLDERS_AND_BOOKMARKS[folder_path].append(bookmark)
            yield bookmark

        elif "children" in json_input.keys():
            # So we have a container (aka folder).
            #
            # Create a new folder
            if title != "": # we are not at the root
                folder_path = f"{folder_path}/{title}"
                if folder_path in FOLDERS_AND_BOOKMARKS:
                    pass
                else:
                    FOLDERS_AND_BOOKMARKS[folder_path] = []
                    FOLDERS_DATES[folder_path] = {'date_added': date_added, 'last_modified': last_modified}

            # run process_json on list of children
            # json_input['children'] : list of dicts
            yield from process_json(json_input['children'], folder_path)

    # Or is node a list of dicts?
    elif isinstance(json_input, list):
        # Process children of container.
        dict_list = json_input
        for d in dict_list:
            yield from process_json(d, folder_path)

Update 2: yield vs yield from

Ok, I get it. Thanks to all the comments.

  • So generators via yield create iterators. That has nothing to do with recursion, so no stack overflow here.
  • But generators via yield from my_function(...) are indeed recursive calls of my function, albeit delayed, and only evaluated if demanded.

This second example can indeed cause a stack overflow.
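A minimal experiment confirms this (using a hypothetical `nested` helper, and assuming CPython's default recursion limit of roughly 1000): resuming a deep chain of yield from delegations does hit the limit.

```python
def nested(depth):
    # Each level delegates to one more generator via `yield from`.
    if depth == 0:
        yield 0
    else:
        yield from nested(depth - 1)

# A shallow chain works fine:
print(next(nested(50)))  # -> 0

# Resuming a very deep chain walks one frame per delegation level,
# so it can still exhaust the recursion limit:
try:
    next(nested(10_000))
except RecursionError:
    print("RecursionError, even with yield from")
```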

OK, after your comments I have completely rewritten my answer.

  1. How does recursion work and why do we get a stack overflow?

Recursion is often an elegant way to solve a problem. In most programming languages, every time you call a function, all the information and state needed for the call are put on the stack, a so-called "stack frame". The stack is a special per-thread memory region, limited in size.

Now recursive functions implicitly use these stack frames to store state/intermediate results. E.g., the factorial of n is n * (n-1) * (n-2) * ... * 1, and each of these pending multiplications occupies one frame on the stack.

An iterative solution has to store these intermediate results explicitly in a variable (that often sits in a single stack frame).
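For factorial, the iterative version is a one-variable loop; the intermediate products all live in a single local variable instead of one stack frame each:

```python
def fac_iter(n):
    # The running product replaces the chain of recursive calls.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(fac_iter(5))  # 120
# n=3000 is no problem here: there is no deep call chain.
print(fac_iter(3000) > 0)  # True
```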

  2. How do generators avoid stack overflow?

Simply: They are not recursive. They are implemented like iterator objects. They store the current state of the computation and return a new result every time you request it (implicitly or with next()).

If it looks recursive, that's just syntactic sugar. yield is not like return: it hands the current value to the caller and then "pauses" the computation. All of that state is wrapped up in one object, not in a gazillion stack frames.
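A tiny sketch of that pause/resume behavior:

```python
def countdown(n):
    while n > 0:
        yield n   # hand n to the caller and pause right here
        n -= 1    # execution resumes here on the next next() call

g = countdown(3)
print(next(g))  # 3
print(next(g))  # 2
print(next(g))  # 1
# A further next(g) would raise StopIteration: the generator is exhausted.
```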

This will give you the series of factorials from 1! to n!:

def fac(n):
    if n <= 0:
        yield 1
    else:
        v = 1
        for i in range(1, n+1):
            v = v * i
            yield v

There is no recursion; the intermediate results are stored in v, which lives inside the single generator object (on the heap).
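Driving the generator (repeated here so the snippet is self-contained) shows the point; n=3000, which blew up the recursive version, is harmless:

```python
def fac(n):
    # Yields the running products 1!, 2!, ..., n!
    if n <= 0:
        yield 1
    else:
        v = 1
        for i in range(1, n + 1):
            v = v * i
            yield v

print(list(fac(5)))          # [1, 2, 6, 24, 120]
print(len(list(fac(3000))))  # 3000 values, no RecursionError
```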

  3. What about yield from?

OK, that's interesting, since that was only added in Python 3.3. yield from can be used to delegate to another generator.

You gave an example like:

def process_json(json_input, folder_path=""):
    # Some code
    yield from process_json(json_input['children'], folder_path)

This looks recursive, but instead it's a combination of two generator objects. You have your "inner" generator (which only uses the space of one object) and with yield from you say "I'd like to forward all the values from that generator to my caller".

So it doesn't generate one stack frame per generator result, instead it creates one object per generator used.
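A classic concrete case is flattening a nested list with a (hypothetical) recursive generator: each nesting level contributes one generator object, and yield from forwards the inner values straight to the outermost caller:

```python
def flatten(items):
    for item in items:
        if isinstance(item, list):
            # Delegate to a fresh generator for the sub-list;
            # its values are forwarded directly to our caller.
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```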

In this example, you are creating one generator object per child JSON object, roughly the same number of stack frames that plain recursion would need. You won't see a stack overflow here, because bookmark trees are only nested a few levels deep; the generator objects themselves live on the heap, which has a far larger size limit than the stack, depending on your operating system and settings. On my laptop, running Ubuntu Linux, ulimit -s reports a default stack size of 8 MB, while my process memory size is unlimited (although I only have 8 GB of physical memory). Note, though, that resuming a chain of yield from delegations still walks one frame per level, so an extremely deep chain can hit the recursion limit, as Update 2 observes.

Look at this documentation page on generators: https://wiki.python.org/moin/Generators

And this QA: Understanding generators in Python

Some nice examples, also for yield from : https://www.python-course.eu/python3_generators.php

TL;DR: Generators are objects; they don't use recursion. Not even yield from, which just delegates to another generator object. Recursion is only practical when the number of calls is bounded and small, or when your compiler supports tail call optimization.
