
Defining `fac` with generators. And: why no stack overflow with generators?

Is there a way to define the following code (a classic example of recursion) via generators in Python? I am using Python 3.

def fac(n):
    if n==0:
        return 1
    else:
        return n * fac(n-1)

I tried this, without success:

In [1]: def fib(n):
   ...:     if n == 0:
   ...:         yield 1
   ...:     else:
   ...:         n * yield (n-1)
  File "<ipython-input-1-bb0068f2d061>", line 5
    n * yield (n-1)
            ^
SyntaxError: invalid syntax

Classic recursion in Python leads to stack overflow

This classic example leads to a stack overflow on my machine for an input of n=3000. In the Lisp dialect Scheme I would use tail recursion and avoid the stack overflow. That is not possible in Python. That's why generators come in handy in Python. But I wonder:
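For reference, a minimal sketch of what happens with the recursive version: CPython guards its stack with a recursion limit (typically 1000) and raises RecursionError rather than crashing the process.

```python
import sys

def fac(n):
    # Classic recursive factorial: one stack frame per call.
    if n == 0:
        return 1
    return n * fac(n - 1)

print(sys.getrecursionlimit())  # typically 1000 in CPython

try:
    fac(3000)
except RecursionError as exc:
    print("RecursionError:", exc)
```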

Why no stack overflow with generators?

Why is there no stack overflow with generators in Python? How do they work internally? My research always leads me to examples showing how generators are used in Python, but not much about their inner workings.
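One way to peek at the inner workings (a sketch using the standard inspect module): the paused frame, with its locals and instruction pointer, is kept on the generator object itself rather than on the call stack.

```python
import inspect

def gen():
    yield 1
    yield 2

g = gen()
print(inspect.getgeneratorstate(g))  # GEN_CREATED: the body has not started yet
next(g)
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED: paused at the first yield
print(g.gi_frame is not None)        # True: the suspended frame lives on the object
list(g)                              # exhaust the generator
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
```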

Update 1: yield from my_function(...)

As I tried to explain in the comments section, maybe my example above was a poor choice for making my point. My actual question is about the inner workings of generators used recursively in yield from statements in Python 3.

Below is an (incomplete) example that I use to process JSON files generated by Firefox bookmark backups. At several points I use yield from process_json(...) to call the function again recursively via generators.

Exactly in this example, how is a stack overflow avoided? Or is it?


# (snip)

FOLDERS_AND_BOOKMARKS = {}
FOLDERS_DATES = {}

def process_json(json_input, folder_path=""):
    global FOLDERS_AND_BOOKMARKS
    # Process the json with a generator
    # (to avoid recursion use generators)
    # https://stackoverflow.com/a/39016088/5115219

    # Is node a dict?
    if isinstance(json_input, dict):
        # we have a dict
        guid = json_input['guid']
        title = json_input['title']
        idx = json_input['index']
        date_added = to_datetime_applescript(json_input['dateAdded'])
        last_modified = to_datetime_applescript(json_input['lastModified'])

        # do we have a container or a bookmark?
        #
        # is there a "uri" in the dict?
        #    if not, we have a container
        if "uri" in json_input.keys():
            uri = json_input['uri']
            # return URL with folder or container (= prev_title)
            # bookmark = [guid, title, idx, uri, date_added, last_modified]
            bookmark = {'title': title,
                        'uri':   uri,
                        'date_added': date_added,
                        'last_modified': last_modified}
            FOLDERS_AND_BOOKMARKS[folder_path].append(bookmark)
            yield bookmark

        elif "children" in json_input.keys():
            # So we have a container (aka folder).
            #
            # Create a new folder
            if title != "": # we are not at the root
                folder_path = f"{folder_path}/{title}"
                if folder_path not in FOLDERS_AND_BOOKMARKS:
                    FOLDERS_AND_BOOKMARKS[folder_path] = []
                    FOLDERS_DATES[folder_path] = {'date_added': date_added,
                                                  'last_modified': last_modified}

            # run process_json on list of children
            # json_input['children'] : list of dicts
            yield from process_json(json_input['children'], folder_path)

    # Or is node a list of dicts?
    elif isinstance(json_input, list):
        # Process children of container.
        dict_list = json_input
        for d in dict_list:
            yield from process_json(d, folder_path)

Update 2: yield vs. yield from

OK, I get it. Thanks for all the comments.

  • Generators via yield create iterators. That has nothing to do with recursion, so there is no stack overflow here.
  • But generators via yield from my_function(...) are indeed recursive calls of my function, albeit delayed, and only evaluated on demand.

This second example can indeed cause a stack overflow.
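This can be demonstrated with a toy chain of delegating generators (a sketch, not from the code above): in CPython, resuming a deep chain of yield from delegations still counts against the recursion limit, because each delegation level is traversed again on every resume.

```python
import sys

def chain(n):
    # Each level delegates to one more nested generator.
    if n == 0:
        yield 0
    else:
        yield from chain(n - 1)

print(next(chain(10)))  # shallow chains are fine

try:
    next(chain(sys.getrecursionlimit() + 100))
except RecursionError:
    print("RecursionError: deep yield from chains still hit the limit")
```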

OK, after your comments I have completely rewritten my answer.

  1. How does recursion work, and why do we get a stack overflow?

Recursion is often an elegant way to solve a problem. In most programming languages, every time you call a function, all the information and state needed for the call is put on the stack - a so-called "stack frame". The stack is a special per-thread memory region, limited in size.

Recursive functions implicitly use these stack frames to store state and intermediate results. E.g., the factorial of n is n * (n-1) * ((n-1)-1) * ... * 1, and all of these intermediate values are stored on the stack.

An iterative solution has to store these intermediate results explicitly in a variable (which often sits in a single stack frame).
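For comparison, a plain iterative factorial (a sketch; fac_iter is my name, not from the question) that keeps the running product in a single local variable instead of one stack frame per step:

```python
import math

def fac_iter(n):
    result = 1
    for i in range(2, n + 1):
        result *= i  # one variable, reused, instead of a new frame
    return result

print(fac_iter(5))                             # 120
print(fac_iter(3000) == math.factorial(3000))  # True -- no stack overflow
```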

  2. How do generators avoid stack overflow?

Simply: they are not recursive. They are implemented like iterator objects. They store the current state of the computation and return a new result every time you request it (implicitly, or with next()).

If it looks recursive, that's just syntactic sugar. yield is not like return: it yields the current value and then "pauses" the computation. All of that is wrapped up in one object, not in a gazillion stack frames.
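This "pausing" can be observed directly (a minimal sketch): the body only runs between next() calls.

```python
def counter():
    print("started")   # runs only on the first next()
    yield 1
    print("resumed")   # runs when the generator is resumed
    yield 2

g = counter()          # nothing printed yet: the body has not started
print(next(g))         # prints "started", then 1
print(next(g))         # prints "resumed", then 2
```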

This will give you the series of running factorials from 1! to n!:

def fac(n):
    if (n <= 0):
        yield 1
    else:
        v = 1
        for i in range(1, n+1):
            v = v * i
            yield v

There is no recursion; the intermediate results are stored in v, which lives on the single generator object (on the heap).
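A quick check of the generator above (repeated here so the sketch is self-contained): it yields the running factorials 1!, 2!, ..., n!, and even n=3000 works without a stack overflow.

```python
import math

def fac(n):
    if n <= 0:
        yield 1
    else:
        v = 1
        for i in range(1, n + 1):
            v = v * i
            yield v

print(list(fac(5)))                                  # [1, 2, 6, 24, 120]
print(list(fac(3000))[-1] == math.factorial(3000))   # True -- no overflow
```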

  3. What about yield from?

OK, that's interesting, since it was only added in Python 3.3. yield from can be used to delegate to another generator.

You gave an example like:

def process_json(json_input, folder_path=""):
    # Some code
    yield from process_json(json_input['children'], folder_path)

This looks recursive, but it is really a combination of two generator objects. You have your "inner" generator (which only uses the space of one object), and with yield from you say: "I'd like to forward all the values from that generator to my caller."

So it doesn't create one stack frame per generator result; it creates one object per generator used.
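The same delegation pattern in a self-contained sketch (flatten is my example, not from the question): each nested list gets its own generator object, and yield from forwards its values upward.

```python
def flatten(items):
    for item in items:
        if isinstance(item, list):
            # Delegate to a fresh generator for the nested list.
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```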

In this example, you are creating one generator object per child JSON object. That is probably the same number as the stack frames you would need if you did it recursively. You won't see a stack overflow, though, because the objects are allocated on the heap, where the size limit is very different - depending on your operating system and settings. On my laptop, running Ubuntu Linux, ulimit -s gives me 8 MB for the default stack size, while my process memory size is unlimited (although I only have 8 GB of physical memory).

Look at this documentation page on generators: https://wiki.python.org/moin/Generators

And this Q&A: Understanding generators in Python

Some nice examples, also for yield from: https://www.python-course.eu/python3_generators.php

TL;DR: Generators are objects; they don't use recursion. Not even yield from, which just delegates to another generator object. Recursion is only practical when the number of calls is bounded and small, or when your compiler supports tail-call optimization.
