简体   繁体   中英

How does python "nonlocal" keyword implement? Can it save space?

Now I am thinking of writing a code to get maximum depth of a binarytree. Python can't pass non-container parameter by reference, so there are generally two choices, use nonlocal keyword or pass depth by copy.

The first one get_max_depth1 need more traceback operation, I wonder whether it costs less space compare to get_max_depth1 . If python implement nonlocal use parameter pass by reference, then every function also bring an integer pointer, in this case, it is inferior to get_max_depth2 , because it's harder to write, slower to run, and save almost no space. If not, when the binary tree depth is 100000, get_max_depth1 only need one variable, get_max_depth2 need 100000 variable d saved in their function, I guess it's meaningful to write d outside.

def main():
    root = BinaryTreeNode()
    d = 0
    maxd1 = 0
    def get_max_depth1(node):
        nonlocal d,maxd1
        maxd1 = max(maxd1, d)
        if node.left:
            d += 1
            get_max_depth1(node.left)
            d -= 1
        if node.right:
            d += 1
            get_max_depth1(node.right)
            d -= 1

    get_max_depth1(root)

    maxd2 = 0
    def get_max_depth2(node, d):
        nonlocal maxd2
        maxd2 = max(maxd2, d)

        if node.left:
            get_max_depth2(node.left, d+1)
        if node.right:
            get_max_depth2(node.right, d+1)

    get_max_depth2(root, 0)

If you want to find out exactly what the allocation difference is, then running the app with https://pypi.org/project/memory-profiler/ should give you the answers you're after. But that only applies to the very theoretical side and the result will be specific to one CPython version and may not hold overall.

In practice the answer is: they're about the same for a few reasons:

  • Other code will dominate the performance (just creating new stack frames and would take more space)
  • You can't go that deep anyway (default recursion limit is 1000 frames)
  • Your data will not be that large (binary trees are usually kept balanced which at 100000 levels it would give you over 10^30102 elements)
  • Once you start caring about single bytes and limits like that, CPython stops being the right answer. Cython may be the simplest next step and there you can check the exact heap/stack usage of the resulting C program.

Python's data model defines User-defined functions as having a __closure__ attribute, which reify the function closure : the readable/writeable values of enclosing scopes. __closure__ is None or a tuple of cells .
The standard currently (3.10) only defines that a cell has a cell_contents attribute to represent the value of the cell. The CellType makes no guarantees what it provides.

Notably, whether a cell is writeable is not determined by whether a function captures the closure as readable (bare usage) or readable/writeable ( nonlocal declaration). Both are the same kind of cell.


In practice, CPython¹ represents __closure__ as a regular tuple and each cell as a 40 byte object that holds a reference to its value.

>>> def outer(a = 3):
...     def inner():
...         print(a)  # `a` is captured from outer scope
...     return inner
>>> outer()
<function __main__.outer.<locals>.inner()>
>>> outer().__closure__
(<cell at 0x10eac2ca0: int object at 0x109f26d30>,)
>>> outer().__closure__[0].cell_contents
3
>>> # size of the closure tuple and cell in bytes
>>> sys.getsizeof(outer().__closure__), sys.getsizeof(outer().__closure__[0])
(48, 40)

The __closure__ itself belongs to the function object, whereas a cell is shared between all functions closing over the same variable.

In contrast, a local variable is stored as an array of references – each 8 byte. The local storage belongs to the function call , so calling a function multiple times also creates multiple such references.

For reference, just the shell of the above inner function object is 136 bytes. Its name and fully-qualified name are 54 and 69 bytes, respectively. Its bytecode is 45 bytes. There are many additional costs for things that you likely do not even know exist.
Keep that in mind when trying to safe individual chunks of 8 bytes.


¹CPython 3.8.12 [Clang 11.0.0 (clang-1100.0.33.17)], 64 bit build.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM