
Does a Python iterator cost additional memory?

I wonder whether my_iter = iter(iterable_obj) copies iterable_obj. In other words, does the above call cost additional memory?

Does it copy? Maybe. But it shouldn't.

It could copy, but it shouldn't. It should just provide iteration over the existing data structure, with only the minimal memory overhead that the iteration itself needs. For example, a list iterator only stores a reference to the list and an index.
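As a small illustration of that claim, here is a sketch that assumes CPython (sys.getsizeof reports only the size of the iterator object itself, and the variable names are made up for the example). It checks that a list iterator has the same size no matter how long the list is, and that it sees changes made to the list after the iterator was created, which would not happen with a copy.

```python
import sys

small = list(range(10))
large = list(range(1_000_000))

# The iterator object itself has the same size for both lists
# (CPython detail: it holds a reference to the list plus an index).
same_size = sys.getsizeof(iter(small)) == sys.getsizeof(iter(large))

# The iterator refers to the live list, so a later append shows through.
data = [1, 2, 3]
it = iter(data)
first = next(it)
data.append(4)
rest = list(it)

print(same_size)
print(first, rest)
```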

What does it do?

That depends. The iter function has to provide an iterator over any possible iterable in the whole wide world, including an iterable class with a complicated internal data structure that you'll only write tomorrow. How can iter possibly do that? Artificial intelligence? Magic? No. Well ... actually yes, magic. Namely with so-called "magic methods" (or "dunder methods"). In this case, __iter__ or __getitem__. The trick is, iter doesn't know how to iterate the iterable. The iterable does, and it makes the iteration accessible through one of those two magic methods. The iter function is just a simple middle man between the code that calls it (which wants the iteration) and the iterable (which provides the iteration).

Example with an __iter__ method returning an iterator:

class MyIterable:
    def __iter__(self):
        return iter('abcde')

print(list(MyIterable()))

Output:

['a', 'b', 'c', 'd', 'e']

Example with a __getitem__ method returning elements for indexes 0, 1, 2, etc. (until IndexError):

class MyIterable:
    def __getitem__(self, index):
        return 'abcde'[index]

print(list(MyIterable()))

Output:

['a', 'b', 'c', 'd', 'e']
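To make the "middle man" role more concrete, here is a rough approximation of the dispatch in pure Python. The name simplified_iter and the two demo classes are hypothetical; the real built-in is implemented in C and performs additional checks (for instance, that __iter__ actually returned an iterator), so treat this as a sketch only.

```python
def simplified_iter(obj):
    """Rough approximation of iter(obj); not the real implementation."""
    cls = type(obj)
    # Preferred protocol: the iterable hands out its own iterator.
    if getattr(cls, '__iter__', None) is not None:
        return cls.__iter__(obj)
    # Fallback protocol: ask for indexes 0, 1, 2, ... until IndexError.
    if hasattr(cls, '__getitem__'):
        def index_walker():
            index = 0
            while True:
                try:
                    item = obj[index]
                except IndexError:
                    return
                yield item
                index += 1
        return index_walker()
    raise TypeError(f'{cls.__name__!r} object is not iterable')


class WithIter:
    def __iter__(self):
        return iter('abcde')


class WithGetitem:
    def __getitem__(self, index):
        return 'abcde'[index]


print(list(simplified_iter(WithIter())))
print(list(simplified_iter(WithGetitem())))
```

In both branches the function itself stores nothing about the data; whatever memory is used is decided by the iterable's own methods.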

So what does iter(iterable) do? That depends on what the iterable does. It might copy, it might not, it might try to set your house on fire.

For something as simple as a list iterator, the choice is obvious: using a reference to the list and an index for where the iterator stands is both simple and efficient.

More interesting case: Binary search tree iterator

Let's consider a case where it's not so obvious and where you might be tempted to copy: a binary search tree iterator that offers iteration over the tree's values in sorted order. Let's look at three possible implementations, where n is the number of values in the tree. The tree will be represented as a structure of BST node objects:

class BST:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

Possible implementation 1: Recursive iterator

class BST:
    ...
    def __iter__(self):
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right

Advantages:

  • Easy code.
  • O(1) time and space for iterator creation.
  • Lazy, iterates only as much as requested.
  • Only O(h) memory during iteration, where h is the height of the tree. Could be as low as Θ(log n) if the tree is balanced, or as high as Θ(n) if it's very unbalanced.

Disadvantages:

  • Slow. Every value gets passed through the entire stack of iterators up to the root. So iterating the whole tree takes at least Θ(n log n) and up to Θ(n²) time.
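The quadratic worst case can be made visible by counting how often a value is handed from one generator to the next. The sketch below is not the BST class from this answer: it uses a hypothetical Node class and a walk function with explicit for loops in place of yield from, purely so that each hand-over can be counted. The tree is a degenerate chain leaning to the right.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def walk(node, hops):
    """Recursive in-order generator that counts every hand-over of a value."""
    if node.left:
        for value in walk(node.left, hops):
            hops[0] += 1
            yield value
    hops[0] += 1
    yield node.value
    if node.right:
        for value in walk(node.right, hops):
            hops[0] += 1
            yield value


# Build the chain 0 -> 1 -> 2 -> ... -> 99, every node only has a right child.
n = 100
chain = None
for value in reversed(range(n)):
    chain = Node(value, right=chain)

hops = [0]
result = list(walk(chain, hops))
print(hops[0])  # n * (n + 1) // 2 hand-overs for n values
```

A value at depth d passes through d + 1 generators, so the chain of 100 nodes needs 1 + 2 + ... + 100 = 5050 hand-overs.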

Possible implementation 2: Copy values into list

Since slow iteration, especially quadratic time, is seriously disappointing, we could copy all values from the tree into a list and return an iterator over that list:

class BST:
    ...
    def __iter__(self):
        values = []
        def collect(node):
            if node:
                collect(node.left)
                values.append(node.value)
                collect(node.right)
        collect(self)
        return iter(values)

Advantages:

  • Easy code.
  • Linear time iteration.

Disadvantages:

  • Θ(n) memory.
  • Θ(n) time just for creating the iterator, before the actual iteration even starts.
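The difference in up-front cost can be checked by counting node visits. This is a sketch with hypothetical helpers (Node, build_balanced, copying_iter, lazy_iter) written as plain functions rather than __iter__ methods, with a one-element list used as a visit counter.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build_balanced(lo, hi):
    """Build a balanced BST holding the integers lo..hi."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, build_balanced(lo, mid - 1), build_balanced(mid + 1, hi))


def copying_iter(root, counter):
    """Copies all values into a list first (like implementation 2)."""
    values = []
    def collect(node):
        if node:
            collect(node.left)
            counter[0] += 1
            values.append(node.value)
            collect(node.right)
    collect(root)
    return iter(values)


def lazy_iter(node, counter):
    """Recursive generator (like implementation 1)."""
    if node:
        yield from lazy_iter(node.left, counter)
        counter[0] += 1
        yield node.value
        yield from lazy_iter(node.right, counter)


root = build_balanced(0, 999)
eager_count, lazy_count = [0], [0]
eager = copying_iter(root, eager_count)
lazy = lazy_iter(root, lazy_count)
after_creation = (eager_count[0], lazy_count[0])

first_three = [next(lazy) for _ in range(3)]
print(after_creation, first_three, lazy_count[0])
```

Creating the copying iterator already visits all 1000 nodes, while the generator has visited none at that point and only three after three values were requested.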

Possible implementation 3: Iterative

Here's an iterative one using a stack. The stack holds the nodes whose values and whose right subtrees still need to be iterated:

class BST:
    ...
    def __iter__(self):
        node = self
        stack = []
        while node or stack:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.value
            node = node.right

This combines the advantages of the first two implementations (it's both time and memory efficient), at the cost of being less easy. Unlike for the first two implementations, I felt the need to add that little explanation of how it works, and you'll probably still need to think about it a bit if you haven't seen it before.
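To see that the stack really stays small, the same loop can be instrumented to record the largest stack size it ever reaches. This sketch uses a hypothetical Node class, a build_balanced helper and a stats dictionary that are not part of the original code.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build_balanced(lo, hi):
    """Build a balanced BST holding the integers lo..hi."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, build_balanced(lo, mid - 1), build_balanced(mid + 1, hi))


def iterate_with_stats(root, stats):
    """Stack-based in-order traversal that records the peak stack size."""
    node = root
    stack = []
    while node or stack:
        while node:
            stack.append(node)
            stats['max_stack'] = max(stats['max_stack'], len(stack))
            node = node.left
        node = stack.pop()
        yield node.value
        node = node.right


# 1023 values form a perfectly balanced tree with 10 levels.
root = build_balanced(0, 1022)
stats = {'max_stack': 0}
values = list(iterate_with_stats(root, stats))
print(len(values), stats['max_stack'])
```

For this tree the stack never holds more than 10 nodes, one per level, while all 1023 values come out in sorted order.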

Conclusion

If it's just a little exercise for you and you don't have efficiency issues, the first two implementations are fine and easy to write. The one copying the values to a list isn't really a normal iterator, though, as copying the values is fundamentally not what iteration means. It's not about the memory. The recursive generator and the iterative approach take anywhere from O(log n) to O(n) memory as well, but that's organizational data and somewhat necessary to facilitate the iteration. They're not copying the content data.

If it's a BST package for serious use, then I'd find the disadvantages of the first two implementations unacceptable and would use the iterative implementation. It takes more effort to write once, but you get the advantages and a proper iterator.

Btw, if the nodes also had a reference to their parent node, I think an iterator could use that to do efficient iteration with O(1) memory. Left as an exercise for the reader :-P
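One possible take on that exercise, offered as a sketch rather than as the intended solution: it assumes a hypothetical PNode class with a parent attribute and a root whose parent is None, and it finds each next value with the usual in-order successor walk.

```python
class PNode:
    def __init__(self, value, parent=None):
        self.value = value
        self.left = None
        self.right = None
        self.parent = parent


def insert(root, value):
    """Insert value below root, ignoring duplicates."""
    node = root
    while True:
        if value < node.value:
            if node.left is None:
                node.left = PNode(value, parent=node)
                return
            node = node.left
        elif value > node.value:
            if node.right is None:
                node.right = PNode(value, parent=node)
                return
            node = node.right
        else:
            return


def in_order(root):
    """Yield values in sorted order, keeping only the current node as state."""
    node = root
    while node.left is not None:          # start at the smallest value
        node = node.left
    while node is not None:
        yield node.value
        if node.right is not None:
            # Successor is the leftmost node of the right subtree.
            node = node.right
            while node.left is not None:
                node = node.left
        else:
            # Climb until we come up from a left child (or leave the root).
            while node.parent is not None and node is node.parent.right:
                node = node.parent
            node = node.parent


root = PNode(8)
for value in [3, 10, 1, 6, 14, 4, 7, 13]:
    insert(root, value)
print(list(in_order(root)))
```

Each tree edge is walked at most twice, so a full iteration stays linear, and the only extra state is the single node variable. The price is one additional reference per node.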

Code

BST code to play with:

from random import shuffle

class BST:

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def insert(self, value):
        if value < self.value:
            if self.left:
                self.left.insert(value)
            else:
                self.left = BST(value)
        elif value > self.value:
            if self.right:
                self.right.insert(value)
            else:
                self.right = BST(value)

    def __repr__(self):
        return f'BST({self.value}, {self.left}, {self.right})'

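    # Three alternative __iter__ implementations follow. Python keeps only the
    # last definition in the class body, so reorder them to try the others.
    def __iter__(self):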
        yield from self.left or ()
        yield self.value
        yield from self.right or ()

    def __iter__(self):
        values = []
        def collect(node):
            if node:
                collect(node.left)
                values.append(node.value)
                collect(node.right)
        collect(self)
        return iter(values)

    def __iter__(self):
        node = self
        stack = []
        while node or stack:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.value
            node = node.right

# Build a random tree
values = list(range(20)) * 2
shuffle(values)
tree = BST(values[0])
for value in values[1:]:
    tree.insert(value)

# Show the tree
print(tree)

# Iterate the tree in sorted order
print(list(tree))

Sample output:

BST(1, BST(0, None, None), BST(17, BST(10, BST(6, BST(2, None, BST(4, BST(3, None, None), BST(5, None, None))), BST(9, BST(7, None, BST(8, None, None)), None)), BST(15, BST(11, None, BST(13, BST(12, None, None), BST(14, None, None))), BST(16, None, None))), BST(18, None, BST(19, None, None))))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
