简体   繁体   English

你如何迭代树?

[英]How do you iterate over a tree?

What is your preferred method of traversing a tree data structure, since recursive method calls can be pretty inefficient in some circumstances. 遍历树数据结构的首选方法是什么,因为在某些情况下递归方法调用可能效率很低。 I am simply using a generator like the one above. 我只是使用像上面那样的发电机。 Do you have any hints to make it faster? 你有什么提示让它更快吗?

def children(self):
    stack = [self.entities]
    while stack: 
        for e in stack.pop():
            yield e
            if e.entities:
                stack.append(e.entities) 

Here is some test data. 这是一些测试数据。 The first one is recursive, the second uses the generator: 第一个是递归的,第二个使用生成器:

s = time.time()
for i in range(100000):
    e.inc_counter()
print time.time() - s

s = time.time()
for i in range(100000):
    for e in e.children():
        e.inc_counter_s()
print time.time() - s

Results: 结果:

0.416000127792
0.298999786377

Test code: 测试代码:

import random

class Entity():
    def __init__(self, name):
        self.entities = []
        self.name = name
        self.counter = 1
        self.depth = 0

    def add_entity(self, e):
        e.depth = self.depth + 1
        self.entities.append(e)

    def inc_counter_r(self):
        for e in self.entities:
            e.counter += 1
            e.inc_counter_r()

    def children(self):
        stack = [self.entities]
        while stack:
            for e in stack.pop():
                yield e
                if e.entities:
                    stack.append(e.entities)

root = Entity("main")
def fill_node(root, max_depth):
    if root.depth <= max_depth:
        for i in range(random.randint(10, 15)):
            e = Entity("node_%s_%s" % (root.depth, i))
            root.add_entity(e)
            fill_node(e, max_depth)
fill_node(root, 3)

import time
s = time.time()
for i in range(100):
    root.inc_counter_r()
print "recursive:", time.time() - s

s = time.time()
for i in range(100):
    for e in root.children():
        e.counter += 1
print "generator:",  time.time() - s

Unless your tree is really large or you have really high (real) requirements for speed, I would choose the recursive method. 除非你的树真的很大或你对速度有很高的(实际)要求,否则我会选择递归方法。 Easier to read, easier to code. 更易于阅读,更易于编码。

I can't think of any big algorithmic improvements, but a simple microoptimisation you can make is to bind frequently called methods (such as stack.append / stack.pop) to locals (this saves a dictionary lookup) 我想不出任何大的算法改进,但是你可以做的一个简单的微优化是将经常调用的方法(例如stack.append / stack.pop)绑定到本地(这节省了字典查找)

def children(self):
    stack = [self.entities]
    push = stack.append
    pop = stack.pop
    while stack: 
        for e in pop():
            yield e
            if e.entities:
                push(e.entities)

This gives a small (~15%) speedup by my tests (using 100 traversals of an 8-deep tree with 4 children at each node gives me the below timings:) 这通过我的测试给出了一个小的(~15%)加速(使用100个遍历的8个深度树,每个节点有4个孩子给我下面的时间:)

children     :  5.53942348004
children_bind:  4.77636131253

Not huge, but worth doing if speed is important. 不是很大,但如果速度很重要,那就值得做。

I'm not sure if you can reduce the overhead much on a full in-order traversal of a tree, if you use recursion the call stack will grow some, otherwise you must manually use a stack to push references of the children while visiting each node. 我不确定你是否可以在树的完整有序遍历上减少开销,如果你使用递归调用堆栈会增长一些,否则你必须手动使用堆栈来推送子进程的参考节点。 Which way is fastest and uses less memory, depends on the expensiveness of the call stack vs. a normal stack. 哪种方式最快并且使用更少的内存,取决于调用堆栈与普通堆栈的昂贵程度。 (I would guess the callstack is faster since it should be optimized for its use, and recursion is much easier to implement) (我猜测callstack更快,因为它应该针对其使用进行优化,并且递归更容易实现)

If you don't care about the order you visit the nodes, some implementations of trees is actually stored in a dynamic array or linked list or stack wich you can traverse linearly if you don't care about the order it's traversed. 如果您不关心访问节点的顺序,树的某些实现实际上存储在动态数组或链表或堆栈中,如果您不关心它遍历的顺序,您可以线性遍历。

But why is it important to have a fast traversal anyway? 但是为什么有一个快速遍历很重要呢? Trees are good for searching, arrays/linked lists is good for full traversal. 树很适合搜索,数组/链表有利于完全遍历。 If you often need full in-order traversal but few searches and insertions/deletions, an ordered linked list might be best, if searching is what you do most you use a tree. 如果您经常需要完整的有序遍历,但搜索和插入/删除很少,那么有序链表可能是最好的,如果搜索是您最常使用的树。 If the data is really massive, so that memory overhead may render recursion impossible, you should use a database. 如果数据非常庞大,那么内存开销可能导致递归变得不可能,那么您应该使用数据库。

Recursive function calls are not incredibly inefficient, that is an old programming myth. 递归函数调用并不是非常低效,这是一个古老的编程神话。 (If they're badly implemented, they may incur a larger overhead than necessary, but calling them "incredibly inefficient" is plain wrong.) (如果它们执行得很糟糕,它们可能会产生比必要的更大的开销,但称它们“极其低效”是完全错误的。)

Remember: don't optimize prematurely, and never optimize without benchmarking first. 请记住:不要过早优化, 不要在没有基准测试的情况下进行优化。

If you have a lot of RAM and the tree doesn't change often, you can cache the result of the call: 如果你有大量的RAM并且树不经常更改,你可以缓存调用的结果:

def children(self):
    if self._children_cache is not None:
        return self._children_cache
    # Put your code into collectChildren()
    self._children_cache = self.collectChildren()
    return self._children_cache

Whenever the tree changes, set the cache to None. 每当树发生更改时,将缓存设置为“无”。 In this case, using recursive calls might be more effective since the results will accumulate faster. 在这种情况下,使用递归调用可能更有效,因为结果将更快地累积。

我过去编写了迭代树遍历代码:它非常难看,而且速度不快,除非你确切知道每个子树不仅有多少个子,而且知道有多少个级别。

I don't know too much about Python internals of function calls, but I really can't imagine that your code snippet is faster than recursively traversing the tree. 我不太了解函数调用的Python内部,但我真的无法想象你的代码片段比递归遍历树更快。

The call stack (used for function calls, including recursive ones) is typically very fast. 调用堆栈(用于函数调用,包括递归调用)通常非常快。 Going to the next object will only cost you a single function call. 转到下一个对象只需要花费一个函数调用。 But in your snippet - where you use a stack object, going to the next object will cost you a stack.append (possibly allocating memory on heap), a stack.push (possibly freeing memory from heap), and a yield. 但是在你的代码片段中 - 你使用堆栈对象,转到下一个对象将花费你stack.append(可能在堆上分配内存),stack.push(可能从堆中释放内存)和yield。

The main problem with recursive calls is that you might blow the stack if your tree gets too deep. 递归调用的主要问题是如果树太深,你可能会破坏堆栈。 This isn't likely to happen. 这不太可能发生。

Here's a pair of small corrections. 这是一对小修正。

def children(self):
    stack = [self.entities]
    for e in stack:
        yield e
        if e.entities:
            stack.extend(e.entities)

I actually think the generator, using append, isn't visiting all the nodes. 我实际上认为使用append的生成器不访问所有节点。 I think you mean to extend the stack with all entities, not append a simple list of entities to the stack. 我认为你的意思是extend堆栈与所有实体,而不是append简单的实体列表append到堆栈。

Also, when the for loop terminates, the while loop in your original example will also terminate because there's no change to the empty stack after the for loop. 此外,当for循环终止时,原始示例中的while循环也将终止,因为for循环后空堆栈没有变化。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM