How can I recursively iterate over a tree that is changing during the traversal?

I'm trying to traverse a DOM tree, replacing and removing nodes, using AngleSharp, an HTML parser. This problem is not unique to this library; it is a general question about how to recursively alter a tree while making sure that I still traverse the entire tree.

Take this list, myCollection, where each entry is a node object, potentially with children. It's also a live collection:

-A
-B
-C
 --D
 --E
 --F
-G

I begin to loop in a recursive function:

private void LoopRecursively(Node element) {
   // Either do nothing, remove, or replace with children,
   // e.g. element.Replace(element.ChildNodes);
   for (var x = 0; x < element.ChildNodes.Length; x++) {
      LoopRecursively(element.ChildNodes[x]);
   }
}

Let's say that we decide to replace the C node with its children, so the list becomes:

-A
-B
-D
-E
-F
-G

The problem with this is that the recursion will be wrong. There are now more nodes than the Length in the for-loop accounted for, so not all items will be recursed. Similarly, removing a node would mean that the node that moved up in the list gets skipped over.

How can I recurse a tree that is potentially changing as a result of my recursive processing? Is recursing my list over and over until I'm sure that no changes have been made the only way, or am I approaching the problem incorrectly?

Safe way: Use the recursive function to create a brand new tree instead of changing the old one, then replace the old one with the new one.
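
A minimal sketch of that idea, assuming a made-up TreeNode type and a Decision enum (neither is AngleSharp's API): each call returns the nodes that should stand in for the input node in the new tree, so the old tree is never modified while it is being walked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

enum Decision { Keep, Remove, Unwrap }

// Hypothetical minimal node type for illustration; not AngleSharp's API.
sealed class TreeNode
{
    public string Name { get; }
    public List<TreeNode> Children { get; } = new List<TreeNode>();

    public TreeNode(string name, params TreeNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class Rebuilder
{
    // Returns the nodes that replace `node` in the new tree:
    // nothing (removed), its rebuilt children (unwrapped), or a fresh copy (kept).
    public static IEnumerable<TreeNode> Rebuild(TreeNode node, Func<TreeNode, Decision> decide)
    {
        var decision = decide(node);
        if (decision == Decision.Remove)
        {
            return Enumerable.Empty<TreeNode>();
        }

        // The old tree is only read here, never changed.
        var newChildren = node.Children.SelectMany(c => Rebuild(c, decide)).ToArray();
        return decision == Decision.Unwrap
            ? newChildren
            : new[] { new TreeNode(node.Name, newChildren) };
    }

    static void Main()
    {
        var root = new TreeNode("root",
            new TreeNode("A"),
            new TreeNode("B"),
            new TreeNode("C", new TreeNode("D"), new TreeNode("E"), new TreeNode("F")),
            new TreeNode("G"));

        var rebuilt = Rebuild(root, n => n.Name == "C" ? Decision.Unwrap : Decision.Keep).Single();
        Console.WriteLine(string.Join(",", rebuilt.Children.Select(c => c.Name))); // A,B,D,E,F,G
    }
}
```

With a real DOM you would then swap the rebuilt subtree in for the original one in a single step.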

Less safe way: Have your LoopRecursively function return an integer representing the number of nodes added or removed, then adjust the loop variables by this number (both the loop index and the bound checked in the loop condition).
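
One way this could look, as a sketch only: the variant below returns how many nodes now occupy the processed slot (0, 1, or the number of unwrapped children) and advances the index by that amount, while the bound is re-read from the list on every pass. MutableNode, Decision, and InPlaceWalker are names invented for the example, not AngleSharp's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

enum Decision { Keep, Remove, Unwrap }

// Hypothetical mutable node for illustration; not AngleSharp's API.
sealed class MutableNode
{
    public string Name { get; }
    public List<MutableNode> Children { get; } = new List<MutableNode>();

    public MutableNode(string name, params MutableNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class InPlaceWalker
{
    // Walks the children of `parent`, advancing by however many nodes
    // ended up in the slot that was just processed.
    public static void Walk(MutableNode parent, Func<MutableNode, Decision> decide)
    {
        var i = 0;
        while (i < parent.Children.Count) // bound is re-read on every pass
        {
            i += Process(parent.Children, i, decide);
        }
    }

    // Returns 0 if siblings[index] was removed, 1 if it was kept,
    // or n if it was replaced by its n (already processed) children.
    static int Process(List<MutableNode> siblings, int index, Func<MutableNode, Decision> decide)
    {
        var node = siblings[index];
        var decision = decide(node);
        if (decision == Decision.Remove)
        {
            siblings.RemoveAt(index); // the next sibling slides into this slot
            return 0;
        }

        Walk(node, decide); // children first, so they are final before unwrapping
        if (decision == Decision.Unwrap)
        {
            siblings.RemoveAt(index);
            siblings.InsertRange(index, node.Children);
            return node.Children.Count;
        }
        return 1;
    }

    static void Main()
    {
        var root = new MutableNode("root",
            new MutableNode("A"),
            new MutableNode("B"),
            new MutableNode("C", new MutableNode("D"), new MutableNode("E"), new MutableNode("F")),
            new MutableNode("G"));

        Walk(root, n => n.Name == "B" ? Decision.Remove
                      : n.Name == "C" ? Decision.Unwrap
                      : Decision.Keep);
        Console.WriteLine(string.Join(",", root.Children.Select(c => c.Name))); // A,D,E,F,G
    }
}
```

Processing children before unwrapping their parent is a design choice of this sketch: it means the nodes moved up are already final, so skipping past them is safe.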

"There are now more nodes than the Length in the for-loop accounted for, so not all items will be recursed."

I don't think this is true. You are not evaluating element.ChildNodes.Length once, but in every iteration. Hence if the list is live, the length will change with your changes.
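
To see which part of the concern still holds, here is a small sketch on a plain List<string> standing in for the live collection (the class and method names are made up for illustration). Count is re-read on every pass, so a longer list would still be walked to its end, but removing the current item lets its successor slide into the same slot, and the i++ then steps over it.

```csharp
using System;
using System.Collections.Generic;

static class LiveLengthDemo
{
    // Visits items by index, re-reading items.Count on every pass,
    // and removes each item equal to `victim` while iterating.
    public static List<string> VisitAndRemove(List<string> items, string victim)
    {
        var visited = new List<string>();
        for (var i = 0; i < items.Count; i++)
        {
            visited.Add(items[i]);
            if (items[i] == victim)
            {
                items.RemoveAt(i); // the next item now sits at index i
            }
            // i++ moves on, so the item that slid into slot i is never visited
        }
        return visited;
    }

    static void Main()
    {
        var items = new List<string> { "A", "B", "C", "G" };
        var visited = VisitAndRemove(items, "B");
        Console.WriteLine(string.Join(",", visited)); // A,B,G
    }
}
```

So the length itself is not the problem; the index is. After a removal, the index has to stay where it is for one more pass.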

Let's assume the following simple implementation for your tree:

using System;
using System.Collections.Generic;
using System.Linq;

class Node
{
    readonly List<Node> children;
    readonly String name;

    public Node(String name)
    {
        this.children = new List<Node>();
        this.name = name;
    }

    public Node AddChild(Node node)
    {
        children.Add(node);
        return this;
    }

    public Node InsertChild(int index, Node node)
    {
        children.Insert(index, node);
        return this;
    }

    public Int32 Length
    {
        get { return children.Count; }
    }

    public Node this[Int32 index]
    {
        get { return children[index]; }
    }

    public Int32 IndexOf(Node node)
    {
        return children.IndexOf(node);
    }

    public Node RemoveChild(Node node)
    {
        children.Remove(node);
        return this;
    }

    public IEnumerable<Node> Children
    {
        get { return children.AsEnumerable(); }
    }

    public override String ToString()
    {
        var content = new String[1 + children.Count];
        content[0] = name;

        for (int i = 0; i < children.Count; i++)
        {
            // Indent every line of the child's own output by one level.
            var lines = children[i].ToString().Split(new [] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
            content[i + 1] = "+ " + String.Join(Environment.NewLine + "  ", lines);
        }

        return String.Join(Environment.NewLine, content);
    }
}

The given Node stores its children (but no reference to its parent) and offers simple methods to add, insert, and remove children.

Let's see how we could construct a good example with this kind of Node:

var root = new Node("Root");
root.AddChild(new Node("a"))
    .AddChild(new Node("b"))
    .AddChild(new Node("c")
        .AddChild(new Node("d")
            .AddChild(new Node("e"))
            .AddChild(new Node("f")))
        .AddChild(new Node("g"))
        .AddChild(new Node("h")))
    .AddChild(new Node("i"));

The output of calling root.ToString() looks as follows:

Root
+ a
+ b
+ c
  + d
    + e
    + f
  + g
  + h
+ i

I assume you want to flatten the tree? As already said, doing it in an immutable fashion might be a good idea. There are multiple ways to do it, but given the API above we could end up with the following solution:

void Flatten(Node element, List<Node> nodes)
{
    var before = nodes.Count;

    foreach (var node in element.Children)
    {
        Flatten(node, nodes);
    }

    if (nodes.Count == before)
    {
        nodes.Add(element); 
    }
}

Why do I pass in a List<Node>? We could create a list in every call and merge it into the caller's list, but the version above is a little more efficient. We also use the Count property to determine whether any descendant has been added. We could use the Any() extension method on the children instead, but that is again some unnecessary overhead. Essentially we just check whether the given node is a leaf; if so, we add it to the provided list.
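
For comparison, the list-per-call alternative mentioned above might look roughly like this. SimpleNode is a hypothetical stand-in reduced to what the sketch needs, and LeafCollector is likewise a made-up name; every call allocates its own list and merges the children's lists into it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Node class, reduced to a name and a child list.
sealed class SimpleNode
{
    public string Name { get; }
    public List<SimpleNode> Children { get; } = new List<SimpleNode>();

    public SimpleNode(string name, params SimpleNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class LeafCollector
{
    // Allocates a fresh list per call and merges the children's results into it.
    public static List<SimpleNode> Leaves(SimpleNode element)
    {
        var result = new List<SimpleNode>();
        if (!element.Children.Any())
        {
            result.Add(element); // a leaf contributes itself
            return result;
        }

        foreach (var child in element.Children)
        {
            result.AddRange(Leaves(child)); // the merge step the shared-list version avoids
        }
        return result;
    }

    static void Main()
    {
        var root = new SimpleNode("Root",
            new SimpleNode("a"),
            new SimpleNode("b"),
            new SimpleNode("c",
                new SimpleNode("d", new SimpleNode("e"), new SimpleNode("f")),
                new SimpleNode("g"),
                new SimpleNode("h")),
            new SimpleNode("i"));

        Console.WriteLine(string.Join(",", Leaves(root).Select(n => n.Name))); // a,b,e,f,g,h,i
    }
}
```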

If you really want to mutate the original tree, there is another option. The following code takes an element and walks recursively through its children. Leaves stay untouched; every node that has both children and a parent is replaced, in its parent, by its children.

void Flatten(Node element, Node parent = null)
{
    for (var i = 0; i < element.Length; i++)
    {
        Flatten(element[i], element);
    }

    if (parent != null && element.Length > 0)
    {
        var children = element.Children.ToArray();
        var index = parent.IndexOf(element);
        parent.RemoveChild(element);

        foreach (var child in children)
        {
            element.RemoveChild(child);
            parent.InsertChild(index++, child);
        }
    }
}

The for loop re-reads element.Length on every pass. That matters here, because a recursive call can replace element[i] with that child's own children and so change the length. The foreach loop, in turn, removes children from element while walking over them, which is why it iterates over a copy obtained with element.Children.ToArray(); modifying a List<T> while enumerating it would otherwise throw. There is also a way to do it without that copy: a reversed for loop (going from Length - 1 down to 0).
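
A sketch of the reversed-loop variant, written against a hypothetical FlatNode type with a public child list so that the block is self-contained (it is not the Node class from this answer): removing the last remaining child never shifts the ones before it, and inserting every child at the same index in the parent restores the original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in node with a public child list, for illustration only.
sealed class FlatNode
{
    public string Name { get; }
    public List<FlatNode> Children { get; } = new List<FlatNode>();

    public FlatNode(string name, params FlatNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class ReverseFlattener
{
    public static void Flatten(FlatNode element, FlatNode parent = null)
    {
        // Count is re-read on every pass because a recursive call may
        // replace element.Children[i] with several nodes.
        for (var i = 0; i < element.Children.Count; i++)
        {
            Flatten(element.Children[i], element);
        }

        if (parent != null && element.Children.Count > 0)
        {
            var index = parent.Children.IndexOf(element);
            parent.Children.RemoveAt(index);

            // Walk backwards: no copy of the child list is needed.
            for (var i = element.Children.Count - 1; i >= 0; i--)
            {
                var child = element.Children[i];
                element.Children.RemoveAt(i);
                parent.Children.Insert(index, child); // same index keeps the order
            }
        }
    }

    static void Main()
    {
        var root = new FlatNode("Root",
            new FlatNode("a"),
            new FlatNode("b"),
            new FlatNode("c",
                new FlatNode("d", new FlatNode("e"), new FlatNode("f")),
                new FlatNode("g"),
                new FlatNode("h")),
            new FlatNode("i"));

        Flatten(root);
        Console.WriteLine(string.Join(",", root.Children.Select(n => n.Name))); // a,b,e,f,g,h,i
    }
}
```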

Let's see what the serialization of the tree looks like after calling Flatten(root):

Root
+ a
+ b
+ e
+ f
+ g
+ h
+ i

Hope this answer helps you a bit.

