
Converting a Directed Acyclic Graph (DAG) to a tree

I'm trying to implement an algorithm to convert a Directed Acyclic Graph to a Tree (for fun, learning, as a kata, you name it).

[Image: DAG to tree]

So I came up with the data structure Node:

using System;
using System.Collections.Generic;

/// <summary>
/// Represents a node in a DAG or tree
/// </summary>
/// <typeparam name="T">Type of the node's value</typeparam>
public class Node<T> 
{
    /// <summary>
    /// Creates a node with no child nodes
    /// </summary>
    /// <param name="value">Value of the node</param>
    public Node(T value)
    {
        Value = value;
        ChildNodes = new List<Node<T>>();
    }

    /// <summary>
    /// Creates a node with the given value and copies the collection of child nodes
    /// </summary>
    /// <param name="value">value of the node</param>
    /// <param name="childNodes">collection of child nodes</param>
    public Node(T value, IEnumerable<Node<T>> childNodes)
    {
        if (childNodes == null)
        {
            throw new ArgumentNullException("childNodes");
        }
        ChildNodes = new List<Node<T>>(childNodes);
        Value = value;
    }

    /// <summary>
    /// Determines if the node has any child node
    /// </summary>
    /// <returns>true if has any</returns>
    public bool HasChildNodes
    {
        get { return this.ChildNodes.Count != 0; }
    }


    /// <summary>
    /// Traverses the graph recursively
    /// </summary>
    /// <param name="root">root node</param>
    /// <param name="visitor">visitor for each node</param>
    public void Traverse(Node<T> root, Action<Node<T>> visitor)
    {
        if (root == null)
        {
            throw new ArgumentNullException("root");
        }
        if (visitor == null)
        {
            throw new ArgumentNullException("visitor");
        }

        visitor(root); 
        foreach (var node in root.ChildNodes)
        {
            Traverse(node, visitor);
        }
    }

    /// <summary>
    /// Value of the node
    /// </summary>
    public T Value { get; private set; }

    /// <summary>
    /// List of all child nodes
    /// </summary>
    public List<Node<T>> ChildNodes { get; private set; }
}

It's pretty straightforward. Methods:

/// <summary>
/// Helper class for Node
/// </summary>
public static class NodeHelper
{
    /// <summary>
    /// Converts Directed Acyclic Graph to Tree data structure using recursion.
    /// </summary>
    /// <param name="root">root of DAG</param>
    /// <param name="seenNodes">keeps track of child elements to find multiple connections (e.g. A connects with B and C, and B also connects with C)</param>
    /// <returns>root node of the tree</returns>
    public static Node<T> DAG2TreeRec<T>(this Node<T> root, HashSet<Node<T>> seenNodes)
    {
        if (root == null)
        {
            throw new ArgumentNullException("root");
        }
        if (seenNodes == null)
        {
            throw new ArgumentNullException("seenNodes");
        }

        var length = root.ChildNodes.Count;
        for (int i = 0; i < length; ++i)
        {
            var node = root.ChildNodes[i];
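            if (seenNodes.Contains(node))
            {
                // Already reachable via another path: store a shallow clone in the
                // parent's list so the shared node is actually replaced in the tree.
                var nodeClone = new Node<T>(node.Value, node.ChildNodes);
                root.ChildNodes[i] = nodeClone;
                node = nodeClone;
            }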
            else
            {
                seenNodes.Add(node);
            }
            DAG2TreeRec(node, seenNodes);
        }
        return root;
    }
    /// <summary>
    /// Converts Directed Acyclic Graph to Tree data structure using an explicit stack.
    /// </summary>
    /// <param name="root">root of DAG</param>
    /// <param name="seenNodes">keeps track of child elements to find multiple connections (e.g. A connects with B and C, and B also connects with C)</param>
    /// <returns>root node of the tree</returns>
    public static Node<T> DAG2Tree<T>(this Node<T> root, HashSet<Node<T>> seenNodes)
    {
        if (root == null)
        {
            throw new ArgumentNullException("root");
        }
        if (seenNodes == null)
        {
            throw new ArgumentNullException("seenNodes");
        }

        var stack = new Stack<Node<T>>();
        stack.Push(root);

        while (stack.Count > 0) 
        {
            var tempNode = stack.Pop();
            var length = tempNode.ChildNodes.Count;
            for (int i = 0; i < length; ++i)
            {
                var node = tempNode.ChildNodes[i];
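                if (seenNodes.Contains(node))
                {
                    // Already reachable via another path: store a shallow clone in the
                    // parent's list so the shared node is actually replaced in the tree.
                    var nodeClone = new Node<T>(node.Value, node.ChildNodes);
                    tempNode.ChildNodes[i] = nodeClone;
                    node = nodeClone;
                }
                else
                {
                    seenNodes.Add(node);
                }
                stack.Push(node);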
            }
        } 
        return root;
    }
}

and the test:

    static void Main(string[] args)
    {
        // JIT warm-up: run both tests once before timing
        Dag2TreeTest();
        Dag2TreeRecTest();

        Console.WriteLine("Running time ");
        Dag2TreeTest();
        Dag2TreeRecTest();

        Console.ReadKey();
    }

    public static void Dag2TreeTest()
    {
        HashSet<Node<int>> hashSet = new HashSet<Node<int>>();

        Node<int> root = BulidDummyDAG();

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        var treeNode = root.DAG2Tree<int>(hashSet);
        stopwatch.Stop();

        Console.WriteLine(string.Format("Dag 2 Tree = {0}ms",stopwatch.ElapsedMilliseconds));

    }

    private static Node<int> BulidDummyDAG()
    {
        Node<int> node2 = new Node<int>(2);
        Node<int> node4 = new Node<int>(4);
        Node<int> node3 = new Node<int>(3);
        Node<int> node5 = new Node<int>(5);
        Node<int> node6 = new Node<int>(6);
        Node<int> node7 = new Node<int>(7);
        Node<int> node8 = new Node<int>(8);
        Node<int> node9 = new Node<int>(9);
        Node<int> node10 = new Node<int>(10);
        Node<int> root  = new Node<int>(1);

        //making DAG                   
        root.ChildNodes.Add(node2);    
        root.ChildNodes.Add(node3);    
        node3.ChildNodes.Add(node2);   
        node3.ChildNodes.Add(node4);   
        root.ChildNodes.Add(node5);    
        node4.ChildNodes.Add(node6);   
        node4.ChildNodes.Add(node7);
        node5.ChildNodes.Add(node8);
        node2.ChildNodes.Add(node9);
        node9.ChildNodes.Add(node8);
        node9.ChildNodes.Add(node10);

        var length = 10000;
        Node<int> tempRoot = node10; 
        for (int i = 0; i < length; i++)
        {
            var nextChildNode = new Node<int>(11 + i);
            tempRoot.ChildNodes.Add(nextChildNode);
            tempRoot = nextChildNode;
        }

        return root;
    }

    public static void Dag2TreeRecTest()
    {
        HashSet<Node<int>> hashSet = new HashSet<Node<int>>();

        Node<int> root = BulidDummyDAG();

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        var treeNode = root.DAG2TreeRec<int>(hashSet);
        stopwatch.Stop();

        Console.WriteLine(string.Format("Dag 2 Tree Rec = {0}ms",stopwatch.ElapsedMilliseconds));
    }

What is more, the data structure needs some improvement:

  • Overriding GetHashCode, ToString, Equals, and the == operator
  • Implementing IComparable
  • LinkedList is probably a better choice

Also, before the conversion there are certain things that need to be checked:

  • Multigraphs
  • Whether it's really a DAG (cycles)
  • Diamonds in the DAG
  • Multiple roots in the DAG

All in all, it narrows down to a few questions: How can I improve the conversion? Since this is recursion, it's possible to blow up the stack. I can add an explicit stack to avoid that. If I use continuation-passing style, will it be more efficient?

I feel that an immutable data structure would be better in this case. Is that correct?

Is Childs the right name? :)

Algorithm:

  • As you observed, some nodes appear twice in the output. If node 2 had children, its whole subtree would appear twice. If you want each node to appear just once, replace

        if (hashSet.Contains(node))
        {
            var nodeClone = new Node<T>(node.Value, node.Childs);
            node = nodeClone;
        }

    with

        if (hashSet.Contains(node))
        {
            // node already seen -> do nothing
        }
  • I wouldn't be too worried about the size of the stack or the performance of recursion. However, you could replace your depth-first search with a breadth-first search, which visits nodes closer to the root earlier and thus yields a more "natural" tree (in your picture you already numbered the nodes in BFS order).

        var seenNodes = new HashSet<Node>();
        var q = new Queue<Node>();
        q.Enqueue(root);
        seenNodes.Add(root);

        while (q.Count > 0)
        {
            var node = q.Dequeue();
            foreach (var child in node.Childs)
            {
                if (!seenNodes.Contains(child))
                {
                    seenNodes.Add(child);
                    q.Enqueue(child);
                }
            }
        }

    The algorithm handles diamonds and cycles.

  • Multiple roots

    Just declare a class Graph which will contain all the vertices:

        class Graph
        {
            public List<Node> Nodes { get; private set; }

            public Graph()
            {
                Nodes = new List<Node>();
            }
        }
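The breadth-first loop above only visits nodes; it does not build anything. As a minimal sketch of how it could produce a tree in which every node appears exactly once, the version below copies each node the first time it is reached and drops any later edge to it. GraphNode and SpanningTree.Build are hypothetical names introduced for this illustration, not the types from the question, and the input graph is left untouched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type for this sketch (not the Node<T> from the question).
public sealed class GraphNode
{
    public int Value { get; }
    public List<GraphNode> Children { get; } = new List<GraphNode>();
    public GraphNode(int value) { Value = value; }
}

public static class SpanningTree
{
    // Builds a new tree in which every node reachable from root appears exactly once,
    // attached to the first parent that reaches it in breadth-first order.
    public static GraphNode Build(GraphNode root)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));

        // Maps each original node to its copy and doubles as the "seen" set.
        // GraphNode does not override Equals/GetHashCode, so keys compare by reference.
        var copies = new Dictionary<GraphNode, GraphNode>();
        var queue = new Queue<GraphNode>();

        var rootCopy = new GraphNode(root.Value);
        copies[root] = rootCopy;
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var child in node.Children)
            {
                // Already placed under an earlier parent: drop this extra edge.
                if (copies.ContainsKey(child)) continue;

                var childCopy = new GraphNode(child.Value);
                copies[child] = childCopy;
                copies[node].Children.Add(childCopy);
                queue.Enqueue(child);
            }
        }
        return rootCopy;
    }
}
```

Because a node is never enqueued twice, the loop also terminates on input that contains a cycle, which matches the claim that the breadth-first approach handles diamonds and cycles.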

Code:

  • The hashSet could be named seenNodes.

  • Instead of

        var length = root.Childs.Count;
        for (int i = 0; i < length; ++i)
        {
            var node = root.Childs[i];

    write

        foreach (var child in root.Childs)
  • In Traverse, the visitor is quite unnecessary. You could instead have a method which yields all the nodes of the tree (in the same order Traverse does), and it is up to the user to do whatever they want with the nodes:

        foreach (var node in root.TraverseRecursive())
        {
            Console.WriteLine(node.Value);
        }
  • If you override GetHashCode and Equals, the algorithm will no longer be able to distinguish between two different nodes with the same value, which is probably not what you want.

  • I don't see any reason why LinkedList would be better here than List, except for the reallocations which List does when adding nodes (its capacity doubles as it grows: 4, 8, 16, ...).
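The TraverseRecursive method suggested above is only named in the answer, not shown. One way it might look is sketched here as an extension method built on yield return; TreeNode<T> and NodeTraversal are hypothetical names for this illustration, and the pre-order visiting order is chosen to match the original Traverse.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type for this sketch (not the Node<T> from the question).
public sealed class TreeNode<T>
{
    public T Value { get; }
    public List<TreeNode<T>> Children { get; } = new List<TreeNode<T>>();
    public TreeNode(T value) { Value = value; }
}

public static class NodeTraversal
{
    // Public entry point: validates the argument eagerly, then hands off to the
    // lazy iterator (an iterator body would otherwise only throw on first MoveNext).
    public static IEnumerable<TreeNode<T>> TraverseRecursive<T>(this TreeNode<T> root)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));
        return Iterate(root);
    }

    // Pre-order: the node itself first, then each child's subtree in list order.
    private static IEnumerable<TreeNode<T>> Iterate<T>(TreeNode<T> node)
    {
        yield return node;
        foreach (var child in node.Children)
        {
            foreach (var descendant in Iterate(child))
            {
                yield return descendant;
            }
        }
    }
}
```

The caller then decides what to do with each node, for example `foreach (var node in root.TraverseRecursive()) Console.WriteLine(node.Value);`. Nested iterators like this add overhead proportional to the depth for every element, so for very deep trees an explicit stack inside a single iterator may be preferable.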

  1. This would have been better posted on Code Review.
  2. Childs is wrong => Children.
  3. You don't have to use a HashSet; you could just as easily have used a List<Node<T>>, because checking references is enough here (so no overriding of GetHashCode, Equals, or operators is needed).

  4. An easier way is to serialize your class and then deserialize it again into a second object with XmlSerializer. After serializing and deserializing, one object that was referenced twice becomes two objects with different references.
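The effect described in point 4, where a node referenced twice ends up as two separate objects, can also be sketched without a serializer. The version below is a swapped-in technique, not the XmlSerializer round trip itself: it copies the graph with an explicit stack, so deep graphs do not risk overflowing the call stack. DagNode and DagUnfolder.Unfold are hypothetical names introduced for this illustration, and the input is assumed to be acyclic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type for this sketch (not the Node<T> from the question).
public sealed class DagNode
{
    public int Value { get; }
    public List<DagNode> Children { get; } = new List<DagNode>();
    public DagNode(int value) { Value = value; }
}

public static class DagUnfolder
{
    // Returns a new tree in which a node reachable along k paths is copied k times.
    // Assumes the input is acyclic; a cycle would make this loop run forever.
    public static DagNode Unfold(DagNode root)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));

        var rootCopy = new DagNode(root.Value);

        // Each entry pairs an original node (Key) with the fresh copy (Value)
        // whose child list still has to be filled in.
        var stack = new Stack<KeyValuePair<DagNode, DagNode>>();
        stack.Push(new KeyValuePair<DagNode, DagNode>(root, rootCopy));

        while (stack.Count > 0)
        {
            var pair = stack.Pop();
            foreach (var child in pair.Key.Children)
            {
                // Always create a new copy, even if this child was copied before.
                var childCopy = new DagNode(child.Value);
                pair.Value.Children.Add(childCopy);
                stack.Push(new KeyValuePair<DagNode, DagNode>(child, childCopy));
            }
        }
        return rootCopy;
    }
}
```

Every shared subtree is duplicated once per path that reaches it, so the resulting tree can be much larger than the DAG, in the worst case exponentially so. The original graph is not modified.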
