Reachability Count for all nodes in a Directed Acyclic Graph

There was a challenge in a HackerRank programming contest called "Acyclic Graph", which boils down to counting the number of nodes reachable from every node in a directed acyclic graph (DAG). For example, say you have a graph like so:

[ 1 ] ---->[ 2 ]--->[ 4 ]--->[ 5 ]
[ 3 ] ------/

Reachability count (including origin node):

Node 1: 4
Node 2: 3
Node 3: 4
Node 4: 2
Node 5: 1
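
As a reference point, the counts above can be reproduced with a brute-force check that runs one DFS per start node. This is only a minimal sketch for small inputs, not the approach under discussion; the function name `reachabilityCounts` is made up for illustration, and the adjacency list uses 1-based node ids with an unused slot 0.

```javascript
// Brute-force reference: one iterative DFS per start node, O(V * (V + E)) overall.
function reachabilityCounts(adjacency) {
    const counts = new Array(adjacency.length).fill(0);
    for (let start = 1; start < adjacency.length; start++) {
        const seen = new Uint8Array(adjacency.length);
        const stack = [start];
        seen[start] = 1;
        let count = 0;
        while (stack.length > 0) {
            const node = stack.pop();
            count++; // each node is pushed once, so it is counted once
            for (const next of adjacency[node]) {
                if (!seen[next]) {
                    seen[next] = 1;
                    stack.push(next);
                }
            }
        }
        counts[start] = count;
    }
    return counts;
}

// Example graph from above: 1 -> 2 -> 4 -> 5 and 3 -> 2
const exampleAdjacency = [[], [2], [4], [2], [5], []];
const exampleCounts = reachabilityCounts(exampleAdjacency);
console.log(exampleCounts.slice(1)); // [ 4, 3, 4, 2, 1 ]
```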

My approach was a depth-first traversal with memoization. I looked around quite a bit, but it seems the run time can't be improved much further because of the overcounting that occurs in cases like this:

[ 1 ] ---->[ 2 ]--->[ 4 ]--->[ 5 ]
[ 3 ] ------/--------/

The third node would count the fourth node, even though the second node already counted it. To make things a bit worse, I only solve these challenges in JavaScript. It's my primary language and I get a thrill from pushing its boundaries. No one on the leaderboard has solved it in JavaScript yet, but I presume it's possible. After the contest, I managed to pass 13 out of 24 test cases with the following code:

function Solution( graph, nodes ) {

    var memory = new Array( nodes + 1 )
      , result = 0;

    graph.forEach( ( a, v ) => DepthFirstSearch( graph, v, memory ) );

    // challenge asks for an output variation, but the accurate 
    // reachability count of every node will be contained in "d.length".
    memory.forEach( ( d, i ) => { if ( i && ( 2 * d.length ) >= nodes ) result++; } );

    return result;
}

function DepthFirstSearch( graph, v, memory ) {

    if ( memory[ v ] ) return memory[ v ];

    var descendants = new Uint16Array( [ v ] );

    graph[ v ].forEach( u => {

        descendants = MergeTypedArrays( 
            DepthFirstSearch( graph, u, memory ),  
            descendants
        );
    } );
                                           // make elements unique
                                           // to avoid over counting
    return memory[ v ] = Uint16Array.from( new Set( descendants ) ); 
}

function MergeTypedArrays(a, b) {

    var c = new a.constructor( a.length + b.length );

    c.set( a );
    c.set( b, a.length );

    return c;
}

// adjacency list
var graph = [ 
    [],    // 0
    [ 2 ], // 1
    [ 4 ], // 2
    [ 2 ], // 3
    [ 5 ], // 4
    []     // 5
];

var nodes = 5;

Solution( graph, nodes );

It fails for all inputs larger than 50 KB, presumably those with a large number of nodes and edges (i.e. 50,000 nodes and 40,000 edges). Having failed to find or conceive of a faster, more memory-efficient algorithm, I'm at a total loss as to what to try next. I thought about making the DFS iterative, but I suspect the memory consumed by memoizing thousands of arrays dwarfs the recursion overhead, and memory seems to be the main problem. On HackerRank I get "Abort Called" and "Runtime Error" for the 11 failing tests (as opposed to "Timeout"). I also tried bitsets with union, but memory consumption turned out to be worse, since each bitset needs to be large enough to hold node numbers up to 50,000.
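
For a rough sense of scale, here is a back-of-the-envelope estimate in JavaScript. It assumes one full-width bitset per node stored in 32-bit words and, for the memoized arrays, a hypothetical worst case where the graph is a single chain, so that the k-th node from the end stores k descendants. Real test data may be far from either extreme, so treat the numbers as upper-bound illustrations only.

```javascript
// Assumed upper bound on the node count, taken from the stated constraints.
const maxNodes = 50000;

// Bitsets: every node keeps one bit per possible target node.
const wordsPerBitset = Math.ceil(maxNodes / 32);      // 32-bit words per node
const bitsetBytes = maxNodes * wordsPerBitset * 4;    // 4 bytes per word

// Memoized Uint16Array per node, hypothetical chain 1 -> 2 -> ... -> n:
// the descendant lists have lengths 1, 2, ..., n, at 2 bytes per entry.
const chainEntries = maxNodes * (maxNodes + 1) / 2;
const chainBytes = chainEntries * 2;

console.log(bitsetBytes); // 312600000  (about 313 MB)
console.log(chainBytes);  // 2500050000 (about 2.5 GB)
```

Under these assumptions, storing a full descendant set per node is likely to exceed a contest memory limit whichever representation is used, which is consistent with the failures being aborts rather than timeouts.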

Constraints:

1 ≤ n,m ≤ 5×10^4
1 ≤ a(i),b(i) ≤ n and a(i) ≠ b(i)
It is guaranteed that graph G does not contain cycles.

Just to be clear, I won't get any points for passing all the tests since this challenge is locked; this is for educational purposes, mainly about optimization. I'm aware of related SO posts that point to topological sort, but as far as I understand, topological sort will still overcount in cases like the one described above, so it is not a viable solution. If I've misunderstood, please enlighten me. Thank you in advance for your time.
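
To make the overcounting concern concrete, here is a minimal sketch of the naive recurrence "1 + sum of the children's counts" on the second graph above (with the extra edge 3 -> 4). The function name `naiveSumCounts` is made up for illustration; the point is only that node 3 ends up with 6 instead of the true 4, because node 4 and node 5 are reached both directly and through node 2.

```javascript
// Naive DP: count(v) = 1 + sum of count(child). Shared descendants are counted repeatedly.
function naiveSumCounts(adjacency) {
    const memo = new Array(adjacency.length).fill(0);
    const count = v => {
        if (memo[v]) return memo[v];
        let total = 1; // the node itself
        for (const child of adjacency[v]) total += count(child);
        return (memo[v] = total);
    };
    for (let v = 1; v < adjacency.length; v++) count(v);
    return memo;
}

// 1 -> 2 -> 4 -> 5, plus 3 -> 2 and 3 -> 4
const diamondAdjacency = [[], [2], [4], [2, 4], [5], []];
const naiveResult = naiveSumCounts(diamondAdjacency);
console.log(naiveResult.slice(1)); // [ 4, 3, 6, 2, 1 ]  (true counts: 4, 3, 4, 2, 1)
```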

Question: How can I optimize this further? Is there a more efficient approach?

Depth-First Search (DFS) is one good way of solving this problem. Another way would be Breadth-First Search (BFS), which can also run in parallel and can be optimized very well, but at the cost of much higher code complexity. So my recommendation would be to stick to DFS.

First I have to apologize: my JavaScript skills are not very good (i.e. they are non-existent), so my solutions below use Java, but the ideas should be easy to port.

Your initial question is missing one very important detail: we only need to find all nodes where the number of reachable nodes is greater than or equal to |V| / 2.

Why does that matter? Computing the number of reachable nodes for each node is expensive, as we have to do a DFS or BFS starting from every node in the graph. But if we only need to find nodes with the above property, that is much easier.

Let successors(n) be all nodes reachable from n and ancestors(n) be all nodes that can reach n. We can use the following observations to drastically reduce the search space:

  • if the number of nodes reachable from n is smaller than |V| / 2, then no node in successors(n) can have a larger number
  • if the number of nodes reachable from n is greater than or equal to |V| / 2, then all nodes in ancestors(n) will have a larger number

How can we use that?

  1. When creating your graph, also create the transposed graph. That means when storing an edge a->b, you store b->a in the transposed graph.
  2. Create an array that stores which nodes to ignore, and initialize it with false.
  3. Implement a DFS-based function that determines whether a given node has a number of reachable nodes >= |V| / 2 (see below).
  4. In that function, skip any start node that is already marked as ignored.
  5. If for node n the number of reachable nodes is smaller than |V| / 2, mark all nodes in successors(n) as ignored.
  6. Else count all nodes in ancestors(n) and mark them as ignored.
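
Step 1 could look roughly like this in JavaScript. It is a sketch that assumes the input arrives as a node count plus a list of `[a, b]` pairs with 1-based ids; the helper name `buildGraphs` is hypothetical and not part of the challenge's template.

```javascript
// Build the adjacency list and its transpose in a single pass over the edges.
function buildGraphs(nodeCount, edges) {
    // Index 0 is left unused so node ids can be used directly as indices.
    const forward = Array.from({ length: nodeCount + 1 }, () => []);
    const transposed = Array.from({ length: nodeCount + 1 }, () => []);
    for (const [a, b] of edges) {
        forward[a].push(b);     // edge a -> b
        transposed[b].push(a);  // reversed edge b -> a
    }
    return { forward, transposed };
}

const built = buildGraphs(5, [[1, 2], [2, 4], [3, 2], [4, 5]]);
console.log(built.transposed[2]); // [ 1, 3 ]
```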

Solution using Iterative DFS

public int countReachable(int root, boolean[] visited, boolean[] ignored, Graph graph) {
    if (ignored[root]) {
        return 0;
    }

    Stack<Integer> stack = new Stack<>();
    stack.push(root);

    int count = 0;
    while (stack.empty() == false) {
        int node = stack.pop();
        if (visited[node] == false) {
            count++;
            visited[node] = true;
            for (int neighbor : graph.getNeighbors(node)) {
                if (visited[neighbor] == false) {
                    stack.push(neighbor);
                }
            }
        }
    }
    if (count * 2 >= graph.numNodes()) {
        return markAndCountAncestors(root, visited, ignored, graph);
    } else {
        return markSuccessors(root, visited, ignored, graph);
    }
}

Function to mark the Ancestors

This is just another DFS, but using the transposed graph. Note that we can reuse the visited array: because the graph is acyclic, no ancestor of root can also be one of its successors, so every visited entry we touch here (apart from root, which is reset explicitly) is still false.

public int markAndCountAncestors(int root, boolean[] visited, boolean[] ignored, Graph graph) {   
    Stack<Integer> stack = new Stack<>();
    stack.push(root);
    visited[root] = false;

    int count = 0;
    while (stack.empty() == false) {
        int node = stack.pop();
        if (visited[node] == false && ignored[node] == false) {
            count++;
            visited[node] = true;
            ignored[node] = true;
            for (int neighbor : graph.transposed.getNeighbors(node)) {
                if (visited[neighbor] == false && ignored[neighbor] == false) {
                    stack.push(neighbor);
                }
            }
        }
    }
    return count;
}

Function to mark the Successors

Note that we already have the successors, since they are just the nodes where we set visited to true.

public int markSuccessors(int root, boolean[] visited, boolean[] ignored, Graph graph) {
    for(int node = 0; node < graph.numNodes(); node++) {
        if (visited[node]) {
            ignored[node] = true;
        }
    }
    return 0;
}

Function to compute the result

public void solve(Graph graph) {
    int count = 0;
    boolean[] visited = new boolean[graph.numNodes()];
    boolean[] ignored = new boolean[graph.numNodes()];
    for (int node = 0; node < graph.numNodes(); node++) {
        Arrays.fill(visited, false); // reset visited array
        count += countReachable(node, visited, ignored, graph);
    }
    System.out.println("Result: " + count);
}

On the large test case you posted, this runs in 7.5 seconds for me. If you invert the iteration order (i.e. in solve you start with the largest node id) it goes down to 4 seconds, but that feels somewhat like cheating ^^
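
Since the question asks about JavaScript, here is one way the same idea might be ported. It is a sketch, not a tested contest submission: the function name `countHeavyNodes` and the edge-list input format are assumptions, typed arrays stand in for the Java boolean arrays, and the timings quoted above do not apply to it.

```javascript
// Counts nodes whose reachable set (including the node itself) has size >= nodeCount / 2.
function countHeavyNodes(nodeCount, edges) {
    const forward = Array.from({ length: nodeCount + 1 }, () => []);
    const transposed = Array.from({ length: nodeCount + 1 }, () => []);
    for (const [a, b] of edges) {
        forward[a].push(b);
        transposed[b].push(a);
    }

    const visited = new Uint8Array(nodeCount + 1);
    const ignored = new Uint8Array(nodeCount + 1);
    let result = 0;

    for (let root = 1; root <= nodeCount; root++) {
        if (ignored[root]) continue; // already classified by an earlier DFS
        visited.fill(0);

        // Forward DFS: collect every node reachable from root.
        const reached = [];
        const stack = [root];
        visited[root] = 1;
        while (stack.length > 0) {
            const node = stack.pop();
            reached.push(node);
            for (const next of forward[node]) {
                if (!visited[next]) { visited[next] = 1; stack.push(next); }
            }
        }

        if (2 * reached.length >= nodeCount) {
            // root qualifies, and so does every ancestor: walk the transposed graph.
            const pending = [root];
            ignored[root] = 1;
            while (pending.length > 0) {
                const node = pending.pop();
                result++;
                for (const prev of transposed[node]) {
                    if (!ignored[prev]) { ignored[prev] = 1; pending.push(prev); }
                }
            }
        } else {
            // root does not qualify, so none of its successors can either.
            for (const node of reached) ignored[node] = 1;
        }
    }
    return result;
}

// Example graph from the question: nodes 1, 2 and 3 reach at least 5 / 2 nodes.
console.log(countHeavyNodes(5, [[1, 2], [2, 4], [3, 2], [4, 5]])); // 3
```

Marking nodes as ignored at push time keeps each node on the ancestor stack at most once, so every qualifying node is counted exactly once.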
