简体   繁体   中英

Check if there exist a path between two vertices in directed acyclic graph - queries

This question can be easily solved in O(n + m) per query, however is this possible to answer such queries in better complexity with preprocessing better than O(n²)?

In tree it can be easily done by working with pre-order and in-order. I tried something similar in DAG but it doesn't make any sense.

I also tried to change this problem into LCA in DAG problem, but finding LCA in DAG can't be solved fast enough.


To be precise with constraints let's say:

n - number of vertices, up to 10^5

m - number of edges, up to 10^5

q - number of queries, up to 10^5

Interesting question. My intuition says "no". I haven't thought it through though.

However (assuming this question is not a theoretical one), for practical purposes, you can use a Bloom filter .

One possible solution to your problem using a Bloom filter would first generate K different orders of the graph, and for each, store the mapping from a node to its index in that order. Then, to test "reachability" from N1 to N2, you check (foreach order) if index-of-N1 is less than index-of-N2 (this check is O(1)). If this holds for all of the orders, it is reachable with pretty good probability ( assuming K is big enough ). (Depending on your real world use case, it might even be OK to gerenate such false positives occasionally, or you can then run the reliable O(N+M) check). Else, it is definitely not.

I have a feeling that there might be a solution along the following lines, but this is by no means a full solution.

Let S be a subset of vertices. For each vertex V in the graph, consider the set D_S(V), which I define as follows: D_S(V) = {V} if V in S, and otherwise, D_S(V) is the union of {V} with the sets D_S(W) for all direct descendants W of V. (That is, it is "all eventual descendants of V, but stop the recursion whenever you hit an element of V".) The question is: can we find a set S such that the size of S is O(f(N)) and also D_S(V) is of size O(g(N)) for all V, and where f and g are asymptotically sublinear? (Maybe we can achieve sqrt for both, for example.)

If we can find this, then I suggest the following strategy. For preprocessing, create for each U in S a hash table containing all vertices eventually reachable from U. This can be achieved in O(f(N) * M); that's not necessarily better than O(N^2), but better than O(M*Q) at least.

Now to answer a query "is U reachable from V?", this is trivial if V in S. Otherwise, we check if V = U, in which case it's also trivial. Finally, we apply the same process to all descendants of V, recursively, returning "yes" if the answer is "yes" thru either of the two cases above, but "no" only if we never find U. This recursion takes up to O(g(N)) steps.

The question that remains is how to choose S. I think if the graph arises from some process where the out-degree follows a power law, one might just take the sqrt(N) vertices with the highest out-degrees. But for example if we have the graph on N=2*K vertices (i, 0) and (i, 1), with K^2 edges: from each (i, 0) to each (j, 1); then there is no suitable subset S. But maybe graphs that do not have a suitable S have by necessity a degree of uniformity we can exploit... Or maybe not. I don't know. Any ideas, let me know!

Any algorithm for the reachability querying on DAGs can be used for answering such query on general directed graphs by compressing the strong connected components. So I do not think the DAG condition can do anything on reducing the complexity.

As for the reachability querying on directed graph, index construction techniques mentioned in this paper might be helpful for some cases.

If you are looking at serving fast multiple path_exists(src_vertex, dest_vertex) queries with some preprocessing, consider invoking Warshall's algorithm to determine the transitive closure which tells if a path exists between any two arbitrary nodes in a directed graph, as a preprocessing step (possibly at server startup). The algorithm has a worst case time complexity of O(n^3), and space complexity of O(n^2),where n is the number of nodes. The output of this algorithm is a nxn matrix where matrix[i][j] = 1 if there's a path between node_i and node_j, and zero otherwise. So, at query serving time, you could just return the result in O(1) time by looking the (i,j) pair up in the transitive closure matrix.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM