given a number k and a graph is there a DFS run that will give forest larger then k

Question

I was given a question that I can't seem to solve.

given a directed graph G=(V,E) and a natural number k, k>0.

Find an algorithm that will return "YES" if there is a DFS run that the number of trees in the DFS forest is >= K.

the algorithm should be linear in the size of the graph G. O(|V| + |E|).

I Will Explain:

so for the following run where s is the starting node, we will get 2 trees:

but if we stared at the node u we would get only 1 tree. so I need to return yes for k = 1 or 2. and no else.

so how can I know the number of trees?

thanks for the help!

Answer 1

After the edit on the question:

Use Kosaraju's algorithm to get strongly connected components in O(V + E) time. That would give the max K.

Here max K is same as the number of strongly connected components.

Why?

Let us do the proof by contradiction now. Let us say we have 4 strongly connected components for the graph shown in the question. Assume there is a possibility of getting an extra dfs tree starting at some node v . That would mean either the node v was not covered while counting the number of strongly connected components or the node was missed during DFS. But either of the case is not possible if we do a DFS or find the strongly connected components using well proven algorithm. Hence, our assumption is false. Thus, the proof by contradiction.

Answer before edit on the question:

DFS(Vertex v):
    mark v as visited;
    for(Vertex neighbor: Neighbors of v){
        if(!isVisited(neighbor)){
            DFS(neighbor);
        })
    }

count_trees(Graph G): //V vertices and E edges in total
    for(Vertex v: V){
        if(!isVisited(v)){
            DFS(v);
            trees++;
        })
    }
    return trees;

Above steps are self explanatory. Maintaining if a vertex has been visited is trivial.

The above approach is just DFS on every node that hasn't been visited before. Hence, the time complexity is the same as that of DFS which is O(|V| + |E|) .

Answer 2

Imagine that the given graph is actually a tree. Then if you would start a DFS in the root of the tree, you would find the whole graph in one DFS search. On the other extreme, if you would start a DFS in a leaf, and then in the next leaf, and would start every new DFS in the nodes as you would get them bottom-up, then each DFS will only find one node and then quit (because the children would already have been visited by a previous DFS). So then you can launch as many DFS searches as there are nodes in the tree.

The same remains true if the graph has some extra edges, but remains acyclic.

It becomes different when the graph has cycles. In that case a DFS that starts in any member of a cycle will find all other members in that cycle. So a cycle can never get split over different DFS searches. This cycle, combined with every other cycle that intersects with it, is a so called Strongly connected component .

An algorithm would thus have to find strongly connected components and count those as 1 DFS, while all other nodes which do no partake in any cycle can each be counted as a separate DFS (since you would start in the leaves of those subtrees).

Here is an algorithm that uses DFS (which is confusing, since it is a DFS that is counting possible DFSs) to identify cycles and updates the count accordingly. I've used recursion for this algorithm, and so there must be some fast backtracking when the required k has been reached: further searching is not necessary in that case.

All edges are visited only once, and the main loop also visits all nodes exactly once, so the required time complexity is attained.

def k_forests(adj, k):
    # pathindex[node] == 0: means node has not been visited
    # pathindex[node] == -1: means node has been visited and all neighbors processed
    # pathindex[node] > 0: means node is at this step in the current DFS path
    pathindex = [0] * len(adj) # none of the nodes has been visited

    def recur(node, depth):
        nonlocal k  # we will decrease this count

        if pathindex[node] > 0: # cycle back
            return pathindex[node]
        if pathindex[node] == -1: # already processed
            return depth
        pathindex[node] = depth # being visited
        cycle = depth + 1 # infinity
        for neighbor in adj[node]:
            cycle = min(cycle, recur(neighbor, depth + 1))
            if k == 0: # success
                return -1 # backtrack completely...
        if cycle >= depth: # no cylce detected or back out of cycle
            k -= 1
            if k == 0:
                return -1 # success
        pathindex[node] = -1 # completely visited and processed
        return cycle

    # main loop over the nodes
    for node in range(len(adj)):
        recur(node, 1)
        if k == 0:
            return "YES"
    return "NO"

This function should be called with an adjacency list for every node, where nodes are identified by a sequential number, starting from 0. For example, the graph in the question can be represented as follows, where s=0, t=1, u=2, v=3, w=4, x=5, y=6, and z=7:

adj = [
    [4, 7],
    [2, 3],
    [3, 1],
    [0, 4],
    [5],
    [7],
    [5],
    [6, 4]
]

print(k_forests(adj, 4)) #  YES
print(k_forests(adj, 5)) #  NO

given a number k and a graph is there a DFS run that will give forest larger then k

Question

2 answers

solution1
1 2020-06-28 06:21:55

solution2
1 ACCPTED 2020-06-28 10:30:25

given a number k and a graph is there a DFS run that will give forest larger then k

Question

2 answers

solution1 1 2020-06-28 06:21:55

solution2 1 ACCPTED 2020-06-28 10:30:25

solution1
1 2020-06-28 06:21:55

solution2
1 ACCPTED 2020-06-28 10:30:25