I'm trying to find the longest path in a directed acyclic graph (DAG). At the moment, my code seems to have a time complexity of O(n^3).
The input graph is {0: [1, 2], 1: [2, 3], 3: [4, 5]}
# Input: dictionary: graph, int: start, list: path
# Output: list: the longest path in the graph (recursive)
# This is a modification of a depth first search
def find_longest_path(graph, start, path=[]):
    path = path + [start]
    paths = path
    for node in graph[start]:
        if node not in path:
            newpaths = find_longest_path(graph, node, path)
            # Only take the new path if its length is greater than the current path
            if(len(newpaths) > len(paths)):
                paths = newpaths
    return paths
It returns a list of nodes, e.g. [0, 1, 3, 5].
How can I make this more efficient than O(n^3)? Is recursion the right way to solve this, or should I be using a different loop?
You can solve this problem in O(n+e) (i.e. linear in the number of nodes plus edges).
The idea is that you first create a topological sort (I'm a fan of Tarjan's algorithm) and the set of reverse edges. It always helps if you can decompose your problem to leverage existing solutions.
You then walk the topological sort backwards pushing to each parent node its child's distance + 1 (keeping maximums in case there are multiple paths). Keep track of the node with the largest distance seen so far.
When you have finished annotating all of the nodes with distances you can just start at the node with the largest distance which will be your longest path root, and then walk down your graph choosing the children that are exactly one count less than the current node (since they lie on the critical path).
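The three steps above (topological sort, backwards distance pass, critical-path walk) can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses Kahn's algorithm rather than Tarjan's for the sort, it has each node pull the maximum from its children instead of pushing to parents (so no reverse-edge set is needed), and the name `longest_path_in_dag` is made up for this example.

```python
from collections import deque

def longest_path_in_dag(graph):
    """Return one longest path in a DAG given as {node: [children]}."""
    # Collect every node, including leaves that never appear as keys.
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)
    if not nodes:
        return []

    # Kahn's algorithm (an assumption; any topological sort would do).
    indegree = {node: 0 for node in nodes}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Walk the order backwards so children are scored before parents.
    # dist[node] = number of nodes on the longest path starting at node.
    dist = {}
    for node in reversed(order):
        dist[node] = 1 + max((dist[c] for c in graph.get(node, [])), default=0)

    # Start at the largest distance and follow children that are
    # exactly one less, since they lie on the critical path.
    node = max(dist, key=dist.get)
    path = [node]
    while dist[node] > 1:
        node = next(c for c in graph[node] if dist[c] == dist[node] - 1)
        path.append(node)
    return path
```

Each stage touches every node and edge a constant number of times, so the whole thing stays O(n+e).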
In general, when trying to find an algorithm with optimal complexity, don't be afraid to run multiple stages one after the other. Five O(n) algorithms run sequentially are still O(n), and that is still better than O(n^2) from a complexity perspective (although it may be worse in real running time depending on the constant costs/factors and the size of n).
ETA: I just noticed you have a start node. This makes it simply a case of doing a depth first search and keeping the longest solution seen so far which is just O(n+e) anyway. Recursion is fine or you can keep a list/stack of visited nodes (you have to be careful when finding the next child each time you backtrack).
As you backtrack from your depth first search you need to store the longest path from that node to a leaf so that you don't re-process any sub-trees. This will also serve as a visited flag (i.e. in addition to doing the `node not in path` test, also have a `node not in subpath_cache` test before recursing). Instead of storing the subpath you could store the length and then rebuild the path once you're finished, based on sequential values as discussed above (critical path).
ETA2: Here's a solution.
def find_longest_path_rec(graph, parent, cache):
    maxlen = 0
    for node in graph[parent]:
        if node in cache:
            pass
        elif node not in graph:
            cache[node] = 1
        else:
            cache[node] = find_longest_path_rec(graph, node, cache)
        maxlen = max(maxlen, cache[node])
    return maxlen + 1

def find_longest_path(graph, start):
    cache = {}
    maxlen = find_longest_path_rec(graph, start, cache)
    path = [start]
    for i in range(maxlen-1, 0, -1):
        for node in graph[path[-1]]:
            if cache[node] == i:
                path.append(node)
                break
        else:
            assert(0)
    return path
Note that I've removed the `node not in path` test because I'm assuming that you're actually supplying a DAG as claimed. If you want that check, you should really be raising an error rather than ignoring it. Also note that I've added the assertion to the `else` clause of the `for` to document that we must always find a valid next (sequential) node in the path.
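If you do want the cycle check, one possible shape for it is sketched below. It is not part of the solution above: the function name `longest_path_length` and the `in_progress` set are assumptions made for this example. The idea is that a node which is still being explored when we reach it again must sit on a cycle, so we raise instead of silently skipping it.

```python
def longest_path_length(graph, node, cache, in_progress):
    """Return the number of nodes on the longest path starting at node.

    Raises ValueError if a cycle is reachable from node.
    """
    if node in cache:
        return cache[node]
    if node in in_progress:
        # We came back to a node whose exploration has not finished.
        raise ValueError(f"cycle detected at node {node!r}")
    in_progress.add(node)
    best = 0
    # graph.get tolerates leaves that are not keys in the dict.
    for child in graph.get(node, []):
        best = max(best, longest_path_length(graph, child, cache, in_progress))
    in_progress.remove(node)
    cache[node] = best + 1
    return cache[node]
```

Called as `longest_path_length(graph, 0, {}, set())`, the filled-in `cache` can then be used for the same sequential path rebuild as in `find_longest_path`.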
ETA3: The final `for` loop is a little confusing. What we're doing is relying on the fact that along the critical path all of the node distances must be sequential. Consider node 0 at distance 4, node 1 at distance 3 and node 2 at distance 1. If our path started `[0, 2, ...]` we would have a contradiction, because node 0 is not exactly 1 further from a leaf than node 2.
There are a couple of non-algorithmic improvements I'd suggest (these are related to Python code quality):
def find_longest_path_from(graph, start, path=None):
    """
    Returns the longest path in the graph from a given start node
    """
    if path is None:
        path = []
    path = path + [start]
    max_path = path
    nodes = graph.get(start, [])
    for node in nodes:
        if node not in path:
            candidate_path = find_longest_path_from(graph, node, path)
            if len(candidate_path) > len(max_path):
                max_path = candidate_path
    return max_path

def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
Changes explained:
def find_longest_path_from(graph, start, path=None):
    if path is None:
        path = []
I've renamed `find_longest_path` as `find_longest_path_from` to better explain what it does.
Changed the `path` argument to have a default argument value of `None` instead of `[]`. Unless you know you will specifically benefit from them, you want to avoid using mutable objects as default arguments in Python. This means you should typically set `path` to `None` by default and then, when the function is invoked, check whether `path is None` and create an empty list accordingly.
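To make the pitfall concrete, here is a small sketch with two made-up helper functions (`append_shared` and `append_fresh` are names invented for this illustration). The original function happens to be safe, since `path + [start]` builds a new list rather than mutating the default, but the habit is still worth having.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # A new list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared(1)
second = append_shared(2)   # same list object as `first`, now [1, 2]

third = append_fresh(1)     # [1]
fourth = append_fresh(2)    # [2], independent of `third`
```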
max_path = path
...
candidate_path = find_longest_path_from(graph, node, path)
...
I've updated the names of your variables from `paths` to `max_path` and `newpaths` to `candidate_path`. These were confusing variable names because they were plurals of "path", implying that the values they stored consisted of multiple paths, when in fact they each just held a single path. I tried to give them more descriptive names.
nodes = graph.get(start, [])
for node in nodes:
Your code errors out on your example input because the leaf nodes of the graph are not keys in the dict, so `graph[start]` would raise a `KeyError` when `start` is `2`, for instance. Using `graph.get(start, [])` handles the case where `start` is not a key in `graph` by falling back to an empty list of children.
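A quick way to see the difference on the example input is sketched below; `children_of` is a hypothetical helper introduced only for this illustration.

```python
graph = {0: [1, 2], 1: [2, 3], 3: [4, 5]}

def children_of(graph, node):
    # Leaves such as 2, 4 and 5 are not keys, so fall back to no children.
    return graph.get(node, [])

leaf_children = children_of(graph, 2)   # empty list, no exception

try:
    graph[2]                            # plain indexing fails for a leaf
    indexing_failed = False
except KeyError:
    indexing_failed = True
```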
def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
A function that finds the longest path in a graph by iterating over the keys and trying each one as a start node. This is entirely separate from your algorithmic analysis of `find_longest_path_from`, but I wanted to include it.