How can I make a recursive search for longest node more efficient?
I'm trying to find the longest path in a directed acyclic graph. At the moment, my code seems to have a time complexity of O(n^3).
The input graph has the form {0: [1,2], 1: [2,3], 3: [4,5]}
# Input: dictionary: graph, int: start, list: path
# Output: list: the longest path in the graph (recursive)
# This is a modification of a depth first search
def find_longest_path(graph, start, path=[]):
    path = path + [start]
    paths = path
    for node in graph[start]:
        if node not in path:
            newpaths = find_longest_path(graph, node, path)
            # Only take the new path if its length is greater than the current path
            if len(newpaths) > len(paths):
                paths = newpaths
    return paths
It returns a list of nodes, e.g. [0,1,3,5]
How can I make this more efficient than O(n^3)? Is recursion the right way to solve this, or should I be using a different loop?
You can solve this problem in O(n+e) (i.e. linear in the number of nodes + edges).
The idea is that you first create a topological sort (I'm a fan of Tarjan's algorithm) and the set of reverse edges. It always helps if you can decompose your problem to leverage existing solutions.
You then walk the topological sort backwards, pushing to each parent node its child's distance + 1 (keeping the maximum in case there are multiple paths). Keep track of the node with the largest distance seen so far.
When you have finished annotating all of the nodes with distances, you can start at the node with the largest distance, which will be the root of your longest path, and then walk down the graph, at each step choosing a child whose distance is exactly one less than the current node's (since such children lie on the critical path).
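The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the name `longest_path_in_dag` is made up for this example, the graph uses the `{node: [children]}` dict form from the question, and "distance" is counted as the number of nodes on the longest path starting at a node. Instead of pushing distances to parents through reverse edges, the sketch pulls each node's distance from its children while walking a DFS post-order (which is a reverse topological order); the resulting distances are the same.

```python
def longest_path_in_dag(graph):
    """Longest path anywhere in a DAG given as {node: [children]}."""
    # Collect every node, including leaves that never appear as keys.
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)

    # DFS-based topological sort: a node is appended only after all of its
    # descendants, so `postorder` is a reverse topological order.
    postorder = []
    visited = set()

    def visit(node):
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                visit(child)
        postorder.append(node)

    for node in nodes:
        if node not in visited:
            visit(node)

    if not postorder:
        return []

    # dist[node] = number of nodes on the longest path starting at node.
    # Children always come earlier in postorder, so their values are ready.
    dist = {}
    for node in postorder:
        dist[node] = 1 + max((dist[c] for c in graph.get(node, [])), default=0)

    # Start at the node with the largest distance, then keep following a
    # child whose distance is exactly one less (the critical path).
    current = max(postorder, key=dist.get)
    path = [current]
    while dist[current] > 1:
        current = next(c for c in graph[current] if dist[c] == dist[current] - 1)
        path.append(current)
    return path


print(longest_path_in_dag({0: [1, 2], 1: [2, 3], 3: [4, 5]}))  # [0, 1, 3, 4]
```

The recursive `visit` helper can hit Python's recursion limit on very deep graphs; an explicit stack would avoid that at the cost of slightly more bookkeeping.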
In general, when trying to find an algorithm with optimal complexity, don't be afraid to run multiple stages one after the other. Five O(n) algorithms run sequentially are still O(n), which is still better than O(n^2) from a complexity perspective (although the real running time may be worse, depending on the constant costs/factors and the size of n).
ETA: I just noticed you have a start node. This makes it simply a case of doing a depth first search and keeping the longest solution seen so far, which is just O(n+e) anyway. Recursion is fine, or you can keep a list/stack of visited nodes (you have to be careful when finding the next child each time you backtrack).
As you backtrack from your depth first search you need to store the longest path from that node to a leaf so that you don't re-process any sub-trees. This will also serve as a visited flag (i.e. in addition to doing the `node not in path` test, also have a `node not in subpath_cache` test before recursing). Instead of storing the subpath you could store the length and then rebuild the path once you're finished, based on sequential values as discussed above (critical path).
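For the non-recursive option mentioned above, one possible shape is an explicit stack of (node, child iterator) pairs, with a per-node cache of lengths doubling as the visited flag. This is a hedged sketch rather than the only way to do it: the function name `longest_path_length_iterative` is invented for the example, the input is assumed to be a DAG, and it returns only the length (number of nodes), leaving path reconstruction to a separate pass.

```python
def longest_path_length_iterative(graph, start):
    """Number of nodes on the longest path from start, without recursion.

    Assumes graph is a DAG in {node: [children]} form.
    """
    length = {}  # finished nodes -> longest path length (also the visited flag)
    stack = [(start, iter(graph.get(start, [])))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in length:
                # Descend into the first unfinished child; the iterator
                # remembers where to resume when we backtrack to `node`.
                stack.append((child, iter(graph.get(child, []))))
                break
        else:
            # Every child is finished, so this node's length is final.
            length[node] = 1 + max(
                (length[c] for c in graph.get(node, [])), default=0
            )
            stack.pop()
    return length[start]


print(longest_path_length_iterative({0: [1, 2], 1: [2, 3], 3: [4, 5]}, 0))  # 4
```

Keeping the iterator on the stack is what handles the "finding the next child each time you backtrack" detail: each edge is examined once, so the work stays proportional to nodes plus edges.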
ETA2: Here's a solution.
def find_longest_path_rec(graph, parent, cache):
    # Returns the number of nodes on the longest path starting at parent.
    # cache maps each processed child to its own longest-path length.
    maxlen = 0
    for node in graph[parent]:
        if node in cache:
            pass
        elif node not in graph:
            # Leaf nodes are not keys in the graph dict
            cache[node] = 1
        else:
            cache[node] = find_longest_path_rec(graph, node, cache)
        maxlen = max(maxlen, cache[node])
    return maxlen + 1

def find_longest_path(graph, start):
    cache = {}
    maxlen = find_longest_path_rec(graph, start, cache)
    path = [start]
    # Rebuild the path by following children with lengths maxlen-1, ..., 1
    for i in range(maxlen-1, 0, -1):
        for node in graph[path[-1]]:
            if cache[node] == i:
                path.append(node)
                break
        else:
            assert(0)
    return path
Note that I've removed the `node not in path` test because I'm assuming that you're actually supplying a DAG as claimed. If you want that check you should really be raising an error rather than ignoring it. Also note that I've added the assertion to the `else` clause of the `for` to document that we must always find a valid next (sequential) node in the path.
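If the cycle check is wanted, one way to raise an error instead of silently skipping the node is to track which nodes are still being processed. The sketch below is an assumption-laden variant, not the code from this answer: `longest_len_checked` and its `in_progress` parameter are hypothetical names, and it returns only the length.

```python
def longest_len_checked(graph, node, cache, in_progress):
    """Number of nodes on the longest path from node.

    Raises ValueError if a cycle is reachable, i.e. the input is not a DAG.
    """
    if node in cache:
        return cache[node]
    if node in in_progress:
        # We came back to a node whose children are still being explored.
        raise ValueError(f"cycle detected at node {node!r}")
    in_progress.add(node)
    best = 0
    for child in graph.get(node, []):
        best = max(best, longest_len_checked(graph, child, cache, in_progress))
    in_progress.discard(node)
    cache[node] = best + 1
    return cache[node]


print(longest_len_checked({0: [1, 2], 1: [2, 3], 3: [4, 5]}, 0, {}, set()))  # 4

try:
    longest_len_checked({0: [1], 1: [0]}, 0, {}, set())
except ValueError as exc:
    print(exc)  # cycle detected at node 0
```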
ETA3: The final `for` loop is a little confusing. What we're doing is using the fact that along the critical path the node distances must be sequential. Suppose node 0 is distance 4, node 1 is distance 3 and node 2 is distance 1. If our path started [0, 2, ...] we would have a contradiction, because node 0 is not exactly 1 further from a leaf than node 2.
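To make that example concrete, here is a tiny illustration using the distances from the paragraph above. The edge list (node 0 pointing at nodes 1 and 2) is an assumption added for the example; only the distances come from the text.

```python
distance = {0: 4, 1: 3, 2: 1}   # distances from the example above
children = {0: [1, 2]}          # assumed edges: 0 -> 1 and 0 -> 2

# Only a child whose distance is exactly one less can continue the path.
valid_next = [c for c in children[0] if distance[c] == distance[0] - 1]
print(valid_next)  # [1]
```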
There are a couple of non-algorithmic improvements I'd suggest (these are related to Python code quality):
def find_longest_path_from(graph, start, path=None):
    """
    Returns the longest path in the graph from a given start node
    """
    if path is None:
        path = []
    path = path + [start]
    max_path = path
    nodes = graph.get(start, [])
    for node in nodes:
        if node not in path:
            candidate_path = find_longest_path_from(graph, node, path)
            if len(candidate_path) > len(max_path):
                max_path = candidate_path
    return max_path

def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
Changes explained:
def find_longest_path_from(graph, start, path=None):
    if path is None:
        path = []
I've renamed `find_longest_path` as `find_longest_path_from` to better explain what it does.
Changed the `path` argument to have a default value of `None` instead of `[]`. Unless you know you will specifically benefit from them, you want to avoid using mutable objects as default arguments in Python. This means you should typically set `path` to `None` by default and then, when the function is invoked, check whether `path is None` and create an empty list accordingly.
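As a generic illustration of why mutable defaults are risky (the helper names `append_bad` and `append_good` are made up for this example and are not part of the code above): the default list is created once, when the function is defined, and is then shared by every call that relies on it.

```python
def append_bad(item, bucket=[]):
    # The same list object is reused on every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # A fresh list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  (state leaked from the first call)
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```

The original function happens to avoid this trap because `path = path + [start]` builds a new list rather than mutating the default, but the `None` idiom removes the need to reason about that.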
max_path = path
...
candidate_path = find_longest_path_from(graph, node, path)
...
I've updated the names of your variables from `paths` to `max_path` and from `newpaths` to `candidate_path`. The old names were confusing because they were plurals, implying that each variable held multiple paths, when in fact each held a single path. I tried to give them more descriptive names.
nodes = graph.get(start, [])
for node in nodes:
Your code errors out on your example input because the leaf nodes of the graph are not keys in the `dict`, so `graph[start]` would raise a `KeyError` when `start` is `2`, for instance. Using `graph.get(start, [])` handles the case where `start` is not a key in `graph` by returning an empty list.
def find_longest_path(graph):
    """
    Returns the longest path in a graph
    """
    max_path = []
    for node in graph:
        candidate_path = find_longest_path_from(graph, node)
        if len(candidate_path) > len(max_path):
            max_path = candidate_path
    return max_path
This adds a function that finds the longest path in the whole graph by trying each key of the dict as a start node. This is entirely separate from your algorithmic analysis of `find_longest_path_from`, but I wanted to include it.