Pathfinding efficiency in Python

Question

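I've written some code that finds all the paths upstream of a given reach in a dendritic stream network. For example, if I represent the following network: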

     4 -- 5 -- 8
    / 
   2 --- 6 - 9 -- 10
  /           \ 
 1              -- 11
  \
   3 ----7

as a set of parent-child pairs, each written as (child, parent):

{(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}

it will return all of the paths upstream of a node, for example:

get_paths(h, 1)  # edited, had 11 instead of 1 in before
[[Reach(2), Reach(6), Reach(9), Reach(11)], [Reach(2), Reach(6), Reach(9), Reach(10)], [Reach(2), Reach(4), Reach(5), Reach(8)], [Reach(3), Reach(7)]]

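The code is included below.

My question is: I'm applying this to every reach in a very large region (e.g., New England), where any given reach may have millions of paths. There's probably no way to avoid this being a very long operation, but is there a pythonic way to perform it so that brand-new paths aren't generated on every run?

For example, if I run get_paths(h, 2) and all paths upstream of 2 are found, can I later run get_paths(h, 1) without retracing all of the paths from 2?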

import collections

# Object representing a stream reach.  Used to construct a hierarchy for accumulation function
class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()

    def __repr__(self):
        return "Reach({})".format(self.name)


def build_hierarchy(flows):
    hierarchy = collections.defaultdict(lambda: Reach())
    for reach_id, parent in flows:
        if reach_id:
            hierarchy[reach_id].name = reach_id
            hierarchy[parent].name = parent
            hierarchy[reach_id].ds = hierarchy[parent]
            hierarchy[parent].us.add(hierarchy[reach_id])
    return hierarchy

def get_paths(h, start_node):
    def go_up(n):
        if not h[n].us:
            paths.append(current_path[:])
        for us in h[n].us:
            current_path.append(us)
            go_up(us.name)
        if current_path:
            current_path.pop()
    paths = []
    current_path = []
    go_up(start_node)
    return paths

test_tree = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
h = build_hierarchy(test_tree)
p = get_paths(h, 1)

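Edit: A few weeks ago I asked a similar question about finding "ALL" upstream reaches in a network and received an excellent answer that was very fast: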

class Node(object):

    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = set()

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        if self._upstream:
            return self._upstream
        else:
            for child in self.children:
                self._upstream.add(child)
                self._upstream |= child.upstream
            return self._upstream

import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = collections.defaultdict(lambda: Node())

for node, parent in edges:
    nodes[node].name = node
    nodes[parent].name = parent
    nodes[node].parent = nodes[parent]
    nodes[parent].children.add(nodes[node])

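I noticed that the def upstream(): part of this code adds upstream nodes in sequential order, but because it's a recursive function that accumulates into a set, I can't find a good way to append them to a single list. Perhaps there is a way to modify this code so that it preserves the order.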
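One possible way to keep the order, shown here only as a minimal sketch: store children and the cached result in lists rather than sets. The helper name build_nodes is hypothetical, and the edges are sorted purely so the output is reproducible. The recursion is as deep as the longest upstream chain, so a very long network may need a higher recursion limit or an iterative rewrite.

```python
import collections


class Node(object):
    def __init__(self):
        self.name = None
        self.parent = None
        self.children = []      # list instead of set: keeps insertion order
        self._upstream = None   # None means "not computed yet"

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        # Depth-first, pre-order: each child is followed by its own upstream nodes.
        if self._upstream is None:
            ordered = []
            for child in self.children:
                ordered.append(child)
                ordered.extend(child.upstream)
            self._upstream = ordered
        return self._upstream


def build_nodes(edges):
    # Hypothetical helper: turn (child, parent) pairs into linked Node objects.
    nodes = collections.defaultdict(Node)
    for name, parent in sorted(edges):  # sorted only for a reproducible order
        nodes[name].name = name
        nodes[parent].name = parent
        nodes[name].parent = nodes[parent]
        nodes[parent].children.append(nodes[name])
    return nodes


edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = build_nodes(edges)
print([n.name for n in nodes[1].upstream])
```

Each node's ordered list is computed once and reused by every node downstream of it, at the cost of storing one list per node.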

Answer 1

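Yes, you can do this. I'm not sure what your constraints are; however, this should get you on the right track. The worst-case run time is O(|E| + |V|); the only difference is that p.dfsh caches previously evaluated paths, whereas p.dfs does not.

This adds extra space overhead, so be aware of the tradeoff: you'll save many iterations (depending on your data set) at the cost of more memory. Unfortunately, caching doesn't improve the order of growth, only the practical run time: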

points = set([
    (11, 9),
    (10, 9), 
    (9, 6), 
    (6, 2), 
    (8, 5), 
    (5, 4), 
    (4, 2), 
    (2, 1), 
    (3, 1),
    (7, 3),
])

class PathFinder(object):

    def __init__(self, points):
        self.graph  = self._make_graph(points)
        self.hierarchy = {}

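    def _make_graph(self, points):
        # Map each parent (downstream node) to the set of its children
        # (upstream nodes). Each point is a (child, parent) pair, so the
        # direction is taken from the tuple order rather than inferred
        # from which id happens to be numerically smaller.
        graph = {}
        for child, parent in points:
            if parent not in graph:
                graph[parent] = set([child])
            else:
                graph[parent].add(child)

        return graph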

    def dfs(self, start):
        visited = set()
        stack = [start]

        _count = 0
        while stack:
            _count += 1
            vertex = stack.pop()
            if vertex not in visited:
                visited.add(vertex)
                if vertex in self.graph:
                    stack.extend(v for v in self.graph[vertex])

        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

    def dfsh(self, start):
        visited = set()
        stack = [start]

        _count = 0
        while stack:
            _count += 1

            vertex = stack.pop()
            if vertex not in visited:
                if vertex in self.hierarchy:
                    visited.update(self.hierarchy[vertex])
                else:
                    visited.add(vertex)
                    if vertex in self.graph:
                        stack.extend([v for v in self.graph[vertex]])
        self.hierarchy[start] = visited

        print "Start: {s} | Count: {c} |".format(c=_count, s=start),
        return visited

p = PathFinder(points)
print p.dfsh(1)
print p.dfsh(2)
print p.dfsh(9)
print p.dfsh(6)
print p.dfsh(2)
print 
print p.dfs(1)
print p.dfs(2)
print p.dfs(9)
print p.dfs(6)
print p.dfs(2)

The output for p.dfsh is the following:

Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 2 | set([9, 10, 11, 6])
Start: 2 | Count: 1 | set([2, 4, 5, 6, 8, 9, 10, 11])

The output for the regular p.dfs is:

Start: 1 | Count: 11 | set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])
Start: 9 | Count: 3 | set([9, 10, 11])
Start: 6 | Count: 4 | set([9, 10, 11, 6])
Start: 2 | Count: 8 | set([2, 4, 5, 6, 8, 9, 10, 11])

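As you can see, I do a DFS, but I keep track of previous iterations, within reason. I don't want to keep track of all possible previous paths, because on a large data set that would take up a ridiculous amount of memory.

In the output, you can see the iteration count for the second p.dfsh(2) call go from 8 to 1. Likewise, the count for p.dfsh(6) drops to 2 (versus 4 for p.dfs(6)) because of the earlier computation of p.dfsh(9). This is a modest run-time improvement over the standard DFS, especially on significantly large data sets.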
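If memory allows, the same caching idea can be pushed further by remembering the upstream set of every node visited, not only the start node. The following is a minimal Python 3 sketch of that variant, not the code above: build_children and make_upstream are hypothetical names, and the recursion assumes the network is a tree shallow enough for Python's recursion limit.

```python
def build_children(pairs):
    # Map each parent to the set of its direct upstream children.
    children = {}
    for child, parent in pairs:
        children.setdefault(parent, set()).add(child)
    return children


def make_upstream(children):
    cache = {}  # node -> frozenset of the node and everything upstream of it

    def upstream(node):
        if node not in cache:
            result = {node}
            for child in children.get(node, ()):
                result |= upstream(child)  # reuses cached subtrees
            cache[node] = frozenset(result)
        return cache[node]

    return upstream, cache


points = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
upstream, cache = make_upstream(build_children(points))

print(sorted(upstream(1)))  # fills the cache for every node in the network
print(len(cache))           # one cached set per node
print(sorted(upstream(6)))  # answered from the cache, no traversal
```

After the first call from the outlet, every later query is a dictionary lookup. The price is one stored set per node, which is exactly the memory tradeoff described above.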

Answer 2

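Sure. Assuming you have enough memory to store all the paths from each node, you can use a straightforward modification of the code you received in that answer: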

class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()
        self._paths = []

    def __repr__(self):
        return "Reach({})".format(self.name)

    @property
    def paths(self):
        if not self._paths:
            for child in self.us:
                if child.paths:
                    self._paths.extend([child] + path for path in child.paths)
                else:
                    self._paths.append([child])
        return self._paths
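
As a usage sketch, the class can be combined with build_hierarchy from the question. The block repeats both so that it runs on its own. The names helper is hypothetical and exists only to print reach ids in a stable order, since the sets give no guaranteed ordering.

```python
import collections


class Reach(object):
    def __init__(self):
        self.name = None
        self.ds = None
        self.us = set()
        self._paths = []

    def __repr__(self):
        return "Reach({})".format(self.name)

    @property
    def paths(self):
        # Build the paths once from the children's cached paths, then reuse them.
        if not self._paths:
            for child in self.us:
                if child.paths:
                    self._paths.extend([child] + path for path in child.paths)
                else:
                    self._paths.append([child])
        return self._paths


def build_hierarchy(flows):
    hierarchy = collections.defaultdict(Reach)
    for reach_id, parent in flows:
        hierarchy[reach_id].name = reach_id
        hierarchy[parent].name = parent
        hierarchy[reach_id].ds = hierarchy[parent]
        hierarchy[parent].us.add(hierarchy[reach_id])
    return hierarchy


def names(paths):
    # Hypothetical helper: reach ids per path, sorted for stable output.
    return sorted([reach.name for reach in path] for path in paths)


test_tree = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
h = build_hierarchy(test_tree)

print(names(h[1].paths))   # computes and caches paths for every upstream reach
print(names(h[2]._paths))  # already filled in by the call above
```

Reading h[1].paths fills _paths on every reach upstream of 1, so a later h[2].paths returns the stored list without retracing anything.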

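Note that for some 20,000 reaches, the memory required by this approach will be on the order of gigabytes. Assuming a generally balanced tree of reaches, the required memory is O(n^2), where n is the total number of reaches. That would be 4-8 GiB for 20,000 reaches, depending on your system. The required time is O(1) for any node once the paths for h[1] have been computed.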
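
If that much memory is not available, one alternative is to not store the paths at all and yield them one at a time. This is a minimal sketch of a different technique (a lazy, iterative depth-first traversal), not the cached approach above. iter_paths and the children mapping are hypothetical names, and each call retraces the network, so it trades time for memory.

```python
def iter_paths(children, start):
    """Yield each path from just above `start` up to a headwater, one at a time."""
    # Each stack entry is (current node, path of node ids leading to it).
    stack = [(child, [child]) for child in sorted(children.get(start, ()))]
    while stack:
        node, path = stack.pop()
        ups = children.get(node)
        if not ups:
            yield path  # headwater reached: the path is complete
        else:
            for child in sorted(ups):
                stack.append((child, path + [child]))


pairs = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}

# Map each parent to the set of its direct upstream children.
children = {}
for child, parent in pairs:
    children.setdefault(parent, set()).add(child)

for path in iter_paths(children, 1):
    print(path)
```

Only the partial paths still waiting on the stack are held in memory, so the caller can process or write out each path as it arrives.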
