Python combinations of elements in dict

I have a bunch of dicts like the one below (some can be very large):

V = {
    0: [823, 832, 1151, 1752, 2548, 3036],
    823: [832, 1151, 1752, 2548, 3036, 3551],
    832: [1151, 1752, 2548, 3036, 3551],
    1151: [1752, 2548, 3036, 3551],
    1752: [2548, 3036, 3551, 4622],
    2548: [3036, 3551, 4622],
    3036: [3551, 4622, 5936, 6440],
    3551: [4622, 5936, 6440],
    4622: [5936, 6440, 9001],
    5936: [6440, 9001],
    6440: [9001],
    9001: []
}

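Each dict encodes the basic rules for deriving all possible paths (they are routes). A path is a sequence of the integers above.

Every value in the dict's value lists is also a key.

How can I determine all possible paths? For example:

[3036, 4622, 9001] is a valid path,

but [3036, 9001] is not, because 3036 must be followed by one of the elements of V[3036]. Every combination must form a compatible sequence, and every sequence must end in 9001; that is, to reach 9001 you must pass through 6440, 5936 or 4622.

Every sequence must also start from one of the points in V[0].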
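The rules above can be written down as a small checker. This is only a sketch of how I read the question: the function name `is_valid_path` and its `start`/`goal` defaults are made up for illustration, and a path is assumed to be listed without the leading 0, as in the examples.

```python
# V is copied from the question.
V = {
    0: [823, 832, 1151, 1752, 2548, 3036],
    823: [832, 1151, 1752, 2548, 3036, 3551],
    832: [1151, 1752, 2548, 3036, 3551],
    1151: [1752, 2548, 3036, 3551],
    1752: [2548, 3036, 3551, 4622],
    2548: [3036, 3551, 4622],
    3036: [3551, 4622, 5936, 6440],
    3551: [4622, 5936, 6440],
    4622: [5936, 6440, 9001],
    5936: [6440, 9001],
    6440: [9001],
    9001: [],
}


def is_valid_path(graph, path, start=0, goal=9001):
    """Return True if `path` obeys the three rules stated in the question."""
    # Rule 1: the first element must be one of graph[start].
    # Rule 2: the last element must be the goal.
    if not path or path[0] not in graph[start] or path[-1] != goal:
        return False
    # Rule 3: every element must be followed by one of its listed successors.
    return all(b in graph[a] for a, b in zip(path, path[1:]))


print(is_valid_path(V, [3036, 4622, 9001]))  # True
print(is_valid_path(V, [3036, 9001]))        # False
```

A checker like this is also handy for spot-checking the output of whichever enumeration method ends up being used.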

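I have tried two things:

  1. Deriving all candidate sequences with itertools.product and then filtering out the invalid ones, but for most dicts the number of itertools.product combinations is far too large.
  2. Monte Carlo simulation, but it takes millions of iterations and there is no guarantee of capturing every path.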

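Looks like a simple DFS. Since the graph appears to be directed and acyclic (every node's successors have larger numbers than the node itself), you don't even need to take care to avoid cycles.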

>>> def dfs(graph, start, end):
...     if start == end:
...         return [[end]]
...     return [[start] + result for s in graph[start] for result in dfs(graph, s, end)]
...
>>> dfs(V, 0, 9001)
[[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 4622, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 4622, 9001], 
[0, 823, 832, 1151, 1752, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3036, 5936, 9001], [0, 823, 832, 1151, 1752, 3036, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 4622, 9001], [0, 823, 832, 1151, 1752, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 3551, 5936, 9001], [0, 823, 832, 1151, 1752, 3551, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 3551, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 4622, 9001], [0, 823, 832, 1151, 2548, 3036, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3036, 5936, 9001], [0, 823, 832, 1151, 2548, 3036, 6440, 9001], [0, 823, 832, 1151, 2548, 3551, 4622, 5936, 6440, 9001], [0, 823, 832, 1151, 2548, 3551, 4622, 5936, 9001], ...]

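If the function above spins forever (or raises RecursionError) on one of your dicts, it's time to revisit the assumption that the graph is acyclic.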
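The no-cycles assumption can also be checked up front rather than discovered the hard way. Here is a minimal sketch that leans on the standard library's graphlib module (Python 3.9+); the helper name `is_acyclic` and the two toy graphs are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(graph):
    """Return True if the adjacency-list dict `graph` has no directed cycle."""
    try:
        # TopologicalSorter reads the dict as node -> predecessors. That
        # reverses every edge, but reversing edges neither creates nor
        # removes cycles, so the check is still valid.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


print(is_acyclic({0: [1, 2], 1: [2], 2: []}))  # True
print(is_acyclic({0: [1], 1: [2], 2: [0]}))    # False
```

For dicts shaped like the one in the question, an even cheaper sanity check would be to confirm that every successor is larger than its key, which rules out cycles by construction.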

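You can treat the dict as an adjacency list. You can use vanilla Python (as in Samwise's answer), but their answer won't work if the graph has cycles.

networkx exposes a function for finding the paths you want, so we can use that. The function returns a generator, which means it doesn't load all the paths into memory at once (you can do that with list() if you want, but you may run out of memory if the graph is large):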

import networkx as nx

graph = nx.DiGraph(V)
for path in nx.all_simple_paths(graph, 0, 9001):
    print(path)

The first three and last three lines of the output:

[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001]
[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001]
[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001]
... [755 more lines]
[0, 3036, 5936, 6440, 9001]
[0, 3036, 5936, 9001]
[0, 3036, 6440, 9001]
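If you only want to know how many paths to expect before enumerating them (3 + 755 + 3 = 761 lines here), the count can be computed without building any path. A minimal sketch, assuming the graph is acyclic; `count_paths` is a made-up name, and the recursion depth grows with the longest path, so very deep graphs may need an iterative version.

```python
from functools import lru_cache

# V is copied from the question.
V = {
    0: [823, 832, 1151, 1752, 2548, 3036],
    823: [832, 1151, 1752, 2548, 3036, 3551],
    832: [1151, 1752, 2548, 3036, 3551],
    1151: [1752, 2548, 3036, 3551],
    1752: [2548, 3036, 3551, 4622],
    2548: [3036, 3551, 4622],
    3036: [3551, 4622, 5936, 6440],
    3551: [4622, 5936, 6440],
    4622: [5936, 6440, 9001],
    5936: [6440, 9001],
    6440: [9001],
    9001: [],
}


def count_paths(graph, start, goal):
    """Count start -> goal paths in an acyclic graph without listing them."""

    @lru_cache(maxsize=None)
    def count(node):
        if node == goal:
            return 1
        # Paths from a node = sum of the paths from each of its successors.
        return sum(count(s) for s in graph[node])

    return count(start)


print(count_paths(V, 0, 9001))  # 761
```

Because each node's count is cached, this costs roughly one pass over the edges, so it stays cheap even when the number of paths runs into the millions.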

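Non-recursive depth-first search generator function

  • This solution is faster than the other two solutions (i.e. networkx and the recursive dfs)
  • Updated per KellyBundy's observation in the comments, which made the code slightly faster.

Code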

def dfs_stack(graph, start, goal):
    '''
        Depth First Search for all paths from start to goal
    '''
    # Init stack with a path holding just the starting vertex
    stack = [[start]]

    while stack:
        # Expand the path at the top of the stack
        path = stack.pop()

        if path[-1] == goal:
            yield path                # reached goal
        else:
            # Push one extended path per successor of the last vertex
            # (loop variable renamed so it no longer shadows `start`)
            for successor in graph[path[-1]]:
                stack.append(path + [successor])

Usage

# Use list on generator to obtain all paths
paths = list(dfs_stack(V, 0, 9001))

print(paths[:3])    # First 3 paths
# Output: [[0, 3036, 6440, 9001], [0, 3036, 5936, 9001], [0, 3036, 5936, 6440, 9001]]

print(paths[-3:])    # Last 3 paths
# Output: [[0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 6440, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 9001], [0, 823, 832, 1151, 1752, 2548, 3036, 3551, 4622, 5936, 6440, 9001]]

Timing comparison

On the OP's data, the current approach is more than twice as fast as the other two posted solutions.

Current Approach
    %timeit list(dfs_stack(V, 0, 9001))
    Result: 874 µs ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

DFS function from Samwise solution
    %timeit dfs(V, 0, 9001)
    Result: 2.1 ms ± 91.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Networkx solution from BrokenBenchmark solution
    %%timeit 
    graph = nx.DiGraph(V)
    list(nx.all_simple_paths(graph, 0, 9001))
    Result: 4.83 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Timing from OP (see comments): this solution produces 12 Million paths in less than 20s, 
                               networkx takes in excess of 48s

Modification to avoid cycles

Although the current graph has no cycles, a simple modification makes the search safe on graphs that do.

def dfs_stack_no_cycles(graph, start, goal):
    '''
        Depth First Search for all paths from start to goal,
        never visiting a vertex twice within one path
    '''
    # Store successors as sets so visited vertices can be subtracted
    graph = {k: set(v) for k, v in graph.items()}
    # Init stack with a path holding just the starting vertex
    stack = [[start]]

    while stack:
        # Expand the path at the top of the stack
        path = stack.pop()

        if path[-1] == goal:
            yield path                # reached goal
        else:
            # Push one extended path per successor not already on the path
            for successor in graph[path[-1]] - set(path):
                stack.append(path + [successor])
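To see the cycle-avoidance idea at work, here is a small sketch on a made-up graph that does contain a cycle. Note that this is a different variant from the function above: a recursive backtracking generator that tracks the vertices on the current path in a set, so each step costs one membership test instead of rebuilding set(path). The names `simple_paths` and `G` are hypothetical.

```python
def simple_paths(graph, start, goal):
    """Yield every start -> goal path that never revisits a vertex."""
    path, on_path = [start], {start}

    def walk(node):
        if node == goal:
            yield list(path)        # copy, because `path` keeps changing
            return
        for nxt in graph[node]:
            if nxt in on_path:      # stepping here would close a cycle
                continue
            path.append(nxt)
            on_path.add(nxt)
            yield from walk(nxt)
            path.pop()              # backtrack
            on_path.remove(nxt)

    yield from walk(start)


# Toy graph with a cycle a -> b -> a (not from the question).
G = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
print(list(simple_paths(G, "a", "c")))  # [['a', 'b', 'c'], ['a', 'c']]
```

Being recursive, this variant is limited by Python's recursion depth on very long paths, which is one reason the stack-based version above may be preferable for large graphs.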

