繁体   English   中英

创建“最小连通”的有向无环图

[英]Create "minimally connected" directed acyclic graph

我在 NetworkX 中有一个有向无环简单图。

现在,对于每条边,该边都有一个“源”和一个“目标”。 如果除了这条边之外还有从“源”到“目标”的路径,那么我想删除这条边。

  1. NetworkX 是否有内置的 function 来执行此操作?

我真的不想重新发明轮子。

  1. [可选] 只有在 1 的答案为“否”的情况下,那么实现此目的的最有效算法是什么(对于相当密集的图)?

下面是一个需要清理的 DAG 示例:

  • 节点是:

     ['termsequence', 'maximumdegree', 'emptymultigraph', 'minimum', 'multiset', 'walk', 'nonemptymultigraph', 'euleriantrail', 'nonnullmultigraph', 'cycle', 'loop', 'abwalk', 'endvertices', 'simplegraph', 'vertex', 'multipletrails', 'edge', 'set', 'stroll', 'union', 'trailcondition', 'nullmultigraph', 'trivialmultigraph', 'sequence', 'multiplepaths', 'path', 'degreevertex', 'onedgesonvertices', 'nontrivialmultigraph', 'adjacentedges', 'adjacentvertices', 'simpleedge', 'maximum', 'multipleloops', 'length', 'circuit', 'class', 'euleriangraph', 'incident', 'minimumdegree', 'orderedpair', 'unique', 'closedwalk', 'multipleedges', 'pathcondition', 'multigraph', 'trail']
  • 边缘是:

     [('termsequence', 'endvertices'), ('emptymultigraph', 'nonemptymultigraph'), ('minimum', 'minimumdegree'), ('multiset', 'trailcondition'), ('multiset', 'pathcondition'), ('multiset', 'multigraph'), ('walk', 'length'), ('walk', 'closedwalk'), ('walk', 'abwalk'), ('walk', 'trail'), ('walk', 'endvertices'), ('euleriantrail', 'euleriangraph'), ('loop', 'simplegraph'), ('loop', 'degreevertex'), ('loop', 'simpleedge'), ('loop', 'multipleloops'), ('endvertices', 'abwalk'), ('vertex', 'adjacentvertices'), ('vertex', 'onedgesonvertices'), ('vertex', 'walk'), ('vertex', 'adjacentedges'), ('vertex', 'multipleedges'), ('vertex', 'edge'), ('vertex', 'multipleloops'), ('vertex', 'degreevertex'), ('vertex', 'incident'), ('edge', 'adjacentvertices'), ('edge', 'onedgesonvertices'), ('edge', 'multipleedges'), ('edge', 'simpleedge'), ('edge', 'adjacentedges'), ('edge', 'loop'), ('edge', 'trailcondition'), ('edge', 'pathcondition'), ('edge', 'walk'), ('edge', 'incident'), ('set', 'onedgesonvertices'), ('set', 'edge'), ('union', 'multiplepaths'), ('union', 'multipletrails'), ('trailcondition', 'trail'), ('nullmultigraph', 'nonnullmultigraph'), ('sequence', 'walk'), ('sequence', 'endvertices'), ('path', 'cycle'), ('path', 'multiplepaths'), ('degreevertex', 'maximumdegree'), ('degreevertex', 'minimumdegree'), ('onedgesonvertices', 'multigraph'), ('maximum', 'maximumdegree'), ('circuit', 'euleriangraph'), ('class', 'multiplepaths'), ('class', 'multipletrails'), ('incident', 'adjacentedges'), ('incident', 'degreevertex'), ('incident', 'onedgesonvertices'), ('orderedpair', 'multigraph'), ('closedwalk', 'circuit'), ('closedwalk', 'cycle'), ('closedwalk', 'stroll'), ('pathcondition', 'path'), ('multigraph', 'euleriangraph'), ('multigraph', 'nullmultigraph'), ('multigraph', 'trivialmultigraph'), ('multigraph', 'nontrivialmultigraph'), ('multigraph', 'emptymultigraph'), ('multigraph', 'euleriantrail'), ('multigraph', 'simplegraph'), ('trail', 'path'), ('trail', 'circuit'), ('trail', 'multipletrails')]

我之前的回答专门讨论了是否有一种好方法来测试单个边缘是否冗余的直接问题。

看起来你真的想要一种有效去除所有冗余边缘的方法。 这意味着您想要一次完成所有操作。 这是一个不同的问题,但这是一个答案。 我不相信networkx有内置的东西,但是找到一个可行的算法并不困难。

这个想法是因为它是一个DAG,所以有一些节点没有外边缘。 从他们开始并处理它们。 处理完所有内容后,其父母的子集中没有未经处理的子女。 通过那些父母。 重复。 在每个阶段,未处理节点集是DAG,我们正在处理该DAG的“终端节点”。 保证完成(如果原始网络是有限的)。

在实现中,每当我们处理节点时,我们首先检查是否有任何子节点也是间接后代。 如果是,请删除边缘。 如果没有,请保留。 当处理所有子项时,我们通过将其所有后代添加到父项的间接后代集来更新其父项的信息。 如果处理了父项的所有子项,我们现在将它添加到列表中以供下一次迭代。

import networkx as nx
from collections import defaultdict

def remove_redundant_edges(G):
    processed_child_count = defaultdict(int)  #when all of a nodes children are processed, we'll add it to nodes_to_process
    descendants = defaultdict(set)            #all descendants of a node (including children)
    out_degree = {node:G.out_degree(node) for node in G.nodes_iter()}
    nodes_to_process = [node for node in G.nodes_iter() if out_degree[node]==0] #initially it's all nodes without children
    while nodes_to_process:
        next_nodes = []
        for node in nodes_to_process:
            '''when we enter this loop, the descendants of a node are known, except for direct children.'''
            for child in G.neighbors(node):
                if child in descendants[node]:  #if the child is already an indirect descendant, delete the edge
                    G.remove_edge(node,child)
                else:                                    #otherwise add it to the descendants
                    descendants[node].add(child)
            for predecessor in G.predecessors(node):             #update all parents' indirect descendants
                descendants[predecessor].update(descendants[node])  
                processed_child_count[predecessor]+=1            #we have processed one more child of this parent
                if processed_child_count[predecessor] == out_degree[predecessor]:  #if all children processed, add to list for next iteration.
                    next_nodes.append(predecessor)
        nodes_to_process=next_nodes

测试它:

G=nx.DiGraph()
G.add_nodes_from(['termsequence', 'maximumdegree', 'emptymultigraph', 'minimum', 'multiset', 'walk', 'nonemptymultigraph', 'euleriantrail', 'nonnullmultigraph', 'cycle', 'loop', 'abwalk', 'endvertices', 'simplegraph', 'vertex', 'multipletrails', 'edge', 'set', 'stroll', 'union', 'trailcondition', 'nullmultigraph', 'trivialmultigraph', 'sequence', 'multiplepaths', 'path', 'degreevertex', 'onedgesonvertices', 'nontrivialmultigraph', 'adjacentedges', 'adjacentvertices', 'simpleedge', 'maximum', 'multipleloops', 'length', 'circuit', 'class', 'euleriangraph', 'incident', 'minimumdegree', 'orderedpair', 'unique', 'closedwalk', 'multipleedges', 'pathcondition', 'multigraph', 'trail'])
G.add_edges_from([('termsequence', 'endvertices'), ('emptymultigraph', 'nonemptymultigraph'), ('minimum', 'minimumdegree'), ('multiset', 'trailcondition'), ('multiset', 'pathcondition'), ('multiset', 'multigraph'), ('walk', 'length'), ('walk', 'closedwalk'), ('walk', 'abwalk'), ('walk', 'trail'), ('walk', 'endvertices'), ('euleriantrail', 'euleriangraph'), ('loop', 'simplegraph'), ('loop', 'degreevertex'), ('loop', 'simpleedge'), ('loop', 'multipleloops'), ('endvertices', 'abwalk'), ('vertex', 'adjacentvertices'), ('vertex', 'onedgesonvertices'), ('vertex', 'walk'), ('vertex', 'adjacentedges'), ('vertex', 'multipleedges'), ('vertex', 'edge'), ('vertex', 'multipleloops'), ('vertex', 'degreevertex'), ('vertex', 'incident'), ('edge', 'adjacentvertices'), ('edge', 'onedgesonvertices'), ('edge', 'multipleedges'), ('edge', 'simpleedge'), ('edge', 'adjacentedges'), ('edge', 'loop'), ('edge', 'trailcondition'), ('edge', 'pathcondition'), ('edge', 'walk'), ('edge', 'incident'), ('set', 'onedgesonvertices'), ('set', 'edge'), ('union', 'multiplepaths'), ('union', 'multipletrails'), ('trailcondition', 'trail'), ('nullmultigraph', 'nonnullmultigraph'), ('sequence', 'walk'), ('sequence', 'endvertices'), ('path', 'cycle'), ('path', 'multiplepaths'), ('degreevertex', 'maximumdegree'), ('degreevertex', 'minimumdegree'), ('onedgesonvertices', 'multigraph'), ('maximum', 'maximumdegree'), ('circuit', 'euleriangraph'), ('class', 'multiplepaths'), ('class', 'multipletrails'), ('incident', 'adjacentedges'), ('incident', 'degreevertex'), ('incident', 'onedgesonvertices'), ('orderedpair', 'multigraph'), ('closedwalk', 'circuit'), ('closedwalk', 'cycle'), ('closedwalk', 'stroll'), ('pathcondition', 'path'), ('multigraph', 'euleriangraph'), ('multigraph', 'nullmultigraph'), ('multigraph', 'trivialmultigraph'), ('multigraph', 'nontrivialmultigraph'), ('multigraph', 'emptymultigraph'), ('multigraph', 'euleriantrail'), ('multigraph', 'simplegraph'), ('trail', 'path'), ('trail', 'circuit'), ('trail', 'multipletrails')])

print G.size()
>71
print G.order()
>47
descendants = {}  #for testing below
for node in G.nodes():
    descendants[node] = nx.descendants(G,node)

remove_redundant_edges(G)  #this removes the edges

print G.size()  #lots of edges gone
>56
print G.order() #no nodes changed.
>47
newdescendants = {}  #for comparison with above
for node in G.nodes():
    newdescendants[node] = nx.descendants(G,node)

for node in G.nodes():  
    if descendants[node] != newdescendants[node]:
        print 'descendants changed!!'   #all nodes have the same descendants
    for child in G.neighbors(node):  
        if len(list(nx.all_simple_paths(G,node, child)))>1:
            print 'bad edge'  #no alternate path exists from a node to its child.

这将是有效的:它必须在开始时处理每个节点以查看它是否是“结束”节点。 然后它处理到达那些边缘的每个边缘并检查是否已经处理了该父节点的所有子节点。 然后它看着那些父母和重复。

因此,它将处理每个边缘一次(包括扫视父节点),并且每个顶点将在开头处理一次,然后处理一次。

这是一个简单的通用算法。 该算法可以向前或向后运行。 这和Joel的答案基本上是双重的 - 他的向后跑,这是向前跑的:

def remove_redundant_edges(G):
    """
    Remove redundant edges from a DAG using networkx (nx).
    An edge is redundant if there is an alternate path
    from its start node to its destination node.

    This algorithm could work front to back, or back to front.
    We choose to work front to back.

    The main persistent variable (in addition to the graph
    itself) is indirect_pred_dict, which is a dictionary with
    one entry per graph node.  Each entry is a set of indirect
    predecessors of this node.

    The algorithmic complexity of the code on a worst-case
    fully-connected graph is O(V**3), where V is the number
    of nodes.
    """

    indirect_pred_dict = collections.defaultdict(set)
    for node in nx.topological_sort(G):
        indirect_pred = indirect_pred_dict[node]
        direct_pred = G.predecessors(node)
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred

复杂性分析和大O优化

对于最小连通图 ,其中每个节点仅连接到单个边,复杂度为O(V+E) 然而,即使对于简单的线性图(其中每个节点具有传入边缘和传出边缘),复杂度为O(V*E) ,并且对于最大连通图 (最坏情况,每个节点连接的情况)对于图上的每个下游节点,复杂度为O(V**3) 对于这种情况,操作数遵循序列A000292 ,即n * (n+1) * (n+2) / 6 ,其中n是节点数(V)减去3。

根据图表的形状,您可以执行其他优化。 这是一个包含一些不同优化器的版本,可以显着降低某些类型图形的复杂性和运行时间:

def remove_redundant_edges(G, optimize_dense=True, optimize_chains=True,
                              optimize_tree=False,  optimize_funnel=False):
    """
    Remove redundant edges from a DAG using networkx (nx).
    An edge is redundant if there is an alternate path
    from its start node to its destination node.

    This algorithm could work equally well front to back,
    or back to front. We choose to work front to back.

    The main persistent variable (in addition to the graph
    itself) is indirect_pred_dict, which is a dictionary with
    one entry per graph node.  Each entry is a set of indirect
    predecessors of this node.

    The main processing algorithm uses this dictionary to
    iteratively calculate indirect predecessors and direct
    predecessors for every node, and prune the direct
    predecessors edges if they are also accessible indirectly.
    The algorithmic complexity is O(V**3), where V is the
    number of nodes in the graph.

    There are also several graph shape-specific optimizations
    provided.  These optimizations could actually increase
    run-times, especially for small graphs that are not amenable
    to the optimizations, so if your execution time is slow,
    you should test different optimization combinations.

    But for the right graph shape, these optimizations can
    provide dramatic improvements.  For the fully connected
    graph (which is worst-case), optimize_dense reduces the
    algorithmic complexity from O(V**3) to O(V**2).

    For a completely linear graph, any of the optimize_tree,
    optimize_chains, or optimize_funnel options would decrease
    complexity from O(V**2) to O(V).

    If the optimize_dense option is set to True, then an
    optimization phase is before the main algorithm.  This
    optimization phase works by looking for matches between
    each node's successors and that same node's successor's
    successors (by only looking one level ahead at a time).

    If the optimize_tree option is set true, then a phase is
    run that will optimize trees by working right-to-left and
    recursively removing leaf nodes with a single predecessor.
    This will also optimize linear graphs, which are degenerate
    trees.

    If the optimize_funnel option is set true, then funnels
    (inverted trees) will be optimized.

    If the optimize_chains option is set true, then chains
    (linear sections) will be optimized by sharing the
    indirect_pred_dict sets.  This works because Python
    checks to see if two sets are the same instance before
    combining them.

    For a completely linear graph, optimize_funnel or optimize_tree
    execute more quickly than optimize_chains.  Nonetheless,
    optimize_chains option is enabled by default, because
    it is a balanced algorithm that works in more cases than
    the other two.
    """

    ordered = nx.topological_sort(G)

    if optimize_dense:
        succs= dict((node, set(G.successors(node))) for node in ordered)
        for node in ordered:
            my_succs = succs.pop(node)
            kill = set()
            while my_succs:
                succ = my_succs.pop()
                if succ not in kill:
                    check = succs[succ]
                    kill.update(x for x in my_succs if x in check)
            for succ in kill:
                G.remove_edge(node, succ)

    indirect_pred_dict = dict((node, set()) for node in ordered)

    if optimize_tree:
        remaining_nodes = set(ordered)
        for node in reversed(ordered):
            if G.in_degree(node) == 1:
                if not (set(G.successors(node)) & remaining_nodes):
                    remaining_nodes.remove(node)
        ordered = [node for node in ordered if node in remaining_nodes]

    if optimize_funnel:
        remaining_nodes = set(ordered)
        for node in ordered:
            if G.out_degree(node) == 1:
                if not (set(G.predecessors(node)) & remaining_nodes):
                    remaining_nodes.remove(node)
        ordered = [node for node in ordered if node in remaining_nodes]

    if optimize_chains:
        # This relies on Python optimizing the set |= operation
        # by seeing if the objects are identical.
        for node in ordered:
            succs = G.successors(node)
            if len(succs) == 1 and len(G.predecessors(succs[0])) == 1:
                indirect_pred_dict[succs[0]] = indirect_pred_dict[node]

    for node in ordered:
        indirect_pred = indirect_pred_dict.pop(node)
        direct_pred = G.predecessors(node)
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred

我还没有分析是否有可能构建一个密集但非最大连接的图,在启用optimize_dense选项的情况下执行复杂度大于O(V**2) ,但我没有理由, 先验 ,相信这是不可能的。 优化最适合于最大连接图,并且不会做任何事情,例如,在每个节点与其grandchilden而不是其子节点共享后继的情况下,我没有分析这种情况的运行时。

示例测试平台

我已经删除了基本算法的代码,并添加了记录最坏情况路径所需的操作数的工具,以及生成最大连接图的示例测试生成器。

import collections
import networkx as nx

def makegraph(numnodes):
    """
    Make a fully-connected graph given a number of nodes
    """
    edges = []
    for i in range(numnodes):
        for j in range(i+1, numnodes):
            edges.append((i, j))
    return nx.DiGraph(edges)

def remove_redundant_edges(G):
    ops = 0
    indirect_pred_dict = collections.defaultdict(set)
    for node in nx.topological_sort(G):
        indirect_pred = indirect_pred_dict[node]
        direct_pred = G.predecessors(node)
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred
            ops += len(indirect_pred)
    return ops

def test_1(f, numnodes):
    G = makegraph(numnodes)
    e1 = nx.number_of_edges(G)
    ops = f(G)
    e2 = nx.number_of_edges(G)
    return ops, e1, e2

for numnodes in range(30):
    a = test_1(remove_redundant_edges, numnodes)
    print numnodes, a[0]

您可以使用nx.transitive_reduction(G)

这是文档中的示例。

DG = nx.DiGraph([(1, 2), (2, 3), (1, 3)])
TR = nx.transitive_reduction(DG)
list(TR.edges)  # [(1, 2), (2, 3)]

是。

你想使用all_simple_paths [ documentation ](它提供了两者之间的所有简单路径的生成器)。 一找到第二个就退出,这样它就不会计算所有这些。

定义完成后,您需要查看每条边,如果从源到目标的路径不止一条,则删除边。

def multiple_paths(G,source,target):
    '''returns True if there are multiple_paths, False otherwise'''
    path_generator = nx.all_simple_paths(G, source=source, target=target)
    counter = 0
    for path in path_generator: #test to see if there are multiple paths
        counter += 1
        if counter >1:  
            break  #instead of breaking, could have return True
    if counter >1:  #counter == 2
        return True
    else:  #counter == 0 or 1
        return False

import networkx as nx
G=nx.DiGraph()
G.add_edges_from([(0,1), (1,2), (1,3), (0,3), (2,3)])
multiple_paths(G,0,1)
> False
multiple_paths(G,0,2) 
> False
multiple_paths(G,0,3)
> True

for edge in G.edges_iter():  #let's do what you're trying to do
    if multiple_paths(G, edge[0], edge[1]):
        G.remove_edge(edge[0],edge[1])

G.edges()
> [(0, 1), (1, 2), (2, 3)]

注意,您也可以删除边缘,然后运行has_path以查看是否还有路径。 如果没有,则将边缘添加回来。

import networkx as nx
G=nx.DiGraph()
G.add_edges_from([(0,1), (1,2), (1,3), (0,3), (2,3)])
for edge in G.edges_iter():
    G.remove_edge(edge[0],edge[1])
    if not nx.has_path(G,edge[0],edge[1]):
        G.add_edge(edge[0],edge[1])

G.edges()
> [(0, 1), (1, 2), (2, 3)]

如果有任何边缘数据,你会要小心,我不喜欢删除边缘然后再添加它的可能性 - 它开启了一些难以找到错误的机会。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM