
Fastest way to convert an undirected cyclic graph (UCG) to a directed acyclic graph (DAG) in Python?

Say I have an undirected cyclic graph (UCG). The weights of all edges are 1. Therefore, this UCG can be represented by an adjacency matrix A:

import numpy as np

A = np.array([[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
              [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]])

To visualize the UCG, we can simply convert it to a networkx.Graph object:

import networkx as nx

ucg = nx.Graph()
rows, cols = np.where(A == 1)
edges = zip(rows.tolist(), cols.tolist())
ucg.add_edges_from(edges)

And the UCG looks like this: [image]

I colored the nodes in different colors to show the "minimum distance". The orange nodes {8, 9, 10} are the starting nodes, the green nodes {0, 1, 2, 3} are the nodes that have a minimum distance of 1 to the starting nodes, and the blue nodes {4, 5, 6, 7} have a minimum distance of 2. Now I want to convert this into a directed acyclic graph (DAG) with the arrows pointing from the starting nodes to the distance-1 nodes, then to the distance-2 nodes, and so on. The edges between nodes with the same "minimum distance" are discarded.
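To make the "minimum distance" coloring concrete, here is a minimal sketch of a multi-source BFS that assigns each node its distance to the nearest starting node. The helper name bfs_levels is hypothetical (it is not part of the code in this post), and the matrix is repeated as a plain list of lists so the snippet runs on its own.

```python
from collections import deque

def bfs_levels(adj_matrix, starts):
    """Multi-source BFS: map each reachable node to its minimum
    distance from any of the starting nodes."""
    levels = {s: 0 for s in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nbr, connected in enumerate(adj_matrix[node]):
            # The first time a node is reached is via a shortest path.
            if connected and nbr not in levels:
                levels[nbr] = levels[node] + 1
                queue.append(nbr)
    return levels

A = [[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
     [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
     [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
     [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
     [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
     [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
     [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]

levels = bfs_levels(A, [8, 9, 10])
print(levels)  # nodes 8, 9, 10 -> 0; nodes 0..3 -> 1; nodes 4..7 -> 2
```

In an undirected graph the levels of two adjacent nodes differ by at most 1, so keeping only the edges that go from level k to level k + 1 is the same as discarding the edges between nodes of equal distance.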

The expected output is a dictionary that represents the DAG:

d = {8: {1, 3},
     9: {1, 2},
     10: {0, 2, 3},
     0: {4, 6, 7},
     1: {5, 6, 7},
     2: {4, 5, 6},
     3: {4, 5, 7}}

Similarly, to visualize the DAG, we can convert it into a networkx.DiGraph object:

dag = nx.DiGraph()
dag.add_edges_from([(k, v) for k, vs in d.items() for v in vs])

And the expected output DAG looks like this: [image]
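As a sanity check on the expected output, the following sketch verifies that a dictionary of this shape really is acyclic. It uses Kahn's algorithm (repeatedly removing nodes with no incoming edges); the function name is_acyclic is hypothetical and is not part of the code in this post.

```python
from collections import deque

def is_acyclic(dag):
    """Kahn's algorithm: return True if the dict-of-sets digraph
    contains no directed cycle."""
    nodes = set(dag) | {t for targets in dag.values() for t in targets}
    indegree = {n: 0 for n in nodes}
    for targets in dag.values():
        for t in targets:
            indegree[t] += 1

    # Start from the nodes that nothing points to.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for t in dag.get(node, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)

    # Every node can be removed only if there is no cycle.
    return removed == len(nodes)

d = {8: {1, 3},
     9: {1, 2},
     10: {0, 2, 3},
     0: {4, 6, 7},
     1: {5, 6, 7},
     2: {4, 5, 6},
     3: {4, 5, 7}}

print(is_acyclic(d))
```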

I want to write efficient, general code that converts a given UCG with given starting nodes to the corresponding DAG.

What I have tried

Clearly, recursion is called for. My idea is to use a BFS approach to find the distance-1 nodes of each starting node, then their distance-1 nodes, and so on recursively. All visited nodes are stored in a set prev_starts to avoid going backwards. Below is my code:

from collections import defaultdict

def ucg2dag(A, starts):
    """Takes the adjacency matrix of a UCG and the indices of the
    starting nodes, returns the dictionary of a DAG."""

    def recur(starts):
        starts = list(set(starts))
        idxs, nbrs = np.where(A[starts] == 1)
        prev_starts.update(starts)

        # Filter out the neighbors that are previous starts so the
        # arrows do not point backwards
        try:
            idxs, nbrs = zip(*((idx, nbr) for idx, nbr in zip(idxs, nbrs)
                                if nbr not in prev_starts))
        # Terminate if every neighbor is a previous start.
        except ValueError:  # nothing to unpack: no unvisited neighbors remain
            return d

        for idx, nbr in zip(idxs, nbrs):
            d[starts[idx]].add(nbr)

        return recur(starts=nbrs)

    prev_starts = set()
    d = defaultdict(set)
    return recur(starts)

Testing my code:

d = ucg2dag(A, starts={8, 9, 10})
print(d)

Edit: after adding return before the recursive call to recur, thanks to @trincot's comment, I get the correct output:

defaultdict(<class 'set'>, 
            {8: {1, 3}, 
             9: {1, 2}, 
             10: {0, 2, 3}, 
             0: {4, 6, 7}, 
             1: {5, 6, 7}, 
             2: {4, 5, 6}, 
             3: {4, 5, 7}})
%timeit 37.6 µs ± 591 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In reality I have a much larger graph. Is there a more efficient algorithm?

You have applied some fixes to your code (partially based on comments), so that now you have working code.

The few remarks that remain are:

  • BFS is typically not a recursive algorithm (in contrast to DFS): the recursion you have is a case of tail recursion. In that case it can be written as a loop, and you'll avoid using the call stack.

  • It is a pity you have to look up the edges in an adjacency matrix. It would be better to first convert the adjacency matrix to an adjacency list, unless the graph is really dense.

  • The output could also be an adjacency list, with an entry for each node. That way it could be a list of lists instead of a dictionary.

  • The repeated conversions of structure using zip may not be the most efficient (I didn't benchmark it, though).

Without using numpy, it could look like this:

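def ucg2dag(adj_matrix, starts):
    # Convert the adjacency matrix to an adjacency list once, up front.
    adj_list = [
        [target for target, is_connected in enumerate(row) if is_connected]
        for row in adj_matrix
    ]

    # Work on copies so the caller's collection is not mutated.
    frontier = set(starts)
    visited = set(frontier)

    dag = [[] for _ in range(len(adj_list))]

    while frontier:
        # Keep only the edges that lead to nodes not visited yet.
        for source in frontier:
            dag[source].extend(target for target in adj_list[source] if target not in visited)
        # The newly reached nodes form the next BFS level.
        frontier = {
            target
            for source in frontier for target in adj_list[source] if target not in visited
        }
        visited.update(frontier)

    return dag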

Example run:

adj_matrix = [[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
              [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]

dag = ucg2dag(adj_matrix, {8, 9, 10})
print(dag)  

The output for the example run:

[[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7], [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]
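If the dictionary format from the question is needed, the list-of-lists result can be converted in a single pass. This is only a sketch; the helper name dag_list_to_dict is hypothetical, and it assumes that nodes without outgoing edges should be omitted, as in the expected output of the question.

```python
def dag_list_to_dict(dag):
    """Convert an adjacency list (list of lists) into a dict of sets,
    skipping nodes that have no outgoing edges."""
    return {source: set(targets) for source, targets in enumerate(dag) if targets}

# The list printed by the example run above.
dag = [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]

d = dag_list_to_dict(dag)
print(d)
```

The keys come out in node order (0, 1, 2, ...) rather than in BFS order, which makes no difference when the dictionary is compared with the expected one.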


 