Plotting the dendrogram of communities found by the NetworkX Girvan-Newman algorithm

Question

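The Girvan-Newman algorithm for community detection in networks:

detects communities by progressively removing edges from the original graph. At each step the algorithm removes the "most valuable" edge, traditionally the edge with the highest betweenness centrality. As the graph breaks down into pieces, the tightly knit community structure is exposed, and the result can be depicted as a dendrogram.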
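To make the edge-removal step concrete, here is a minimal sketch of a single split done by hand. The helper name `remove_until_split` is hypothetical (it is not part of NetworkX), and the sketch assumes betweenness is recomputed after every removal, which is how the algorithm is usually described.

```python
import networkx as nx

def remove_until_split(G):
    """Remove highest-betweenness edges until the component count grows.

    Hypothetical helper for illustration. Modifies G in place and
    returns the list of removed edges.
    """
    removed = []
    start = nx.number_connected_components(G)
    while nx.number_connected_components(G) == start and G.number_of_edges() > 0:
        # Recompute edge betweenness on the current graph before each removal.
        betweenness = nx.edge_betweenness_centrality(G)
        edge = max(betweenness, key=betweenness.get)
        G.remove_edge(*edge)
        removed.append(edge)
    return removed

G = nx.path_graph(10)
removed = remove_until_split(G)
communities = sorted(sorted(c) for c in nx.connected_components(G))
print(removed)       # the single middle edge of the path
print(communities)   # the two halves of the path
```

On the 10-node path graph the middle edge carries the most shortest paths, so one removal is enough to produce the first cut.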

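In NetworkX, the implementation returns an iterator over tuples of sets. The first tuple is the first cut, consisting of 2 communities; the second tuple is the second cut, consisting of 3 communities; and so on, until the last tuple, which has n sets for the n separate nodes (the leaves of the dendrogram).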

import networkx as nx

G = nx.path_graph(10)
comp = nx.community.girvan_newman(G)
list(comp)

[({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}),
 ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}),
 ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}),
 ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}),
 ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
 ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
 ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}),
 ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}),
 ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9})]
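Because the result is an iterator, it does not have to be exhausted with `list(...)`. The sketch below takes only the first three levels with `itertools.islice`; the cutoff of three is an arbitrary choice for illustration.

```python
import itertools
import networkx as nx

G = nx.path_graph(10)
comp = nx.community.girvan_newman(G)

# Take only the first three cuts instead of running down to single nodes.
first_levels = list(itertools.islice(comp, 3))

# Each cut has one more community than the previous one.
sizes = [len(level) for level in first_levels]
print(sizes)  # [2, 3, 4]
```

This can matter on larger graphs, where computing every level down to the leaves may be expensive.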

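The question is: how do I plot this dendrogram?

I've looked at scipy.cluster.hierarchy.dendrogram, but it expects a "linkage matrix", I'm guessing such as the one created by scipy.cluster.hierarchy.linkage, and I'm not sure how I would convert this list of tuples into such a "linkage matrix".

So I'm asking how to draw this dendrogram, with or without the help of SciPy's dendrogram.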

Answer 1

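Following @ItamarMushkin, I followed @mdml's answer with slight modifications and got what I wanted. At a high level, I turn the output of NetworkX's Girvan-Newman iterator into another DiGraph() that I eventually want to see as a dendrogram. Then I build Z, the linkage matrix that I pass to scipy.cluster.hierarchy.dendrogram, which includes the actual height of each dendrogram merge.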
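For readers unfamiliar with the linkage matrix format, here is a small hand-built example with four leaves. The heights and labels are made up for illustration; only the row layout `[cluster_a, cluster_b, height, leaf_count]` and the id convention (new clusters are numbered n, n+1, ... in row order) come from SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage

# Four leaves with ids 0..3, so the first merged cluster gets id 4.
Z = np.array([
    [0.0, 1.0, 1.0, 2.0],  # merge leaves 0 and 1 -> cluster 4
    [2.0, 3.0, 1.5, 2.0],  # merge leaves 2 and 3 -> cluster 5
    [4.0, 5.0, 3.0, 4.0],  # merge clusters 4 and 5 -> root (cluster 6)
])

print(is_valid_linkage(Z))

# no_plot=True returns the computed layout without drawing anything.
layout = dendrogram(Z, no_plot=True, labels=["a", "b", "c", "d"])
print(layout["ivl"])  # leaf labels in the order they would be drawn
```

A matrix for n leaves always has n - 1 rows, one per merge.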

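I had to make two modifications to @mdml's answer:

Less important: I sort the tuple keys of the nodes going into index.
More important: I add a get_merge_height function, which gives each merge a unique height according to the order in which Girvan-Newman removes edges. Otherwise, all merges of two nodes would have the same height in the dendrogram, all merges at the next level (a merged pair plus one more node) would have the same height, and so on.

I realize there may be some redundant iterations here; I haven't thought about optimization yet.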
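As a possible alternative to the intermediate DiGraph, the linkage matrix can also be built by walking the Girvan-Newman levels backwards, since each level differs from the previous one by exactly one split. This is only a sketch under stated assumptions: the function name `girvan_newman_linkage` is hypothetical, the graph is assumed to be connected, and the heights are simply 1, 2, 3, ... in reverse split order rather than the values produced by get_merge_height.

```python
import networkx as nx
import numpy as np
from scipy.cluster.hierarchy import is_valid_linkage

def girvan_newman_linkage(G):
    """Hypothetical helper: linkage matrix from Girvan-Newman splits.

    Assumes G is connected. Heights are the reverse split order,
    an illustrative choice that differs from get_merge_height.
    """
    levels = [tuple(frozenset(c) for c in level)
              for level in nx.community.girvan_newman(G)]
    # Prepend the trivial level: the whole graph as one community.
    levels.insert(0, (frozenset(G.nodes),))

    leaves = sorted(G.nodes)
    cluster_id = {frozenset([node]): i for i, node in enumerate(leaves)}
    next_id = len(leaves)
    Z = []
    # Go from the finest partition to the coarsest; each step undoes one
    # split, i.e. merges two communities back into their parent.
    pairs = zip(reversed(levels[:-1]), reversed(levels[1:]))
    for height, (coarse, fine) in enumerate(pairs, start=1):
        parent = (set(coarse) - set(fine)).pop()
        a, b = set(fine) - set(coarse)
        Z.append([cluster_id[a], cluster_id[b], float(height), float(len(parent))])
        cluster_id[parent] = next_id
        next_id += 1
    return np.array(Z), leaves

Z, leaves = girvan_newman_linkage(nx.path_graph(10))
print(Z.shape, is_valid_linkage(Z))
```

The resulting `Z` could then be passed to `dendrogram(Z, labels=leaves)`. The trade-off is that merge heights only encode the order of the splits, not the size of the merged communities.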

import networkx as nx
from itertools import chain, combinations
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# get simulated Graph() and Girvan-Newman communities list
G = nx.path_graph(10)
communities = list(nx.community.girvan_newman(G))

# building initial dict of node_id to each possible subset:
node_id = 0
init_node2community_dict = {node_id: communities[0][0].union(communities[0][1])}
for comm in communities:
    for subset in list(comm):
        if subset not in init_node2community_dict.values():
            node_id += 1
            init_node2community_dict[node_id] = subset

# turning this dictionary to the desired format in @mdml's answer
node_id_to_children = {e: [] for e in init_node2community_dict.keys()}
for node_id1, node_id2 in combinations(init_node2community_dict.keys(), 2):
    for node_id_parent, group in init_node2community_dict.items():
        if len(init_node2community_dict[node_id1].intersection(init_node2community_dict[node_id2])) == 0 and group == init_node2community_dict[node_id1].union(init_node2community_dict[node_id2]):
            node_id_to_children[node_id_parent].append(node_id1)
            node_id_to_children[node_id_parent].append(node_id2)

# also recording node_labels dict for the correct label for dendrogram leaves
node_labels = dict()
for node_id, group in init_node2community_dict.items():
    if len(group) == 1:
        node_labels[node_id] = list(group)[0]
    else:
        node_labels[node_id] = ''

# also needing a subset to rank dict to later know within all k-length merges which came first
subset_rank_dict = dict()
rank = 0
for e in communities[::-1]:
    for p in list(e):
        if tuple(sorted(p)) not in subset_rank_dict:  # same sorted key as the assignment below
            subset_rank_dict[tuple(sorted(p))] = rank
            rank += 1
subset_rank_dict[tuple(sorted(chain.from_iterable(communities[-1])))] = rank

# my function to get a merge height so that it is unique (probably not that efficient)
def get_merge_height(sub):
    sub_tuple = tuple(sorted([node_labels[i] for i in sub]))
    n = len(sub_tuple)
    other_same_len_merges = {k: v for k, v in subset_rank_dict.items() if len(k) == n}
    min_rank, max_rank = min(other_same_len_merges.values()), max(other_same_len_merges.values())
    rank_range = (max_rank - min_rank) if max_rank > min_rank else 1  # avoid shadowing the built-in range
    return float(len(sub)) + 0.8 * (subset_rank_dict[sub_tuple] - min_rank) / rank_range

# finally using @mdml's magic, slightly modified:
G           = nx.DiGraph(node_id_to_children)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(node_id_to_children[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += node_id_to_children[v]
    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for n in inner_nodes:
    children = node_id_to_children[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(sorted(subtree[x] + subtree[y]))
        i, j = index[tuple(sorted(subtree[x]))], index[tuple(sorted(subtree[y]))]
        Z.append([i, j, get_merge_height(subtree[n]), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# dendrogram
plt.figure()
dendrogram(Z, labels=[node_labels[node_id] for node_id in leaves])
plt.savefig('dendrogram.png')

Solution 1
4 accepted 2020-01-20 16:44:08