NetworkX Minimum Spanning Tree has a different cluster arrangement with the same data?

I have a large dataset that compares pairs of products using a relatedness measure. It looks like this:

product1      product2  relatedness
0101          0102      0.047619
0101          0103      0.023810
0101          0104      0.095238
0101          0105      0.214286
0101          0106      0.047619
...           ...       ...

I used the following code to feed the data into NetworkX and produce an MST diagram:

import networkx as nx
import matplotlib.pyplot as plt

# `data` is the DataFrame shown above (columns product1, product2, relatedness)
products = sorted(set(data['product1']))  # unique product codes, sorted

G = nx.Graph()
G.add_nodes_from(products)
print(G.number_of_nodes())
print(G.nodes())

# Add one weighted edge per row whose relatedness is positive
for c, p, w in zip(data['product1'], data['product2'], data['relatedness']):
    if w > 0:
        G.add_edge(c, p, weight=w)

nx.draw(nx.minimum_spanning_tree(G), with_labels=True)
plt.show()
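If `data` is a pandas DataFrame (an assumption; the question does not show how it is loaded), the edge loop can also be expressed with `nx.from_pandas_edgelist`. This sketch rebuilds only the five sample rows shown above and keeps the product codes as strings so that leading zeros such as `0101` survive:

```python
import pandas as pd
import networkx as nx

# The five sample rows from the question; codes kept as strings.
data = pd.DataFrame({
    "product1": ["0101"] * 5,
    "product2": ["0102", "0103", "0104", "0105", "0106"],
    "relatedness": [0.047619, 0.023810, 0.095238, 0.214286, 0.047619],
})

# Keep only positive relatedness and rename the column so it becomes the
# "weight" edge attribute that minimum_spanning_tree uses by default.
edges = data.loc[data["relatedness"] > 0].rename(columns={"relatedness": "weight"})

G = nx.from_pandas_edgelist(edges, source="product1", target="product2",
                            edge_attr="weight")
T = nx.minimum_spanning_tree(G)
```

One difference from the loop: `from_pandas_edgelist` only creates nodes that appear in a kept edge, so a product whose relatedness is 0 for every pair would be missing. Calling `G.add_nodes_from(...)` with the full product list afterwards restores such nodes.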

The resulting diagram looks like this: https://i.imgur.com/pBbcPGc.jpg

However, when I re-run the code with the same data and no modifications, the arrangement of the clusters changes, so the diagram looks different each time; for example, here: https://i.imgur.com/4phvFGz.jpg, and a second example here: https://i.imgur.com/f2YepVx.jpg. The clusters, edges, and weights do not appear to change, but their arrangement in the drawing changes on every run.

What causes the arrangement of the nodes to change on each run when neither the code nor the data has changed? How can I rewrite this code so that, for the same data, it produces a network diagram with approximately the same arrangement of nodes and edges every time?

By default, nx.draw uses spring_layout (link to the doc). This layout implements the Fruchterman-Reingold force-directed algorithm, which starts from random initial positions. That random start is what makes the layout differ between your repeated runs.
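Recent NetworkX releases also accept a `seed` argument in `spring_layout`, which fixes the random initial positions. A minimal sketch, assuming such a release and using `nx.balanced_tree` as a hypothetical stand-in for your MST:

```python
import networkx as nx

# Hypothetical stand-in for the MST from the question: a small balanced binary tree.
G = nx.balanced_tree(2, 3)

# The same seed reproduces the same random starting positions, and therefore
# the same final Fruchterman-Reingold layout, on every run.
pos_a = nx.spring_layout(G, seed=42)
pos_b = nx.spring_layout(G, seed=42)

same = all((pos_a[n] == pos_b[n]).all() for n in G)
print(same)  # True
```

Passing the result to `nx.draw(G, pos=pos_a, with_labels=True)` then draws the same picture each time. The seed only pins the randomness: if the graph is built with a different node order, or the NetworkX version changes, the layout can still differ.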

If you want to "fix" the positions, you should call the spring_layout function explicitly and pass initial positions in its pos argument.
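A minimal sketch of passing initial positions, assuming deterministic starting points from `nx.circular_layout` are acceptable (the choice of starting layout and the stand-in graph are illustrative, not from the original answer):

```python
import networkx as nx

# Hypothetical stand-in for the MST: a small balanced binary tree.
G = nx.balanced_tree(2, 3)

# Deterministic starting positions: the nodes evenly spaced on a circle.
init = nx.circular_layout(G)

# Every node has a starting position, so spring_layout refines these
# positions instead of random ones and repeated calls give the same layout.
pos_a = nx.spring_layout(G, pos=init)
pos_b = nx.spring_layout(G, pos=init)
```

The `fixed` argument of `spring_layout` can additionally pin chosen nodes at their starting positions while the others move.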

For clarity, assign G = nx.minimum_spanning_tree(G). Then

nx.draw(G, with_labels=True)

is equivalent to

pos = nx.spring_layout(G)
nx.draw(G, pos=pos, with_labels=True)
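It helps to know what `pos` actually holds, because that explains why the `store` helper in the code below converts it before writing JSON. A minimal sketch, using `nx.path_graph` as a hypothetical stand-in for the MST:

```python
import json
import networkx as nx
import numpy as np

G = nx.path_graph(4)  # hypothetical stand-in for the MST
pos = nx.spring_layout(G, seed=0)

# pos maps each node to a NumPy array holding that node's (x, y) coordinates.
first = pos[0]
is_array = isinstance(first, np.ndarray)
print(type(first).__name__, first.shape)  # ndarray (2,)

# json cannot encode NumPy arrays directly...
try:
    json.dumps(pos)
    encodable = True
except TypeError:
    encodable = False

# ...but plain lists work, which is why the values are converted with tolist().
text = json.dumps({node: xy.tolist() for node, xy in pos.items()})
```

Note that JSON also turns the integer node keys into strings (`"0"`, `"1"`, ...), so they have to be converted back when the file is read.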

Since you don't want pos to be recalculated randomly every time you run your script, one way to keep pos stable is to compute it once, save it to a file, and load it from that file on every later run. You can put the following code before nx.draw(G, pos=pos, with_labels=True) to obtain pos this way:

import os, json

def store(pos):
    # convert each position (a NumPy array) to a plain list so json can serialize it
    return {k: v.tolist() for k, v in pos.items()}

def retrieve(pos):
    # json turns node keys into strings; float() assumes numeric node labels
    return {float(k): v for k, v in pos.items()}

if os.path.exists('pos.txt'):
    with open('pos.txt') as infile:
        pos = retrieve(json.load(infile))  # retrieve the dictionary from the file
    print('retrieve', pos)
else:
    pos = nx.spring_layout(G)  # calculate pos once
    print('store', pos)
    with open('pos.txt', 'w') as outfile:
        json.dump(store(pos), outfile, indent=4)  # record the pos dictionary in the file

This is an ugly solution because it depends on the data types used in the pos dictionary: store assumes the values are NumPy arrays, and retrieve assumes the node labels are numeric. It worked for me, but you may need to define your own conversions in store and retrieve.
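One possible type-agnostic pair of conversions, sketched under the assumption that `str()` gives a distinct string for every node. The helper names `store_pos` and `retrieve_pos` are hypothetical. Instead of guessing a type with `float()`, which would turn a string code such as `"0101"` into `101.0` and no longer match the node, they map each JSON key back to the actual node object of `G`:

```python
import json
import networkx as nx

def store_pos(pos):
    # Hypothetical helper: JSON object keys must be strings, so str() each node;
    # NumPy coordinate arrays become plain lists.
    return {str(node): xy.tolist() for node, xy in pos.items()}

def retrieve_pos(stored, G):
    # Hypothetical helper: map each string key back to the matching node of G,
    # whatever the node's original type (str, int, ...).
    # Assumes str() gives a distinct string for every node in G.
    by_name = {str(node): node for node in G}
    # Keys for nodes that are no longer in G are skipped.
    return {by_name[key]: xy for key, xy in stored.items() if key in by_name}

# Example with string product codes like those in the question.
G = nx.Graph([("0101", "0102"), ("0101", "0103"), ("0101", "0105")])
pos = nx.spring_layout(G, seed=7)

text = json.dumps(store_pos(pos))             # what json.dump would write to pos.txt
restored = retrieve_pos(json.loads(text), G)  # what json.load would read back
```

If the graph later gains nodes that are not in the file, `nx.spring_layout(G, pos=restored, fixed=list(restored))` is one way to place only the new nodes while keeping the stored ones where they were.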

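Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM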