Add attributes in Networkx write_graphml before opening in Gephi

I have a DataFrame of possible network connections with the columns "A", "B", "Count" and "some_attribute" (schematically, df = pd.DataFrame(["A", "B", "Count", "some_attribute"])). Each row represents a connection like this:

  • A has a connection with B
  • This connection occurred "Count" times
  • This connection has a specific attribute (i.e. a specific type of contact)

I want to export this DataFrame to the GraphML format. This works fine with the following code:

import networkx as nx
G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values)
nx.write_graphml(G, "my_graph.graphml")

This code produces a GraphML file with the correct graph, which I can use in Gephi. Now I want to add an attribute:

G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values, attr=df["some_attribute"].values)
nx.write_graphml(G, "my_graph.graphml")

Whenever I try to add attributes this way, the graph can no longer be written to a GraphML file. With this code, I get the following error message:

NetworkXError: GraphML writer does not support <class 'numpy.ndarray'> as data values.

I found related articles (like this one), but they did not provide a solution for this problem. Does anyone have a solution for adding attributes to a GraphML file using networkx so that I can use them in Gephi?

Assuming the following example DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [0,1,2,0,0],
                   'B': [1,2,3,2,3],
                   'Count': [1,2,5,1,1],
                   'some_attribute': ['red','blue','red','blue','red']})

   A  B  Count some_attribute
0  0  1      1            red
1  1  2      2           blue
2  2  3      5            red
3  0  2      1           blue
4  0  3      1            red

Following the code from above to instantiate a Graph:

import networkx as nx    
G = nx.Graph()
G.add_weighted_edges_from(df[["A","B", "Count"]].values, attr=df["some_attribute"].values)

When inspecting an edge, it appears that the whole numpy array, df['some_attribute'].values, gets assigned as an attribute to each edge:

print(G[0][1])
print(G[2][3])
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 1}
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 5}
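
The output above follows from how keyword arguments are handled: any extra keyword argument passed to add_weighted_edges_from is applied unchanged to every edge in the batch. A minimal sketch, assuming a small hand-written edge list instead of the DataFrame (a plain list stands in for the NumPy array, and G[u][v] is used to read edge data because it works across NetworkX versions):

```python
import networkx as nx

# Hypothetical stand-ins for df[["A", "B", "Count"]] and df["some_attribute"]
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 5)]
colors = ["red", "blue", "red"]

G = nx.Graph()
# Extra keyword arguments are copied onto every edge in the batch,
# so each edge receives the full list rather than one element of it.
G.add_weighted_edges_from(edges, attr=colors)

print(G[0][1])
print(G[2][3])
```

Every edge ends up carrying the complete sequence, which is a container type rather than a single scalar value that GraphML can store.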

If I understand your intent correctly, I'm assuming you want each edge's attribute to correspond to the df['some_attribute'] column.

You may find it easier to create your Graph using nx.from_pandas_dataframe(), especially since you already have data formatted in a DataFrame object.

G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])

print(G[0][1])
print(G[2][3])
{'Count': 1, 'some_attribute': 'red'}
{'Count': 5, 'some_attribute': 'red'}
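
In more recent NetworkX releases the same functionality is available under the name nx.from_pandas_edgelist(). The following is a sketch of the equivalent call, assuming such a release is installed and reusing the example DataFrame from above:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

# Each row becomes one edge; the listed columns become per-edge attributes.
G = nx.from_pandas_edgelist(df, source='A', target='B',
                            edge_attr=['Count', 'some_attribute'])

print(G[0][1])
print(G[2][3])
```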

Writing to file was no problem:

nx.write_graphml(G,"my_graph.graphml")

That said, I'm not a regular Gephi user, so there may be another way to solve the following. When I loaded the file with 'Count' as the edge attribute, Gephi didn't recognize the edge weights by default. So I renamed the column from 'Count' to 'weight' and saw the following when I loaded the file into Gephi:

df.columns=['A', 'B', 'weight', 'some_attribute']
G = nx.from_pandas_dataframe(df, 'A', 'B', ['weight', 'some_attribute'])
nx.write_graphml(G,"my_graph.graphml")
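
To see which edge attributes end up in the GraphML output without opening Gephi, the generated XML can be inspected directly. This is only a sketch: it builds a two-edge graph by hand instead of using the DataFrame, uses nx.generate_graphml to get the XML as text, and checks the declared attribute names. How Gephi interprets those names is not verified here.

```python
import xml.etree.ElementTree as ET

import networkx as nx

# Hand-built stand-in for the graph created from the DataFrame
G = nx.Graph()
G.add_edge(0, 1, weight=1, some_attribute="red")
G.add_edge(2, 3, weight=5, some_attribute="red")

# generate_graphml yields the document line by line as text
xml_text = "\n".join(nx.generate_graphml(G))
root = ET.fromstring(xml_text.encode("utf-8"))

# Collect the names of all attributes declared for edges,
# ignoring the XML namespace prefix on the tag names.
edge_keys = sorted(
    el.get("attr.name")
    for el in root.iter()
    if el.tag.rsplit("}", 1)[-1] == "key" and el.get("for") == "edge"
)
print(edge_keys)
```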

[image]

Hope this helps and that I understood your question correctly.

Edit

Per Corley's comment above, you can use the following if you choose to use add_edges_from:

G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])
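
A variant of the same idea, offered as a sketch rather than as the approach from the comment: iterating with DataFrame.itertuples gives access to the columns by name and avoids building the intermediate mixed-type array that .values produces. The int() conversion is an added precaution, not something the original code does.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

G = nx.Graph()
# One (u, v, attribute-dict) tuple per row, so every edge gets its own values.
# int() keeps the weight a built-in int, a type the GraphML writer accepts.
G.add_edges_from(
    (row.A, row.B, {'weight': int(row.Count), 'attr': row.some_attribute})
    for row in df.itertuples(index=False)
)

print(G[0][1])
```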

Neither approach is significantly faster than the other, but I find from_pandas_dataframe more readable.

import numpy as np

df = pd.DataFrame({'A': np.arange(0,1000000),
                   'B': np.arange(1,1000001),
                   'Count': np.random.choice(range(10), 1000000, replace=True),
                   'some_attribute': np.random.choice(['red','blue'], 1000000, replace=True,)})

%%timeit
G = nx.Graph()
G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])

1 loop, best of 3: 4.23 s per loop

%%timeit
G = nx.Graph()
G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])

1 loop, best of 3: 3.93 s per loop
