Load nodes with attributes and edges from DataFrame to NetworkX

I am new to using Python for working with graphs (NetworkX). Until now I have used Gephi. There, the standard steps (though not the only possible ones) are:

  1. Load the node information from a table/spreadsheet; one of the columns should be ID and the rest are metadata about the nodes (nodes are people, so gender, groups... normally used for coloring). For example:

     id;NormalizedName;Gender
     per1;Jesús;male
     per2;Abraham;male
     per3;Isaac;male
     per4;Jacob;male
     per5;Judá;male
     per6;Tamar;female
     ...
  2. Then load the edges, also from a table/spreadsheet, using the same node names as in the ID column of the nodes spreadsheet, normally with four columns (Target, Source, Weight and Type):

     Target;Source;Weight;Type
     per1;per2;3;Undirected
     per3;per4;2;Undirected
     ...
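The two tables above can be read into pandas DataFrames before any graph is built. This is a minimal sketch, assuming the data are semicolon-separated text; the rows are inlined through io.StringIO instead of being read from files, and the variable names (nodes_csv, edges_csv, nodes, edges) are illustrative only:

```python
import io

import pandas as pd

# Sample rows copied from the tables above; the elided rows ("...") are left out.
nodes_csv = """id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
"""

edges_csv = """Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
"""

# sep=";" matches the delimiter used in the tables.
nodes = pd.read_csv(io.StringIO(nodes_csv), sep=";")
edges = pd.read_csv(io.StringIO(edges_csv), sep=";")

print(nodes.head())
print(edges.dtypes)
```

With real files, the io.StringIO(...) argument would be replaced by the file path.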

These are the two DataFrames that I have and want to load in Python. From reading about NetworkX, it seems that it is not directly possible to load two tables (one for nodes, one for edges) into the same graph, and I am not sure what the best way would be:

  1. Should I create a graph with only the node information from the DataFrame, and then add (append) the edges from the other DataFrame? If so, and since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?

  2. Should I create a graph with only the edge information from the DataFrame, and then add the information from the other DataFrame to each node as attributes? Is there a better way of doing that than iterating over the DataFrame and the nodes?

Create the weighted graph from the edge table using nx.from_pandas_dataframe:

import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source' : [0, 1],
                      'target' : [1, 2],
                      'weight' : [100, 50]})

nodes = pd.DataFrame({'node' : [0, 1, 2],
                      'name' : ['Foo', 'Bar', 'Baz'],
                      'gender' : ['M', 'F', 'M']})

G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')

Then add the node attributes from dictionaries using set_node_attributes:

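# NetworkX 1.x argument order: (G, name, values).
# Index the attribute columns by node ID so the dictionary keys are the node IDs.
nx.set_node_attributes(G, 'name', nodes.set_index('node')['name'].to_dict())
nx.set_node_attributes(G, 'gender', nodes.set_index('node')['gender'].to_dict())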

Or iterate over the graph to add the node attributes:

for i in sorted(G.nodes()):
    G.node[i]['name'] = nodes.name[i]
    G.node[i]['gender'] = nodes.gender[i]

Update:

As of nx 2.0, the argument order of nx.set_node_attributes has changed to (G, values, name=None).

Using the example from above:

nx.set_node_attributes(G, nodes.set_index('node')['gender'].to_dict(), 'gender')

And as of nx 2.4, G.node[] is replaced by G.nodes[].
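Putting these updates together, the first example might look as follows on a recent NetworkX release. This is only a sketch: it assumes a version where from_pandas_edgelist is available (the replacement for from_pandas_dataframe in NetworkX 2.x), and the helper variable attrs is introduced here for illustration:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

nodes = pd.DataFrame({'node': [0, 1, 2],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# Build the graph from the edge table, keeping 'weight' as an edge attribute.
G = nx.from_pandas_edgelist(edges, 'source', 'target', edge_attr='weight')

# Index the node table by node ID so each column becomes {node_id: value}.
attrs = nodes.set_index('node')

# Newer argument order: (G, values, name).
nx.set_node_attributes(G, attrs['name'].to_dict(), 'name')
nx.set_node_attributes(G, attrs['gender'].to_dict(), 'gender')

# G.nodes[...] is used instead of the removed G.node[...].
print(G.nodes[0])
print(G[0][1]['weight'])
```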

Here's basically the same answer, updated with some details filled in. We'll start with basically the same setup, but here the nodes have no indices, just names, to address @LancelotHolmes' comment and make it more general:

import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
                  'target' : ['Bob', 'Cindy'],
                  'weight' : [100, 50]})

nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
                  'type' : ['Foo', 'Bar', 'Baz'],
                  'gender' : ['M', 'F', 'M']})

G = nx.from_pandas_edgelist(linkData, 'source', 'target', True, nx.DiGraph())

Here the True parameter tells NetworkX to keep all the properties in linkData as link properties. In this case I've made the graph a DiGraph, but if you don't need that, you can make it another type in the obvious way.
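To make the effect of these two arguments concrete, here is a small sketch that passes them by keyword (edge_attr and create_using) and builds both a directed and an undirected graph from the same table. The names D and U are illustrative only:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# edge_attr=True copies every remaining column (here only 'weight') onto the edges.
D = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.DiGraph())
U = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.Graph())

print(list(D.edges(data=True)))

# Direction matters only in the DiGraph.
print(D.has_edge('Bob', 'Amy'), U.has_edge('Bob', 'Amy'))
```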

Now, since you need to match the nodeData by the names of the nodes generated from the linkData, you need to set the index of the nodeData DataFrame to the name column before converting it to a dictionary, so that NetworkX 2.x can load it as the node attributes.

nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))

This loads the whole nodeData DataFrame into a dictionary in which the key is the name, and the other properties are key:value pairs within that key (i.e., normal node properties where the node index is its name).
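The same dictionary can also be used the other way round, as in option 1 of the question: add the nodes with their attributes first, then add the edges. The sketch below is one possible variant, not the approach of the answer above. It may be useful when the node table contains nodes with no edges, since set_node_attributes only touches nodes already in the graph. The extra row 'Dave' is a hypothetical isolated node added purely for illustration:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# 'Dave' is a hypothetical node that does not appear in linkData.
nodeData = pd.DataFrame({'name': ['Amy', 'Bob', 'Cindy', 'Dave'],
                         'type': ['Foo', 'Bar', 'Baz', 'Qux'],
                         'gender': ['M', 'F', 'M', 'M']})

G = nx.DiGraph()

# add_nodes_from accepts (node, attribute_dict) pairs.
G.add_nodes_from(nodeData.set_index('name').to_dict('index').items())

# add_weighted_edges_from accepts (source, target, weight) tuples.
G.add_weighted_edges_from(
    linkData[['source', 'target', 'weight']].itertuples(index=False, name=None)
)

print(G.nodes(data=True))
print(G.edges(data=True))
```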

A small remark:

from_pandas_dataframe doesn't work in nx 2, referring to this line:

G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')

I think that in nx 2.0 it goes like that:我认为在 nx 2.0 中它是这样的:

G = nx.from_pandas_edgelist(edges, source='source', target='target', edge_attr='weight')
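For an edge table whose columns are capitalised as in the question (Target, Source, Weight, Type), the column names have to be passed exactly as they appear in the DataFrame. A short sketch, using only the two edge rows shown in the question and assuming an undirected graph is wanted:

```python
import networkx as nx
import pandas as pd

# The two edge rows shown in the question.
edges = pd.DataFrame({'Target': ['per1', 'per3'],
                      'Source': ['per2', 'per4'],
                      'Weight': [3, 2],
                      'Type': ['Undirected', 'Undirected']})

# Column names are case-sensitive; edge_attr lists the columns to keep on each edge.
G = nx.from_pandas_edgelist(edges, source='Source', target='Target',
                            edge_attr=['Weight', 'Type'])

print(list(G.edges(data=True)))
```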
