Construct bipartite graph from columns of python dataframe

Question

I have a dataframe with three columns:

data['subdomain'], data['domain'], data['IP']

I want to build a bipartite graph that connects each subdomain to the domain it belongs to, with the weight being the number of times that correspondence occurs.

For example my data could be:

subdomain , domain, IP
test1, example.org, 10.20.30.40
something, site.com, 30.50.70.90
test2, example.org, 10.20.30.41
test3, example.org, 10.20.30.42
else, website.com, 90.80.70.10

I want the bipartite graph to show that example.org has a weight of 3, since it has 3 edges, and so on. I also want to group these results into a new dataframe.

I have been trying with NetworkX, but I have no experience with it, especially when the edges need to be computed.

B=nx.Graph()
B.add_nodes_from(data['subdomain'],bipartite=0)
B.add_nodes_from(data['domain'],bipartite=1)
B.add_edges_from(...)

Answer 1

You could use

B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

to add weighted edges, or you could use

B.add_edges_from(
    [(row['domain'], row['subdomain']) for idx, row in df.iterrows()])

to add edges without weights.
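
Note that add_weighted_edges_from does not accumulate weights: if the same (domain, subdomain) pair appears in several rows, the later weight overwrites the earlier one. Here is a minimal sketch, assuming you want each edge weight to be the number of rows for that pair. The sample data with a repeated row is hypothetical and only there to show the effect.

```python
import networkx as nx
import pandas as pd

# Hypothetical data: the (example.org, test1) pair appears twice.
df = pd.DataFrame({
    'subdomain': ['test1', 'test1', 'test2', 'something'],
    'domain': ['example.org', 'example.org', 'example.org', 'site.com'],
})

# Count how many rows share each (domain, subdomain) pair.
counts = df.groupby(['domain', 'subdomain']).size()

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)

# counts.items() yields ((domain, subdomain), n) pairs.
B.add_weighted_edges_from(
    (domain, subdomain, int(n)) for (domain, subdomain), n in counts.items())

print(B['example.org']['test1']['weight'])
```

Counting in pandas first keeps the graph construction to a single pass and avoids updating edge attributes by hand.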

You may not need weights, since the degree of a node is the number of edges incident to it. For example,

>>> B.degree('example.org')
3

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'IP': ['10.20.30.40',
      '30.50.70.90',
      '10.20.30.41',
      '10.20.30.42',
      '90.80.70.10'],
     'domain': ['example.org',
      'site.com',
      'example.org',
      'example.org',
      'website.com'],
     'subdomain': ['test1', 'something', 'test2', 'test3', 'else']})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

print(B.edges(data=True))
# [('test1', 'example.org', {'weight': 1}), ('test3', 'example.org', {'weight': 1}), ('test2', 'example.org', {'weight': 1}), ('website.com', 'else', {'weight': 1}), ('site.com', 'something', {'weight': 1})]

pos = {node:[0, i] for i,node in enumerate(df['domain'])}
pos.update({node:[1, i] for i,node in enumerate(df['subdomain'])})
nx.draw(B, pos, with_labels=False)
for p in pos:  # raise text positions
    pos[p][1] += 0.25
nx.draw_networkx_labels(B, pos)

plt.show()

yields [image: plot of the bipartite graph, with domains in the left column and subdomains in the right column]
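
As for grouping the results into a new dataframe, one possible sketch is to read the degree of every domain node and build a DataFrame from those pairs. The names domains, degrees and degree_df, and the output column name 'weight', are my own choices for illustration, not something from the question.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    'subdomain': ['test1', 'something', 'test2', 'test3', 'else'],
    'domain': ['example.org', 'site.com', 'example.org',
               'example.org', 'website.com'],
})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
B.add_edges_from(zip(df['domain'], df['subdomain']))

# Keep only the domain side of the bipartite graph.
domains = [n for n, d in B.nodes(data=True) if d['bipartite'] == 1]

# dict(...) copes with both the dict and the (node, degree) view
# that different networkx versions return from B.degree(nbunch).
degrees = dict(B.degree(domains))

# One row per domain, sorted by domain name.
degree_df = pd.DataFrame(sorted(degrees.items()),
                         columns=['domain', 'weight'])
print(degree_df)
```

If you do not need the graph itself, df.groupby('domain')['subdomain'].nunique() should give the same counts directly, assuming no subdomain shares a name with a domain.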

Answer 1 (7, ACCEPTED, 2015-06-15 18:06:27)