Creating nodes for an undirected graph from a pandas DataFrame
I have a DataFrame that looks like this (in reality I have 170,000 observations):
Firm  pat            cited_pat
F_1   [p0,p1,p2]     [p0,p1,p2]
F_2   []             []
F_3   [p3,p6,p2]     [p5,p0,p23,p29,p12,p8]
F_4   [p0,p9,p25]    [p0,p29,p31]
...
The idea is this: compare the cited_pat lists of two firms and check how many "ps" they have in common. If more than 50% are in common, then create an edge (edge = 1). I am struggling a lot to find a simple way to do this. Could you please help me with this?
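The question does not say what "50% in common" is measured against. One reading, and the one the answer's code effectively uses, is the share of the union of both lists (the Jaccard similarity). A minimal sketch of that test, assuming the union as denominator; the helper names `cited_overlap` and `cited_edge` are hypothetical:

```python
def cited_overlap(cited_a, cited_b):
    """Share of the combined (union) cited patents that both firms cite.

    Assumption: "in common" is measured against the union of both lists
    (Jaccard similarity); duplicates within one list are ignored.
    """
    a, b = set(cited_a), set(cited_b)
    union = a | b
    if not union:  # both lists empty: treat the overlap as 0, not 100%
        return 0.0
    return len(a & b) / len(union)


def cited_edge(cited_a, cited_b, threshold=0.5):
    """Hypothetical helper: True if the overlap reaches the threshold."""
    return cited_overlap(cited_a, cited_b) >= threshold


# F_3 vs F_4 from the table: common {p0, p29}, union of 7 patents, so 2/7
print(cited_overlap(['p5', 'p0', 'p23', 'p29', 'p12', 'p8'], ['p0', 'p29', 'p31']))
print(cited_edge(['p0', 'p1', 'p2'], ['p0', 'p1', 'p2']))  # identical lists: True
```

Use `>` instead of `>=` if exactly 50% should not count, which matches the "more than 50%" wording literally.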
Here's one way to do it:
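import pandas as pd
import networkx as nx

data = {'Firm': {0: 'F_1', 1: 'F_2', 2: 'F_3', 3: 'F_4'},
        'pat': {0: ['p0','p1','p2'], 1: [], 2: ['p3','p6','p2'], 3: ['p0','p9','p25']},
        'cited_pat': {0: ['p0','p1','p2'],
                      1: [],
                      2: ['p5','p0','p23','p29','p12','p8'],
                      3: ['p0','p29','p31']}}
df = pd.DataFrame(data)

def cited_pat_func(set_i):
    # Returns a predicate that is True when set_j and set_i share
    # at least half of their union (|i & j| * 2 >= |i | j|).
    def f(set_j):
        union = set_i | set_j
        # Guard: two empty citation lists should not count as a match.
        return bool(union) and len(set_i & set_j)*2 >= len(union)
    return f

G = nx.Graph()
G.add_nodes_from(df['Firm'])  # every firm is a node, even firms without edges

# Compare each row only with the rows after it, so each pair is checked once.
# df.iloc[(i+1):] relies on the default RangeIndex 0..n-1.
for i,row in df.iterrows():
    df_tail = df.iloc[(i+1):,:]
    F_i = row['Firm']
    pat_i = set(row['pat'])
    cpat_i = set(row['cited_pat'])
    # Edge if the two firms share at least one patent in 'pat'
    # OR their 'cited_pat' sets overlap by at least 50% of the union.
    cond = (df_tail['pat'].apply(set)
                          .apply(pat_i.intersection)
                          .astype(bool) |
            df_tail['cited_pat'].apply(set)
                                .apply(cited_pat_func(cpat_i)))
    for F_j in df_tail.loc[cond,'Firm']:
        G.add_edge(F_i, F_j)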
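The loop above checks every firm against every later firm, roughly n²/2 pairs; with 170,000 firms that is about 1.4 × 10^10 comparisons. A common alternative, swapped in here and not part of the original answer, is an inverted index: map each patent to the firms that list it and only evaluate pairs that share at least one patent. No qualifying pair is lost, because both rules need a non-empty intersection. A sketch under the same rules as the code above; the function name `build_graph` is hypothetical, and very common patents can still produce many candidate pairs:

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx
import pandas as pd


def build_graph(df, threshold=0.5):
    """Hypothetical helper: same edge rules as above, via inverted indexes."""
    firms = df['Firm'].tolist()
    pats = [set(p) for p in df['pat']]
    cited = [set(c) for c in df['cited_pat']]

    G = nx.Graph()
    G.add_nodes_from(firms)

    # Rule 1: any shared patent in 'pat' creates an edge.
    by_pat = defaultdict(list)  # patent -> row positions that list it
    for pos, ps in enumerate(pats):
        for p in ps:
            by_pat[p].append(pos)
    for rows in by_pat.values():
        for a, b in combinations(rows, 2):
            G.add_edge(firms[a], firms[b])

    # Rule 2: 'cited_pat' overlap of at least `threshold` of the union.
    # Only pairs citing at least one common patent can qualify.
    by_cited = defaultdict(list)
    for pos, cs in enumerate(cited):
        for c in cs:
            by_cited[c].append(pos)
    candidates = set()  # (a, b) with a < b, deduplicated across patents
    for rows in by_cited.values():
        candidates.update(combinations(rows, 2))
    for a, b in candidates:
        common = len(cited[a] & cited[b])
        union = len(cited[a] | cited[b])
        if common >= threshold * union:
            G.add_edge(firms[a], firms[b])
    return G


df = pd.DataFrame({
    'Firm': ['F_1', 'F_2', 'F_3', 'F_4'],
    'pat': [['p0', 'p1', 'p2'], [], ['p3', 'p6', 'p2'], ['p0', 'p9', 'p25']],
    'cited_pat': [['p0', 'p1', 'p2'], [], ['p5', 'p0', 'p23', 'p29', 'p12', 'p8'],
                  ['p0', 'p29', 'p31']],
})
G = build_graph(df)
print(sorted(G.edges()))  # [('F_1', 'F_3'), ('F_1', 'F_4')]
```

The index lists are built once, so the work grows with the number of firm pairs that actually share a patent rather than with all possible pairs.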
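Here's the graph produced for this example: the four firms are nodes, F_1 is linked to F_3 and to F_4 (both through shared patents in pat), and F_2 has no edges. F_3 and F_4 are not linked, since they share only 2 of 7 cited patents.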
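To check a result like this, or to bring the edges back into pandas for the full dataset, networkx can list isolated nodes, report degrees and export an edge list. A small sketch; the graph is rebuilt by hand from the two edges this example produces (F_1-F_3 and F_1-F_4), and `source`/`target` are simply the default column names of `nx.to_pandas_edgelist`:

```python
import networkx as nx

# Rebuild the example graph by hand: four firms, two edges.
G = nx.Graph()
G.add_nodes_from(['F_1', 'F_2', 'F_3', 'F_4'])
G.add_edges_from([('F_1', 'F_3'), ('F_1', 'F_4')])

print(sorted(G.edges()))     # edge list
print(list(nx.isolates(G)))  # firms with no edge at all
print(dict(G.degree()))      # number of links per firm

# Back to pandas: one row per edge, columns 'source' and 'target'.
edges_df = nx.to_pandas_edgelist(G)
print(edges_df)
```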