Pythonically create adjacency matrix from spreadsheet
I have a spreadsheet with lists of the names of people that a particular person reported working with on a number of projects. If I import it into pandas as a dataframe, it looks like this:
| | 1 | 2 |
|---|---|---|
| Jane | ['Fred', 'Joe'] | ['Joe', 'Fred', 'Bob'] |
| Fred | ['Alex'] | ['Jane'] |
| Terry | NaN | ['Bob'] |
| Bob | ['Joe'] | ['Jane', 'Terry'] |
| Alex | ['Fred'] | NaN |
| Joe | ['Jane'] | ['Jane'] |
I want to create an adjacency matrix that looks like this:
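| | Jane | Fred | Terry | Bob | Alex | Joe |
|---|---|---|---|---|---|---|
| Jane | 0 | 2 | 0 | 1 | 0 | 2 |
| Fred | 1 | 0 | 0 | 0 | 1 | 0 |
| Terry | 0 | 0 | 0 | 1 | 0 | 0 |
| Bob | 1 | 0 | 1 | 0 | 0 | 1 |
| Alex | 0 | 1 | 0 | 0 | 0 | 0 |
| Joe | 2 | 0 | 0 | 0 | 0 | 0 |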
This matrix will generally NOT be symmetric, because people's reports are inconsistent. So far I have been creating the adjacency matrix by looping through the dataframe and incrementing the matrix elements accordingly. Apparently, looping through dataframes is not recommended and is inefficient, so does anyone have a suggestion on how this could be done more pythonically?
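For comparison, a loop of the kind described above can avoid touching the DataFrame altogether by tallying (reporter, colleague) pairs in a collections.Counter and only building the matrix at the end. This is a sketch of that idea under stated assumptions, not the asker's actual code: the reports dict is a hypothetical stand-in for the spreadsheet (with None in place of NaN), and the helper name adjacency_counts is made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the spreadsheet: each person maps to the lists
# they reported for project 1 and project 2 (None plays the role of NaN).
reports = {
    "Jane": [["Fred", "Joe"], ["Joe", "Fred", "Bob"]],
    "Fred": [["Alex"], ["Jane"]],
    "Terry": [None, ["Bob"]],
    "Bob": [["Joe"], ["Jane", "Terry"]],
    "Alex": [["Fred"], None],
    "Joe": [["Jane"], ["Jane"]],
}


def adjacency_counts(reports):
    """Hypothetical helper: count how often each reporter names each colleague."""
    counts = Counter()
    for reporter, projects in reports.items():
        for colleagues in projects:
            if not colleagues:  # skip missing (None) or empty reports
                continue
            for colleague in colleagues:
                counts[reporter, colleague] += 1
    return counts


names = list(reports)
counts = adjacency_counts(reports)
# Rows are reporters, columns are colleagues; a Counter returns 0 for
# pairs it has never seen, so the matrix is not forced to be symmetric.
matrix = [[counts[r, c] for c in names] for r in names]
for name, row in zip(names, matrix):
    print(name, row)
```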
This is the sample data I worked with:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Fred', 'Terry', 'Bob', 'Alex', 'Joe'],
    '1': [['Fred', 'Joe'], ['Alex'], np.nan, ['Joe'], ['Fred'], ['Jane']],
    '2': [['Joe', 'Fred', 'Bob'], ['Jane'], ['Bob'], ['Jane', 'Terry'], np.nan, ['Jane']]
})
df.head()
Name 1 2
0 Jane [Fred, Joe] [Joe, Fred, Bob]
1 Fred [Alex] [Jane]
2 Terry NaN [Bob]
3 Bob [Joe] [Jane, Terry]
4 Alex [Fred] NaN
I created the adjacency matrix with pandas in three simple steps.
First, I melted the data so that all the connections between names end up in a single column, and dropped the resulting variable column.
dff = df.melt(id_vars=['Name']).drop('variable', axis=1)
Name value
0 Jane [Fred, Joe]
1 Fred [Alex]
2 Terry NaN
3 Bob [Joe]
4 Alex [Fred]
5 Joe [Jane]
6 Jane [Joe, Fred, Bob]
7 Fred [Jane]
8 Terry [Bob]
9 Bob [Jane, Terry]
10 Alex NaN
11 Joe [Jane]
Second, I used the explode method to split each list into separate rows, one per name.
dff = dff.explode('value')
Name value
0 Jane Fred
0 Jane Joe
1 Fred Alex
2 Terry NaN
3 Bob Joe
4 Alex Fred
5 Joe Jane
6 Jane Joe
6 Jane Fred
6 Jane Bob
7 Fred Jane
8 Terry Bob
9 Bob Jane
9 Bob Terry
10 Alex NaN
11 Joe Jane
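As the Terry and Alex rows above show, explode keeps a row for a missing report: a NaN stays NaN, and an empty list also becomes a single NaN row. crosstab ignores those NaN values, but if you post-process the exploded data yourself you may want to drop them first. A small sketch on a hypothetical three-person Series:

```python
import numpy as np
import pandas as pd

# Hypothetical reports: two colleagues, an empty list, and a missing value.
s = pd.Series([["Fred", "Joe"], [], np.nan], index=["Jane", "Terry", "Alex"])

# explode gives 4 rows: Fred and Joe for Jane, then NaN for Terry and for Alex.
exploded = s.explode()
print(exploded)

# Dropping NaN leaves only the real (reporter, colleague) pairs.
cleaned = exploded.dropna()
print(cleaned)
```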
Finally, to create the adjacency matrix I used pandas' crosstab, which counts how often each combination of values in the two specified columns occurs.
pd.crosstab(dff['Name'], dff['value'])
value Alex Bob Fred Jane Joe Terry
Name
Alex 0 0 1 0 0 0
Bob 0 0 0 1 1 1
Fred 1 0 0 1 0 0
Jane 0 1 2 0 2 0
Joe 0 0 0 2 0 0
Terry 0 1 0 0 0 0
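The three steps above can be chained into one helper. This is a sketch, not the answer author's code: the function name adjacency_matrix is hypothetical, and the final reindex is an addition that restores the row and column order used in the question (crosstab sorts names alphabetically) and keeps the matrix square even if some name never appears as a reporter or as a colleague. Missing reports are dropped explicitly, and the index is reset so the exploded rows have unique labels.

```python
import numpy as np
import pandas as pd


def adjacency_matrix(df, name_col="Name"):
    """Hypothetical helper: melt, explode and cross-tabulate, then reindex
    so rows and columns follow the original name order."""
    edges = (
        df.melt(id_vars=[name_col])      # one row per (person, project column)
          .drop(columns="variable")      # the project label is not needed
          .explode("value")              # one row per reported colleague
          .dropna(subset=["value"])      # drop missing reports (NaN)
          .reset_index(drop=True)        # unique row labels after explode
    )
    counts = pd.crosstab(edges[name_col], edges["value"])
    names = df[name_col].tolist()
    # reindex adds zero-filled rows/columns for names missing on either side.
    return counts.reindex(index=names, columns=names, fill_value=0)


df = pd.DataFrame({
    'Name': ['Jane', 'Fred', 'Terry', 'Bob', 'Alex', 'Joe'],
    '1': [['Fred', 'Joe'], ['Alex'], np.nan, ['Joe'], ['Fred'], ['Jane']],
    '2': [['Joe', 'Fred', 'Bob'], ['Jane'], ['Bob'], ['Jane', 'Terry'], np.nan, ['Jane']],
})
adj = adjacency_matrix(df)
print(adj)
```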
Here is one approach:
import ast
import io

import pandas as pd

# The table as text; columns are separated by two or more spaces.
data = '''        1                2
Jane    ['Fred', 'Joe']   ['Joe', 'Fred', 'Bob']
Fred    ['Alex']          ['Jane']
Terry   NaN               ['Bob']
Bob     ['Joe']           ['Jane', 'Terry']
Alex    ['Fred']          NaN
Joe     ['Jane']          ['Jane']'''
# Parse the text, replace NaN with '[]' and turn the list strings into real lists
# (DataFrame.applymap is called DataFrame.map in pandas >= 2.1).
# If your columns already hold lists, skip this and replace NaN with empty lists instead,
# e.g. df = df.applymap(lambda x: x if isinstance(x, list) else []); fillna([]) raises a TypeError.
df = pd.read_csv(io.StringIO(data), sep=r'\s\s+', engine='python').fillna('[]').applymap(ast.literal_eval)
df['all'] = df['1'] + df['2']  # merge the lists from columns 1 and 2
df_edges = df[['all']].explode('all').reset_index()  # one row per (person, colleague) pair
df_edges = df_edges.groupby(['index', 'all']).size().reset_index(name='count')  # count each pair
df_edges.pivot(index='index', columns='all', values='count').fillna(0).astype(int)  # adjacency matrix
Output:
index | Alex | Bob | Fred | Jane | Joe | Terry |
---|---|---|---|---|---|---|
Alex | 0 | 0 | 1 | 0 | 0 | 0 |
Bob | 0 | 0 | 0 | 1 | 1 | 1 |
Fred | 1 | 0 | 0 | 1 | 0 | 0 |
Jane | 0 | 1 | 2 | 0 | 2 | 0 |
Joe | 0 | 0 | 0 | 2 | 0 | 0 |
Terry | 0 | 1 | 0 | 0 | 0 | 0 |
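In the approach above, the groupby/count step followed by pivot and fillna(0) can likely be collapsed into a single value_counts plus unstack(fill_value=0), which yields integer counts directly without a separate NaN-filling step. A sketch, assuming the columns already hold real lists with NaN replaced by empty lists (the hypothetical df below mirrors the state after the literal_eval step); rows and columns come out sorted alphabetically, as in the output above.

```python
import pandas as pd

# Hypothetical input: people as the index, real lists in columns '1' and '2'.
df = pd.DataFrame(
    {
        "1": [["Fred", "Joe"], ["Alex"], [], ["Joe"], ["Fred"], ["Jane"]],
        "2": [["Joe", "Fred", "Bob"], ["Jane"], ["Bob"], ["Jane", "Terry"], [], ["Jane"]],
    },
    index=["Jane", "Fred", "Terry", "Bob", "Alex", "Joe"],
)

# Concatenate the per-project lists, give each colleague its own row,
# and drop the NaN rows that explode creates for empty lists.
edges = (df["1"] + df["2"]).explode().dropna()

# Count (reporter, colleague) pairs, then move colleagues into the columns.
adj = edges.groupby(level=0).value_counts().unstack(fill_value=0)
print(adj)
```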