Say I have this sample dataframe:
df_sample = pd.DataFrame({'name':['ben','chris','todd','mike','steven'],'sport':['football','baseball','baseball','football','football']})
I want to create an adjacency matrix whose index and column labels are the names, where a cell is 1 if the two players play the same sport and 0 if they play different sports.
What is a good way to do this?
Thanks so much.
You can do a self join on 'sport' and then use crosstab:
merged = df_sample.merge(df_sample, on='sport')
print(pd.crosstab(merged['name_x'], merged['name_y']))
Output:
name_y  ben  chris  mike  steven  todd
name_x
ben       1      0     1       1     0
chris     0      1     0       0     1
mike      1      0     1       1     0
steven    1      0     1       1     0
todd      0      1     0       0     1
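Note that pd.crosstab sorts the names alphabetically. If you want the rows and columns in the DataFrame's original order, a minimal sketch (reusing the question's df_sample; `names` and `adj` are just illustrative variable names) is to reindex both axes of the crosstab result:

```python
import pandas as pd

df_sample = pd.DataFrame({'name': ['ben', 'chris', 'todd', 'mike', 'steven'],
                          'sport': ['football', 'baseball', 'baseball', 'football', 'football']})

# Self join: every row is paired with every row (itself included) that has the same sport
merged = df_sample.merge(df_sample, on='sport')

# crosstab counts each (name_x, name_y) pair and sorts labels alphabetically,
# so reindex both axes back to the original row order
names = df_sample['name'].tolist()
adj = pd.crosstab(merged['name_x'], merged['name_y']).reindex(index=names, columns=names)
print(adj)
```

Since every player appears in the merge once per matching partner, each cell ends up as exactly 0 or 1 here.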
The way I think about this problem is as a SUM of adjacency matrices, one per sport. I use indicator columns to make the computation easier.
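import pandas as pd
import numpy as np
df_sample = pd.DataFrame({'name':['ben','chris','todd','mike','steven'],'sport':['football','baseball','baseball','football','football']})
# Work on a copy so the indicator columns are not also added to df_sample
df = df_sample.copy()
# Indicator columns: 1 where the player plays that sport, NaN elsewhere (filled with 0 below)
df.loc[(df.sport == 'football'), 'f_ind'] = 1
df.loc[(df.sport == 'baseball'), 'b_ind'] = 1
df = df.fillna(0)
print(df)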
     name     sport  f_ind  b_ind
0     ben  football    1.0    0.0
1   chris  baseball    0.0    1.0
2    todd  baseball    0.0    1.0
3    mike  football    1.0    0.0
4  steven  football    1.0    0.0
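With more sports, writing one .loc assignment per sport gets tedious. As an alternative sketch (not what the steps above use), pd.get_dummies builds one 0/1 indicator column per distinct sport in a single call; the `indicators` name is just illustrative:

```python
import pandas as pd

df_sample = pd.DataFrame({'name': ['ben', 'chris', 'todd', 'mike', 'steven'],
                          'sport': ['football', 'baseball', 'baseball', 'football', 'football']})

# One indicator column per distinct sport; columns come out sorted (baseball, football)
indicators = pd.get_dummies(df_sample['sport'], dtype=float)

# Attach the indicators next to the original columns without modifying df_sample
df = pd.concat([df_sample, indicators], axis=1)
print(df)
```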
f_vec = np.array(df.f_ind).reshape((1, -1))
print(f_vec.T * f_vec)
[[1. 0. 0. 1. 1.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [1. 0. 0. 1. 1.]
 [1. 0. 0. 1. 1.]]
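The product f_vec.T * f_vec works through NumPy broadcasting: a (5, 1) column times a (1, 5) row yields a (5, 5) matrix whose entry [i, j] is f_ind[i] * f_ind[j], which is 1 only when both players play football. In other words it is the outer product of the indicator vector with itself, so a sketch like the following (hard-coding the f_ind values from the table above) should show it matches np.outer:

```python
import numpy as np

f_ind = np.array([1.0, 0.0, 0.0, 1.0, 1.0])  # football indicator column from the table above
f_vec = f_ind.reshape((1, -1))                 # row vector, shape (1, 5)

# (5, 1) * (1, 5) broadcasts to (5, 5): entry [i, j] = f_ind[i] * f_ind[j]
via_broadcast = f_vec.T * f_vec
via_outer = np.outer(f_ind, f_ind)

print(np.array_equal(via_broadcast, via_outer))  # True
```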
As you can see, this is the adjacency matrix for football. If you repeat the process for baseball and add the two matrices, you end up with:
[[1. 0. 0. 1. 1.]
 [0. 1. 1. 0. 0.]
 [0. 1. 1. 0. 0.]
 [1. 0. 0. 1. 1.]
 [1. 0. 0. 1. 1.]]
This is exactly the matrix you are looking for, with rows and columns in the DataFrame's original order (ben, chris, todd, mike, steven).
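Rather than repeating the steps by hand for each sport, a minimal sketch can loop over every distinct sport and add up one outer product per sport. The last lines check an equivalent formulation: if D is the people-by-sports indicator matrix, the sum of those outer products equals D @ D.T. Variable names such as `adj` and `D` are illustrative only:

```python
import numpy as np
import pandas as pd

df_sample = pd.DataFrame({'name': ['ben', 'chris', 'todd', 'mike', 'steven'],
                          'sport': ['football', 'baseball', 'baseball', 'football', 'football']})

n = len(df_sample)
adj = np.zeros((n, n))

# Add one adjacency matrix (outer product of the sport's indicator vector) per sport
for sport in df_sample['sport'].unique():
    ind = (df_sample['sport'] == sport).to_numpy(dtype=float)
    adj += np.outer(ind, ind)

# Label rows and columns with the player names
adj_df = pd.DataFrame(adj, index=df_sample['name'], columns=df_sample['name'])
print(adj_df)

# Equivalent: indicator matrix (players x sports) times its transpose
D = pd.get_dummies(df_sample['sport'], dtype=float).to_numpy()
print(np.array_equal(D @ D.T, adj))  # True
```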
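Keep in mind that this approach only works if each person plays exactly one sport; if a person could play several, the summed entries would count the number of shared sports instead of being just 0 or 1.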
PS: if you don't want edges from a person to themselves, you can subtract the identity matrix from the result above.
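Because every player shares a sport with themselves, the diagonal of the summed matrix is all 1s, so subtracting the identity zeroes it out. A minimal sketch, hard-coding the result matrix shown above:

```python
import numpy as np

# Summed adjacency matrix from above (order: ben, chris, todd, mike, steven)
adj = np.array([[1., 0., 0., 1., 1.],
                [0., 1., 1., 0., 0.],
                [0., 1., 1., 0., 0.],
                [1., 0., 0., 1., 1.],
                [1., 0., 0., 1., 1.]])

# Remove self-edges: subtract the identity matrix of the same size
no_self = adj - np.eye(len(adj))
print(no_self)
```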