How would I create an adjacency matrix in Python filled with 0 or 1 values?

Say I have this sample dataframe:

import pandas as pd
df_sample = pd.DataFrame({'name':['ben','chris','todd','mike','steven'],'sport':['football','baseball','baseball','football','football']})

I want to create an adjacency matrix so that the index and the column values are the names, and the cell values are 1 if the two players play the same sport, and 0 if the two players play a different sport.

What is a good way to do this?

Thanks so much.

You can self-join the dataframe on 'sport' and then use pd.crosstab on the two name columns:

merged = df_sample.merge(df_sample, on='sport')
print(pd.crosstab(merged['name_x'], merged['name_y']))

Output:

name_y  ben  chris  mike  steven  todd
name_x                                
ben       1      0     1       1     0
chris     0      1     0       0     1
mike      1      0     1       1     0
steven    1      0     1       1     0
todd      0      1     0       0     1
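The crosstab cells are counts of shared sports, so they are 0 or 1 here only because each player is listed once with a single sport. As a minimal sketch, assuming the same df_sample as in the question, the counts can be capped at 1 and the axis labels dropped (the names `counts` and `adj` are illustrative, not from the original answer):

```python
import pandas as pd

df_sample = pd.DataFrame({
    'name': ['ben', 'chris', 'todd', 'mike', 'steven'],
    'sport': ['football', 'baseball', 'baseball', 'football', 'football'],
})

# Self-join on sport: one row per ordered pair of players sharing a sport
merged = df_sample.merge(df_sample, on='sport')

# crosstab counts shared sports; comparing with 0 and casting caps each cell at 1
counts = pd.crosstab(merged['name_x'], merged['name_y'])
adj = (counts > 0).astype(int)

# Drop the 'name_x' / 'name_y' axis labels for a cleaner printout
adj = adj.rename_axis(index=None, columns=None)
print(adj)
```

Note that the rows and columns come out in alphabetical order, not in the order of df_sample.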

I think of this problem as a sum of adjacency matrices, one for each sport. I use indicator columns to make the computation easier.

import pandas as pd
import numpy as np

df_sample = pd.DataFrame({'name':['ben','chris','todd','mike','steven'],'sport':['football','baseball','baseball','football','football']})

df = df_sample.copy()  # work on a copy so df_sample itself is not modified

df.loc[(df.sport == 'football'), 'f_ind'] = 1
df.loc[(df.sport == 'baseball'), 'b_ind'] = 1
df = df.fillna(0)
print(df)

     name     sport  f_ind  b_ind
0     ben  football    1.0    0.0
1   chris  baseball    0.0    1.0
2    todd  baseball    0.0    1.0
3    mike  football    1.0    0.0
4  steven  football    1.0    0.0

f_vec = np.array(df.f_ind).reshape((1, -1))
print(f_vec.T * f_vec)

[[1. 0. 0. 1. 1.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [1. 0. 0. 1. 1.]
 [1. 0. 0. 1. 1.]]

As you can see, this is the adjacency matrix for football. If you repeat the process for baseball and add the two matrices together, you end up with:

[[1. 0. 0. 1. 1.]
 [0. 1. 1. 0. 0.]
 [0. 1. 1. 0. 0.]
 [1. 0. 0. 1. 1.]
 [1. 0. 0. 1. 1.]]

This is exactly the matrix you are looking for.
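The per-sport steps above can be collapsed into one matrix product. This is a sketch rather than the answer's own code: it assumes pd.get_dummies is used to build one indicator column per sport, and the names `indicators`, `adj` and `adj_df` are illustrative.

```python
import pandas as pd

df_sample = pd.DataFrame({
    'name': ['ben', 'chris', 'todd', 'mike', 'steven'],
    'sport': ['football', 'baseball', 'baseball', 'football', 'football'],
})

# One 0/1 indicator column per sport; rows follow the order of df_sample
indicators = pd.get_dummies(df_sample['sport']).to_numpy(dtype=int)

# (n x k) @ (k x n): entry [i, j] sums, over all sports, the product of
# the two players' indicators, i.e. the sum of the per-sport outer products
adj = indicators @ indicators.T

# Label rows and columns with the player names
names = df_sample['name'].tolist()
adj_df = pd.DataFrame(adj, index=names, columns=names)
print(adj_df)
```

This avoids writing one indicator column by hand for every sport, and the rows keep the original order of df_sample.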

Keep in mind that this process will only work if each person can play only one sport.

PS - if you don't want edges from a person to themselves, you can subtract the identity matrix from the result above.
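A minimal sketch of that last step, assuming the summed matrix from above is held in a NumPy array (the variable names `adj` and `no_self` are illustrative):

```python
import numpy as np

# Summed adjacency matrix from above (order: ben, chris, todd, mike, steven)
adj = np.array([
    [1., 0., 0., 1., 1.],
    [0., 1., 1., 0., 0.],
    [0., 1., 1., 0., 0.],
    [1., 0., 0., 1., 1.],
    [1., 0., 0., 1., 1.],
])

# Subtracting the identity matrix zeroes the diagonal (the self-edges)
no_self = adj - np.eye(adj.shape[0])
print(no_self)
```

This relies on every diagonal entry being exactly 1, which holds when each person plays exactly one sport. If that is not guaranteed, np.fill_diagonal(adj, 0) sets the diagonal to zero in place regardless of its current values.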


 