Convert 2 columns to adjacency matrix

I am quite new to R and to this forum; please help me.

I have data with two columns:

  • 1st column with id of classes
  • 2nd column with id of students

I would like to convert it to an edge list with:

  • 1st column with the id of the first class
  • 2nd column with the id of the second class
  • 3rd column with the number of students the two classes share

I also want to create an adjacency matrix (rows and columns are the classes).
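Stated formally (the notation is introduced here for illustration and is not from the question): let S_i be the set of students in class i, and let B be the class-by-student incidence matrix with B_ik = 1 if student k is in class i and 0 otherwise. The requested weights are then

```latex
A_{ij} =
\begin{cases}
  |S_i \cap S_j| = \sum_k B_{ik} B_{jk}, & i \neq j,\\
  0, & i = j,
\end{cases}
```

so A is B Bᵀ with its diagonal (the class sizes |S_i|) set to zero, and the edge list is the set of triples (i, j, A_ij) over all pairs i ≠ j, zeros included.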

I searched this forum and tried full_join, but classes that share no students with any other class are dropped from the result, instead of being kept with a value of 0.

Can anyone help me with code to build the edge list for all pairs of classes, and the adjacency matrix for all classes (without dropping the isolated classes)?

Thank you so much.

library(dplyr)

# My attempt (mydata is defined below)
data <- full_join(mydata, mydata, c('studentid' = 'studentid')) %>% 
  select(-studentid) %>% 
  filter(classid.x != classid.y) %>% 
  group_by(classid.x, classid.y) %>% summarise(weight = n()) 
mydata <- read.table(header=TRUE, text="
  classid   studentid
0036110 03576311
0036110 08195612
0036110 20302811
0036110 29681210
0036110 03484975
0036110 03484815
0036110 04583310
0036110 06919310
0036110 03576211
0088630 10249511
0088630 00662458
0088630 00419766
0088630 10248511
0088630 10247911
0088630 10250611
0088630 00426947
0088630 00105669
0088630 10100910
0088630 00781739
0095710 02255772
0095710 02255742
0095710 02255782
0095710 02255682
0095710 02255752
0095710 04625310
0095710 02255722
0095710 02255692
0108410 01587447
0108410 10248511
0108410 00730873
0108410 01587497
0108410 00051469
0108410 01587397
0108410 01587587
0108410 00310447
0108410 01587457
0154710 20302811
0154710 01068245
0154710 00409605
0154710 02309283
0154710 00635705
0154710 03721112
0154710 02434835
0154710 00409755
0154710 00657098
0154710 02309263
0176510 03679107
0176510 00303516
0176510 00435928
0176510 00188526
0176510 00450059
0176510 00430397
0176510 01595488
0176510 10248511
0176510 07911110
0176510 00417916
0176510 00341139
0176510 00327468
0176510 00418006
0191U10 04778988
0191U10 04780648
0191U10 04780798
0191U10 04844509
0191U10 04780938
0195750 00305336
0195750 40866711
0195750 00625644
0206R10 04605910
0206R10 00010502
0206R10 00421056
0206R10 00421066
0206R10 00420986
0206R10 00421006
0206R10 00220119
0206R10 00420816
0206R10 00440028
0206R10 00416026
0206R10 00043863
0206R10 00625754
0206R10 00403354
0206R10 00431227
0206R10 00403314
0206R10 00412295
0206R10 04604810
0206R10 21078752
0206R10 00420926
0206R10 04608910
")
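A minimal reproduction of the dropped-class problem, on a small hypothetical dataset (the ids "A", "B", "C", "s1" to "s4" are made up for illustration). Class C shares no students, so after the filter it no longer appears at all. `count(..., name = "weight")` is used here as a shorthand for the group_by()/summarise(n()) step in the attempt.

```r
library(dplyr)

# Hypothetical toy data: A and B share student s2; C shares no students
toy <- data.frame(
  classid   = c("A", "A", "B", "B", "C"),
  studentid = c("s1", "s2", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Same steps as the attempt: self-join on student, drop same-class pairs.
# Recent dplyr versions may warn about a many-to-many relationship here;
# that is expected for a self-join.
pairs <- full_join(toy, toy, by = "studentid") %>%
  filter(classid.x != classid.y) %>%
  count(classid.x, classid.y, name = "weight")

# Classes present in the input but missing from the edge list
dropped <- setdiff(unique(toy$classid), unique(pairs$classid.x))
dropped
```

Here `dropped` is "C": its only rows in the self-join were C-C pairs, and the filter removed them.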

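full_join correctly retains all class ids, but you're dropping some through filter(classid.x != classid.y). A class that shares no students with any other class only ever pairs with itself in the self-join, so removing those rows removes the class entirely. These self-pairs are effectively the diagonal of your adjacency matrix; if you want zeroes on the diagonal, the better approach is to keep the rows and explicitly set their weight to 0. The off-diagonal entries that are still missing (pairs of classes with no common students) can then be filled with 0 through tidyr::spread: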

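library(dplyr)

X <- full_join(mydata, mydata, c('studentid' = 'studentid')) %>%
  group_by(classid.x, classid.y) %>% summarise(weight = n()) %>%
  # keep self-pairs so isolated classes survive, but zero their weight
  mutate(weight = replace(weight, classid.x==classid.y, 0)) %>%
  # one column per class; pairs that never co-occur are filled with 0
  ungroup() %>% tidyr::spread(classid.y, weight, fill=0)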
# # A tibble: 9 x 10
#   classid.x `0036110` `0088630` `0095710` `0108410` `0154710` `0176510`
#   <fct>         <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 0036110           0         0         0         0         1         0
# 2 0088630           0         0         0         1         0         1
# 3 0095710           0         0         0         0         0         0
# 4 0108410           0         1         0         0         0         1
# 5 0154710           1         0         0         0         0         0
# 6 0176510           0         1         0         1         0         0
# 7 0191U10           0         0         0         0         0         0
# 8 0195750           0         0         0         0         0         0
# 9 0206R10           0         0         0         0         0         0
# # … with 3 more variables: `0191U10` <dbl>, `0195750` <dbl>, `0206R10` <dbl>
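tidyr::spread() still works but has been superseded by pivot_wider() since tidyr 1.0.0. A sketch of the same pipeline with pivot_wider(), on hypothetical toy data where class C shares no students; `values_fill` is given as a named list, which both older and newer tidyr versions accept:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: A and B share student s2; C shares no students
toy <- data.frame(
  classid   = c("A", "A", "B", "B", "C"),
  studentid = c("s1", "s2", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

wide <- full_join(toy, toy, by = "studentid") %>%
  # shared-student count for every (class, class) pair, self-pairs included
  count(classid.x, classid.y, name = "weight") %>%
  # keep the self-pair rows so every class survives, but zero them out
  mutate(weight = if_else(classid.x == classid.y, 0L, weight)) %>%
  # one column per class; pairs with no shared students become 0
  pivot_wider(names_from = classid.y, values_from = weight,
              values_fill = list(weight = 0L))
wide
```

Class C keeps its own row and column, filled entirely with 0.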

From here, you can either make it into a proper matrix

X %>% as.data.frame %>% tibble::column_to_rownames( "classid.x" ) %>% as.matrix

or convert it back to a data frame in long format

X %>% tidyr::gather( classid.y, weight, -classid.x )
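A different route, not taken in the answer above: build the class-by-student incidence table with base R's table() and multiply it by its own transpose with tcrossprod(), which counts the shared students for every pair of classes at once. Because table() has a row for every class, isolated classes are kept automatically. A sketch on hypothetical toy data; it assumes a student should count once per class, which the `B > 0` step enforces even if a (class, student) row is duplicated:

```r
# Hypothetical toy data: A and B share student s2; C shares no students
toy <- data.frame(
  classid   = c("A", "A", "B", "B", "C"),
  studentid = c("s1", "s2", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Class-by-student 0/1 incidence matrix (rows = classes, columns = students)
B <- unclass(table(toy$classid, toy$studentid))
B <- 1 * (B > 0)

# Entry [i, j] = number of students shared by classes i and j;
# the diagonal holds each class's size, so set it to 0
A <- tcrossprod(B)
diag(A) <- 0
dimnames(A) <- list(classid.x = rownames(B), classid.y = rownames(B))

# Long-format edge list: every ordered pair of classes, zeros included
long <- as.data.frame(as.table(A), responseName = "weight",
                      stringsAsFactors = FALSE)
A
```

If you only want actual edges, `subset(long, weight > 0)` drops the zero rows while the matrix A still keeps every class.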
