Coercing a data frame to a matrix for network analysis in R

I'm trying to create an undirected network graph as part of a project I'm working on. The data are qualitative results, and the order of the values within a row doesn't matter. I'm trying to do this in igraph, mostly because it's what I learned several years ago, but I'm not necessarily attached to igraph.

Data looks something like this, but with 246 rows:

df <- data.frame(ResultA = c("drug1", "drug2", "drug3", "drug4"),
                 ResultB = c("drug2", "drug3", "drug4", "drug1"),
                 ResultC = c("drug4", NA, "drug3", NA),
                 ResultD = c("drug3", NA, NA, NA)) 

Importantly, I want a connection between every pair of values in a row, across all four columns (the column names don't matter either).

So for the first row that'd be:

drug1 -- drug2
drug1 -- drug4
drug1 -- drug3
drug2 -- drug4
drug2 -- drug3
drug4 -- drug3

I've been trying to get it into an adjacency/incidence matrix, but I'm struggling.

Any help here would be great. A tidyverse solution would be nice, but not necessary (I'm working on actually learning the tidyverse rather than hacking and slashing my way through R).

Thanks!

Edit:

For clarity, the above example of the output is what the igraph object would look like, not the desired output.

For those who don't do SNA (social network analysis), the desired output is either an edge list:

To     From
drug1  drug2
drug1  drug4
drug1  drug3
drug2  drug4
drug2  drug3
drug4  drug3

Or an adjacency matrix (only counting rows 1 and 2 of the data here):

       drug1 drug2 drug3 drug4
drug1      0     1     1     1
drug2      1     0     2     1
drug3      1     2     0     1
drug4      1     1     1     0

(I think so, at least; the adjacency matrix is a bit harder to think through. See also: https://www.jessesadler.com/post/network-analysis-with-r/ )

I don't know of an easy/quick way to convert data shaped like this into an edge list from which the adjacency matrix can be calculated. But here is a set of steps that reshapes the data with tidyverse functions.

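library(dplyr)
library(tidyr)
library(purrr)   # provides map()
library(igraph)

df %>% 
  mutate(id = row_number()) %>%          # remember which row each value came from
  pivot_longer(-id) %>%                  # long format: one row per (id, column, value)
  select(-name) %>%                      # the original column names don't matter
  filter(!is.na(value)) %>% 
  nest(data = value) %>%                 # one list element per original row
  # all unordered pairs of drugs within each original row
  mutate(pairs = map(data, ~as_tibble(t(combn(.$value, 2)),
                                      .name_repair = ~c("from", "to")))) %>% 
  pull(pairs) %>% 
  bind_rows() %>%                        # stack the pairs into one edge list
  graph_from_data_frame(directed = FALSE) %>% 
  as_adjacency_matrix()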

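We turn the data into a long format, then use combn() to create all pairs of drugs within each original row. Then we combine all those pairs into one edge list and turn that into a graph object. Finally, we extract the adjacency matrix from the graph object. Note that row 3 of the sample data lists drug3 twice, which is why drug3 has a non-zero entry on the diagonal (a self-loop). For the sample input data, this returns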

4 x 4 sparse Matrix of class "dgCMatrix"
      drug1 drug2 drug4 drug3
drug1     .     1     2     1
drug2     1     .     1     2
drug4     2     1     .     3
drug3     1     2     3     1
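
If it helps to see the same idea without the pipeline, here is a minimal base R sketch of the pairing and counting steps. It is not part of the approach above, just an illustration under some assumptions: `row_pairs` is a hypothetical helper name, rows with fewer than two non-missing values are skipped (combn() would otherwise fail on them), and a self-loop such as drug3 -- drug3 is counted once on the diagonal.

```r
# Sample data from the question
df <- data.frame(ResultA = c("drug1", "drug2", "drug3", "drug4"),
                 ResultB = c("drug2", "drug3", "drug4", "drug1"),
                 ResultC = c("drug4", NA, "drug3", NA),
                 ResultD = c("drug3", NA, NA, NA))

# Hypothetical helper: all unordered pairs within one row, ignoring NAs
row_pairs <- function(x) {
  x <- unname(x[!is.na(x)])
  if (length(x) < 2) return(NULL)  # assumption: rows with < 2 values add no edges
  t(combn(x, 2))                   # one pair per row of the result
}

# Build the edge list by stacking the pairs from every row
m <- as.matrix(df)
edges <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) row_pairs(m[i, ])))
colnames(edges) <- c("from", "to")

# Count how often each pair occurs in a symmetric adjacency matrix
drugs <- sort(unique(c(edges)))
adj <- matrix(0, length(drugs), length(drugs), dimnames = list(drugs, drugs))
for (k in seq_len(nrow(edges))) {
  a <- edges[k, "from"]
  b <- edges[k, "to"]
  adj[a, b] <- adj[a, b] + 1
  if (a != b) adj[b, a] <- adj[b, a] + 1  # mirror; a self-loop is counted once
}

edges
adj
```

The `edges` matrix can also be passed to igraph directly, for example with graph_from_edgelist(edges, directed = FALSE), if a graph object is wanted rather than the raw counts.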
