
Coercing a data frame to a matrix for network analysis in R

I'm trying to create an undirected network graph as part of a project I'm working on. The data I have are qualitative results, and order doesn't matter. I'm trying to do this in igraph, mostly because it's what I learned several years ago, but I'm not necessarily attached to igraph.

Data looks something like this, but with 246 rows:

df <- data.frame(ResultA = c("drug1", "drug2", "drug3", "drug4"),
                 ResultB = c("drug2", "drug3", "drug4", "drug1"),
                 ResultC = c("drug4", NA, "drug3", NA),
                 ResultD = c("drug3", NA, NA, NA)) 

Importantly, I want to make sure I have connections between the values in all four columns (the column names don't matter either).

So for the first row that'd be:

drug1 -- drug2
drug1 -- drug4
drug1 -- drug3
drug2 -- drug4
drug2 -- drug3
drug4 -- drug3

I've been trying to get it into an adjacency/incidence matrix, but I'm struggling.

Any help here would be great - the tidyverse solution would be nice, but not necessary (because I'm working on actually learning tidyverse rather than hack & slashing my way through R)



For clarity, the above example of the output is what the igraph object would look like, not the desired output.

For those who don't do SNA (social network analysis), here are the options for the desired output. Either an edge list:

To    From
drug1 drug2
drug1 drug4
drug1 drug3
drug2 drug4
drug2 drug3
drug4 drug3

Or an adjacency matrix (just going to do row 1&2 here; using "dr" for short)

     drug1 dr2 dr3 dr4
drug1  0   1   1   1
dr2    1   0   2   1
dr3    1   2   0   1
dr4    1   1   1   0

(I think, a bit harder to think through the adjacency matrix, eg also here: https://www.jessesadler.com/post/network-analysis-with-r/ )

I don't know of an easy/quick way to convert data like this to an edge list from which the adjacency matrix is easy to calculate. But here is a set of steps that reshapes the data with tidyverse functions.

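library(dplyr)
library(tidyr)
library(purrr)
library(igraph)

df %>% 
  mutate(id = row_number()) %>%            # tag each row so its values stay together
  pivot_longer(-id) %>%                    # long format: one drug per line
  select(-name) %>%                        # the original column names don't matter
  filter(!is.na(value)) %>% 
  nest(data = value) %>%                   # one nested tibble per original row
  mutate(pairs = map(data, ~as_tibble(t(combn(.$value, 2)), .name_repair = "unique"))) %>% 
  pull(pairs) %>% 
  bind_rows() %>%                          # stack all pairs into one edge list
  graph_from_data_frame(directed = FALSE) %>% 
  as_adjacency_matrix()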

We turn the data into long format, then create all pairs of drugs within each original row. We then combine all those pairs, turn them into a graph object, and extract the adjacency matrix from that graph object. For the sample input data, this returns:

4 x 4 sparse Matrix of class "dgCMatrix"
      drug1 drug2 drug4 drug3
drug1     .     1     2     1
drug2     1     .     1     2
drug4     2     1     .     3
drug3     1     2     3     1
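
As a cross-check on the counts above, here is a minimal base R sketch of the same idea (pairs within each row, then a symmetric count matrix). It is not part of the pipeline above; the names `row_pairs`, `edges` and `adj` are made up for illustration, and it assumes a self-loop should be counted once on the diagonal, as in the output shown.

```r
# Sample data from the question
df <- data.frame(ResultA = c("drug1", "drug2", "drug3", "drug4"),
                 ResultB = c("drug2", "drug3", "drug4", "drug1"),
                 ResultC = c("drug4", NA, "drug3", NA),
                 ResultD = c("drug3", NA, NA, NA))

# Hypothetical helper: all unordered pairs within one row, ignoring NAs.
# Rows with fewer than two values contribute no pairs.
row_pairs <- function(x) {
  x <- unname(x[!is.na(x)])
  if (length(x) < 2) return(NULL)
  t(combn(x, 2))
}

# Stack the per-row pairs into a two-column character edge list
m <- as.matrix(df)
edges <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) row_pairs(m[i, ])))

# Count each undirected edge in both directions; a self-loop is counted once
drugs <- sort(unique(c(edges)))
adj <- matrix(0L, length(drugs), length(drugs), dimnames = list(drugs, drugs))
for (k in seq_len(nrow(edges))) {
  a <- edges[k, 1]
  b <- edges[k, 2]
  adj[a, b] <- adj[a, b] + 1L
  if (a != b) adj[b, a] <- adj[b, a] + 1L
}

edges
adj
```

Apart from the row/column order (alphabetical here), `adj` holds the same counts as the sparse matrix above. The 1 on the drug3 diagonal comes from row 3 of the sample data, which lists drug3 twice; if self-loops are not wanted, calling `unique()` on the values inside `row_pairs` would drop such repeats.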

