How to convert a list of nested dataframes into a count matrix based on common values in the dataframe

Question

I have a long list of data frames, each holding a column of gene names. I have added a toy example below.

Output of dput(list1):

list(ENDOSS = structure(list(ENDOSS = c("CDKN1C", "SOX6", "TGFB2"
)), row.names = c(NA, -3L), class = "data.frame"), ENDOSSSD = structure(list(
    ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")), row.names = c(NA, 
-3L), class = "data.frame"), GASTRIN = structure(list(GASTRIN = c("IKBKB", 
"KIT", "SERPINE1")), row.names = c(NA, -3L), class = "data.frame"), 
    METCC = structure(list(METCC = character(0)), row.names = character(0), class = "data.frame"))
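The dput() output is exact but hard to scan. As a minimal sketch, an equivalent toy list can be built directly. This assumes R >= 4.0, where data.frame() no longer converts strings to factors by default. The column values match the dput() output; internal details such as the type of the empty METCC data frame's row names may differ slightly, which does not matter for the answers below.

```r
# Build the toy list: one single-column data frame per gene set,
# where the column name repeats the list element's name.
# Assumes R >= 4.0 (character columns stay character).
list1 <- list(
  ENDOSS   = data.frame(ENDOSS   = c("CDKN1C", "SOX6", "TGFB2")),
  ENDOSSSD = data.frame(ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")),
  GASTRIN  = data.frame(GASTRIN  = c("IKBKB", "KIT", "SERPINE1")),
  METCC    = data.frame(METCC    = character(0))  # an empty gene set
)

# Inspect the structure: four data frames, the last with 0 rows
str(list1)
```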

The toy list looks like this:

list1
    ENDOSS
         "CDKN1C", "SOX6", "TGFB2" 
    ENDOSSSD
         "CDKN1C", "SOX6", "TGFB2"
    GASTRIN
          "IKBKB", "KIT", "SERPINE1"
    METCC

I would like to transform this list into a count matrix with one row per list element and one column per distinct gene. For the example, the output should look like this:

             CDKN1C IKBKB KIT SERPINE1 SOX6 TGFB2
    ENDOSS        1     0   0        0    1     1
    ENDOSSSD      1     0   0        0    1     1
    GASTRIN       0     1   1        1    0     0
    METCC         0     0   0        0    0     0

Any help would be appreciated. Thanks.

Answer 1

We can use mtabulate from the qdapTools package after converting the column in each list element to a vector:

library(qdapTools)
mtabulate(lapply(list1, unlist))
         CDKN1C IKBKB KIT SERPINE1 SOX6 TGFB2
ENDOSS        1     0   0        0    1     1
ENDOSSSD      1     0   0        0    1     1
GASTRIN       0     1   1        1    0     0
METCC         0     0   0        0    0     0
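If adding the qdapTools dependency is not desirable, roughly the same matrix can be built in base R. This is a sketch, not a description of how mtabulate works internally: it collects the distinct genes as factor levels, tabulates each list element against those levels, and stacks the results. Because every element is tabulated against the full set of levels, an empty element such as METCC still gets a row of zeros. The toy input is rebuilt inline (assuming R >= 4.0) so the block runs on its own.

```r
# Toy input with the same contents as dput(list1) in the question
list1 <- list(
  ENDOSS   = data.frame(ENDOSS   = c("CDKN1C", "SOX6", "TGFB2")),
  ENDOSSSD = data.frame(ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")),
  GASTRIN  = data.frame(GASTRIN  = c("IKBKB", "KIT", "SERPINE1")),
  METCC    = data.frame(METCC    = character(0))
)

# All distinct genes across every element, sorted; these become the columns
genes <- sort(unique(unlist(list1, use.names = FALSE)))

# For each element, count how often each gene occurs. A factor with the
# full set of levels makes table() report 0 for absent genes, so every
# element yields a count vector of the same length.
m <- t(sapply(list1, function(d) table(factor(unlist(d), levels = genes))))

m
```

The rows should come out as ENDOSS, ENDOSSSD, GASTRIN, METCC and the columns as CDKN1C through TGFB2, matching the mtabulate result above.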

Answer 2

One approach is to combine the list of data frames into one with bind_rows, then reshape the data to long format so that all the values are in the same column. From there, you can pivot it back to wide format with its counts.

library(dplyr)
library(tidyr)

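bind_rows(list1, .id = 'name') %>%             # stack all data frames; list names go into `name`
  pivot_longer(cols = -name, names_to = NULL,   # gather the per-element columns into one `value` column
               values_drop_na = TRUE) %>%       # drop the NA padding bind_rows adds for non-matching columns
  pivot_wider(names_from = value, values_from = value, 
              values_fn = length, values_fill = 0)  # count each gene per name; absent genes become 0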

#   name     CDKN1C  SOX6 TGFB2 IKBKB   KIT SERPINE1
#  <chr>     <int> <int> <int> <int> <int>    <int>
#1 ENDOSS        1     1     1     0     0        0
#2 ENDOSSSD      1     1     1     0     0        0
#3 GASTRIN       0     0     0     1     1        1
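The output above has no METCC row: METCC is an empty data frame, so bind_rows contributes no rows for it. If every list element should appear, as in the desired output, one option is to keep the same long-to-wide idea but cross-tabulate with base R's table(), declaring the list names as factor levels so that empty elements are retained. This sketch swaps tidyr's pivot_wider for table(); it is not a modification of the pipeline above. It assumes each data frame has exactly one column of genes and R >= 4.0.

```r
# Toy input with the same contents as dput(list1) in the question
list1 <- list(
  ENDOSS   = data.frame(ENDOSS   = c("CDKN1C", "SOX6", "TGFB2")),
  ENDOSSSD = data.frame(ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")),
  GASTRIN  = data.frame(GASTRIN  = c("IKBKB", "KIT", "SERPINE1")),
  METCC    = data.frame(METCC    = character(0))
)

# Long format: one row per (list element, gene) pair.
# Each data frame has a single column, so its first column holds the genes.
long <- data.frame(
  name  = rep(names(list1), vapply(list1, nrow, integer(1))),
  value = unlist(lapply(list1, `[[`, 1), use.names = FALSE)
)

# Cross-tabulate back to wide format. Fixing the factor levels to
# names(list1) keeps METCC even though it has no rows in `long`.
counts <- table(name = factor(long$name, levels = names(list1)),
                gene = long$value)

counts
```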

2 answers

solution1
2 2021-06-16 17:24:22

solution2
1 ACCEPTED 2021-06-16 10:50:43
