I have a long list of gene sets, each stored as a one-column data frame. I have added a toy example below.
Output of dput(list1):
list(ENDOSS = structure(list(ENDOSS = c("CDKN1C", "SOX6", "TGFB2"
)), row.names = c(NA, -3L), class = "data.frame"), ENDOSSSD = structure(list(
ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")), row.names = c(NA,
-3L), class = "data.frame"), GASTRIN = structure(list(GASTRIN = c("IKBKB",
"KIT", "SERPINE1")), row.names = c(NA, -3L), class = "data.frame"),
METCC = structure(list(METCC = character(0)), row.names = character(0), class = "data.frame"))
The toy list looks like this:
list1
ENDOSS
"CDKN1C", "SOX6", "TGFB2"
ENDOSSSD
"CDKN1C", "SOX6", "TGFB2"
GASTRIN
"IKBKB", "KIT", "SERPINE1"
METCC
I would like to transform this list into a count matrix. For this example, the output should look like this:
CDKN1C IKBKB KIT SERPINE1 SOX6 TGFB2
ENDOSS 1 0 0 0 1 1
ENDOSSSD 1 0 0 0 1 1
GASTRIN 0 1 1 1 0 0
METCC 0 0 0 0 0 0
Any help would be appreciated. Thanks.
We can use mtabulate from qdapTools after converting the single column in each list element to a vector:
library(qdapTools)
mtabulate(lapply(list1, unlist))
CDKN1C IKBKB KIT SERPINE1 SOX6 TGFB2
ENDOSS 1 0 0 0 1 1
ENDOSSSD 1 0 0 0 1 1
GASTRIN 0 1 1 1 0 0
METCC 0 0 0 0 0 0
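If you would rather avoid the qdapTools dependency, a base R sketch along the same lines could look like this. It assumes, as in the toy example, that every list element is a one-column data frame. It pulls out each column, builds one (set, gene) pair per gene and cross-tabulates the pairs with table(). The names genes, set, gene and count_mat are only illustrative.

```r
# Toy data from the question (output of dput(list1))
list1 <- list(ENDOSS = structure(list(ENDOSS = c("CDKN1C", "SOX6", "TGFB2"
)), row.names = c(NA, -3L), class = "data.frame"), ENDOSSSD = structure(list(
ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")), row.names = c(NA,
-3L), class = "data.frame"), GASTRIN = structure(list(GASTRIN = c("IKBKB",
"KIT", "SERPINE1")), row.names = c(NA, -3L), class = "data.frame"),
METCC = structure(list(METCC = character(0)), row.names = character(0), class = "data.frame"))

# Pull the single column out of each data frame as a plain character vector
genes <- lapply(list1, function(df) as.character(df[[1]]))

# One entry per (set, gene) pair; using names(genes) as factor levels
# keeps empty sets such as METCC as all-zero rows in the table
set  <- factor(rep(names(genes), lengths(genes)), levels = names(genes))
gene <- unlist(genes, use.names = FALSE)

# Cross-tabulate, then drop the "table" class to get an ordinary integer matrix
count_mat <- unclass(table(set, gene))
count_mat
```

Because set is a factor whose levels are taken from names(list1), sets without any genes still get a row, so METCC is kept just as in the mtabulate output.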
One approach is to combine the list of data frames into one with bind_rows, then reshape it to long format so that all the gene names are in the same column. From there, you can reshape it back to wide format with the counts.
library(dplyr)
library(tidyr)
bind_rows(list1, .id = 'name') %>%
pivot_longer(cols = -name, names_to = NULL,
values_drop_na = TRUE) %>%
pivot_wider(names_from = value, values_from = value,
values_fn = length, values_fill = 0)
# name CDKN1C SOX6 TGFB2 IKBKB KIT SERPINE1
# <chr> <int> <int> <int> <int> <int> <int>
#1 ENDOSS 1 1 1 0 0 0
#2 ENDOSSSD 1 1 1 0 0 0
#3 GASTRIN 0 0 0 1 1 1
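The pipeline above drops METCC: its data frame has zero rows, so it contributes no rows to bind_rows and never reaches pivot_wider. A possible variant is sketched below. It assumes dplyr 1.0 or later (for across()) and a tidyr version whose pivot_wider() has the names_sort argument. It counts with count() instead of values_fn = length and then re-adds any missing sets with a right_join() against names(list1). The object name counts is only illustrative.

```r
library(dplyr)
library(tidyr)

# Toy data from the question (output of dput(list1))
list1 <- list(ENDOSS = structure(list(ENDOSS = c("CDKN1C", "SOX6", "TGFB2"
)), row.names = c(NA, -3L), class = "data.frame"), ENDOSSSD = structure(list(
ENDOSSSD = c("CDKN1C", "SOX6", "TGFB2")), row.names = c(NA,
-3L), class = "data.frame"), GASTRIN = structure(list(GASTRIN = c("IKBKB",
"KIT", "SERPINE1")), row.names = c(NA, -3L), class = "data.frame"),
METCC = structure(list(METCC = character(0)), row.names = character(0), class = "data.frame"))

counts <- bind_rows(list1, .id = "name") %>%
  # one row per (name, gene); the NA padding created by bind_rows is dropped
  pivot_longer(cols = -name, names_to = NULL, values_drop_na = TRUE) %>%
  # number of times each gene occurs in each set (column n)
  count(name, value) %>%
  # one column per gene in alphabetical order; absent genes become 0
  pivot_wider(names_from = value, values_from = n,
              values_fill = 0L, names_sort = TRUE) %>%
  # add back sets with no genes (e.g. METCC), which bind_rows could not carry
  right_join(tibble(name = names(list1)), by = "name") %>%
  # the re-added rows are NA after the join; turn them into zero counts
  mutate(across(-name, ~ coalesce(.x, 0L)))
counts
```

The right_join() against the original names is what guarantees one row per list element, whether or not that element contains any genes.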