
Create a logical or binary matrix/data.frame from a list of factors in R

I have a list of approximately 2 million elements. Each element is a vector of character strings. There are about 50 distinct strings in total, so they can be treated as factor levels. The vectors differ in length, from 1 up to 50 (i.e. the total number of distinct strings).

I want to convert the list to a logical or binary matrix/data.frame with one row per list element and one column per distinct string. My current method uses lapply and is extremely slow; is there a vectorised approach?

library(dplyr); library(tidyr)
# Create a small test data set
set.seed(123)
list1 <- list()
ListLength <- 10
elementlength <- sample(1:5, ListLength, replace = TRUE)

for (i in seq_along(elementlength)) {
  list1[[i]] <- sample(letters[1:15], elementlength[i])
}

# Current approach: build a one-row wide data frame per element with lapply,
# then stack them; strings absent from an element end up as NA
lapply(list1, function(n) {
  data.frame(type = n, value = TRUE) %>%
    spread(key = type, value = value)
}) %>% bind_rows()
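A minimal sketch of the same tidyr idea without the per-element loop, assuming every list element has at least one string (empty elements would simply be missing from the result). The whole list is flattened once into a long table with an element index (the column names id, type and value are illustrative choices, not from the question), and a single spread() call reshapes it. fill = FALSE records absent strings as FALSE instead of NA.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question (same seed, same sampling order)
set.seed(123)
list1 <- lapply(sample(1:5, 10, replace = TRUE),
                function(k) sample(letters[1:15], k))

# Flatten the whole list once: one row per (element, string) pair
long <- data.frame(
  id    = rep(seq_along(list1), lengths(list1)),  # index of the source element
  type  = unlist(list1, use.names = FALSE),       # the string itself
  value = TRUE,
  stringsAsFactors = FALSE
) %>%
  distinct()  # guard against a string repeated within one element

# One reshape for the whole list instead of one spread() per element;
# fill = FALSE marks strings absent from an element as FALSE rather than NA
wide <- spread(long, key = type, value = value, fill = FALSE)
```

The cost is one long data frame with as many rows as there are strings in total, which is usually far cheaper than building and binding millions of one-row data frames.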

I wonder whether it would be possible to preallocate the data frame and then fill it in somehow:

# Unique strings across the whole list
Type <- unique(unlist(list1, use.names = FALSE))

# Create an empty data frame to fill in
TypeMat <- data.frame(matrix(NA,
                             ncol = length(Type),
                             nrow = ListLength)) %>%
  setNames(Type)
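The preallocation idea can be completed in base R, sketched here under the assumption that a logical matrix (rather than a data.frame) is acceptable as the target. Instead of filling row by row, every (row, column) position is computed at once, and indexing a matrix with a two-column matrix of indices sets all of those cells in a single assignment.

```r
# Same toy data as in the question (same seed, same sampling order)
set.seed(123)
list1 <- lapply(sample(1:5, 10, replace = TRUE),
                function(k) sample(letters[1:15], k))

Type <- unique(unlist(list1, use.names = FALSE))

# Preallocate an all-FALSE logical matrix:
# one row per list element, one column per distinct string
TypeMat <- matrix(FALSE, nrow = length(list1), ncol = length(Type),
                  dimnames = list(NULL, Type))

# Row index of every string: element i contributes lengths(list1)[i] entries
rows <- rep(seq_along(list1), lengths(list1))
# Column index of every string in the flattened list
cols <- match(unlist(list1, use.names = FALSE), Type)

# Matrix indexing with a (row, col) matrix sets all cells in one call
TypeMat[cbind(rows, cols)] <- TRUE
```

If a data.frame is needed afterwards, as.data.frame(TypeMat) converts it; for a 0/1 version, TypeMat + 0L gives an integer matrix.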

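We could use mtabulate from the qdapTools package. It counts how often each string occurs in every list element, so comparing the counts with 0 gives a logical matrix: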

library(qdapTools)
mtabulate(list1)!=0
#     a     b     c     d     e     f     g     h     i     j     k     l     m     o
#[1,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
#[2,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
#[3,]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
#[5,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
#[6,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
#[8,]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
#[9,] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[10,]FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
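For the full data (about 2 million rows by 50 columns), a dense logical matrix holds roughly 100 million cells at 4 bytes each, i.e. about 400 MB. A sketch of a sparse alternative, assuming the Matrix package (a recommended package shipped with standard R installations) is acceptable: sparseMatrix() without an x argument builds a pattern matrix that stores only the positions of the TRUE cells. How much memory this saves depends on how many strings each element holds on average.

```r
library(Matrix)  # recommended package bundled with standard R installations

# Same toy data as in the question (same seed, same sampling order)
set.seed(123)
list1 <- lapply(sample(1:5, 10, replace = TRUE),
                function(k) sample(letters[1:15], k))

# Sorted so the columns come out in alphabetical order
Type <- sort(unique(unlist(list1, use.names = FALSE)))

# Pattern sparse matrix: only the (row, col) positions of TRUE cells are stored
sp <- sparseMatrix(
  i = rep(seq_along(list1), lengths(list1)),        # list element of each string
  j = match(unlist(list1, use.names = FALSE), Type), # column of each string
  dims = c(length(list1), length(Type)),
  dimnames = list(NULL, Type)
)

# Convert to an ordinary dense logical matrix only when really needed
dense <- as.matrix(sp)
```

Many modelling functions accept sparse matrices directly, so the dense conversion can often be skipped entirely.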

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM