I have a list of approximately 2 million elements. Each element is a character vector. There are about 50 distinct strings in total, so they can be treated as factor levels. The vectors vary in length from 1 to 50 (i.e. up to the total number of distinct strings).
I want to convert the list to a logical (binary) matrix or data.frame. My current method uses lapply and is incredibly slow; I would like to know if there is a vectorised approach.
library(dplyr); library(tidyr)
# Create test data set
set.seed(123)
ListLength <- 10
list1 <- vector("list", ListLength)
elementlength <- sample(1:5, ListLength, replace = TRUE)
for (i in seq_along(elementlength)) {
  list1[[i]] <- sample(letters[1:15], elementlength[i])
}
# Create data frame from list using lapply (one single-row data frame per element)
lapply(list1, function(n) {
  data.frame(type = n, value = TRUE) %>%
    spread(key = type, value = value)
}) %>% bind_rows()
I don't know whether it would be possible to preallocate the data frame and then fill it in somehow.
Type <- unique(unlist(list1, use.names = FALSE))
# Create empty data frame
TypeMat <- data.frame(matrix(NA,
                             ncol = length(Type),
                             nrow = ListLength)) %>%
  setNames(Type)
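The preallocate-and-fill idea might be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `toy` is a small hypothetical stand-in for `list1`, and `types`, `mat`, `rows` and `cols` are names introduced here, not part of the original code. It preallocates an all-FALSE logical matrix and sets every TRUE cell in a single assignment using a two-column (row, column) index matrix.

```r
# Hypothetical toy input standing in for list1
toy <- list(c("a", "c"), "b", c("c", "b", "d"))

# Column labels: every distinct string across the list
types <- sort(unique(unlist(toy, use.names = FALSE)))

# Preallocate an all-FALSE logical matrix, one row per list element
mat <- matrix(FALSE, nrow = length(toy), ncol = length(types),
              dimnames = list(NULL, types))

# Row index: repeat each element's position once per string it contains
rows <- rep(seq_along(toy), lengths(toy))
# Column index: position of each string among the column labels
cols <- match(unlist(toy, use.names = FALSE), types)

# Set all TRUE cells in one vectorised assignment
mat[cbind(rows, cols)] <- TRUE
mat
```

Because the index vectors are built once with `rep()` and `match()`, there is no per-element function call, which is where the lapply version likely spends most of its time.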
We could use mtabulate from qdapTools:
library(qdapTools)
mtabulate(list1)!=0
#          a     b     c     d     e     f     g     h     i     j     k     l     m     o
# [1,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
# [2,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
# [3,]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
# [5,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
# [6,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
# [8,]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
# [9,] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[10,] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
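If installing an extra package is not an option, a similar result can probably be obtained with base R's `table()`. The sketch below is an assumption-laden illustration, not a drop-in equivalent of `mtabulate`: `toy` is a hypothetical stand-in for `list1`, and `long`, `counts` and `res` are names introduced here. It reshapes the list into a long (row, type) layout, cross-tabulates it, and compares the counts with zero.

```r
# Hypothetical toy input standing in for list1
toy <- list(c("a", "c"), "b", c("c", "b", "d"))

# Long format: one line per (list position, string) pair.
# The factor levels keep a row for every list position.
long <- data.frame(
  row  = factor(rep(seq_along(toy), lengths(toy)), levels = seq_along(toy)),
  type = unlist(toy, use.names = FALSE)
)

# Cross-tabulate positions against strings
counts <- table(long$row, long$type)

# Drop the "table" class and convert counts to a logical matrix
res <- unclass(counts) != 0
res
```

Note that `table()` builds a dense integer matrix, so for around 2 million rows and 50 columns memory use is worth checking before relying on this.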