Create a logical or binary matrix/data.frame from a list of factors in R

I have a list of approximately 2 million elements. Each element is a character vector. There are only about 50 distinct strings, so they can be treated as factor levels. The vectors vary in length from 1 to 50 (50 being the total number of distinct strings).

I want to convert the list to a logical or binary matrix/data.frame, with one row per list element and one column per distinct string. My current method uses lapply and is incredibly slow; I would like to know whether there is a vectorised approach.

library(dplyr); library(tidyr)

# Create a test data set: a list of 10 character vectors, each 1 to 5 letters long
set.seed(123)
list1 <- list()
ListLength <- 10
elementlength <- sample(1:5, ListLength, replace = TRUE)

for (i in seq_along(elementlength)) {
  list1[[i]] <- sample(letters[1:15], elementlength[i])
}

# Create a data frame from the list using lapply: each element becomes a
# one-row data frame, and bind_rows() fills the missing columns with NA
lapply(list1, function(n) {
  data.frame(type = n, value = TRUE) %>%
    spread(key = type, value = value)
}) %>% bind_rows()

I don't know whether it would be possible to preallocate the data frame and then fill it in somehow.

Type <- unique(unlist(list1, use.names = FALSE))

#Create empty dataframe  
TypeMat <- data.frame(matrix(NA, 
                               ncol = length(Type), 
                               nrow = ListLength)) %>% 
  setNames(Type)
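
One possible way to fill a preallocated structure in a single vectorised step is base R's two-column matrix indexing. This is only a sketch: the small `lst` below is a hypothetical stand-in for `list1`, and the names `types`, `mat`, `rows` and `cols` are introduced here for illustration. It preallocates a logical matrix rather than a data frame, which is an assumption about what is acceptable as output.

```r
# Hypothetical small input standing in for list1
lst <- list(c("a", "c"), "b", c("c", "a", "d"))

# All distinct strings, used as the column names
types <- sort(unique(unlist(lst, use.names = FALSE)))

# Preallocate an all-FALSE logical matrix:
# one row per list element, one column per distinct string
mat <- matrix(FALSE, nrow = length(lst), ncol = length(types),
              dimnames = list(NULL, types))

# Row index of every string: element i is repeated lengths(lst)[i] times
rows <- rep(seq_along(lst), lengths(lst))
# Column index of every string
cols <- match(unlist(lst, use.names = FALSE), types)

# Indexing with a two-column (row, col) matrix sets every hit in one assignment
mat[cbind(rows, cols)] <- TRUE
mat
```

If a data frame is needed afterwards, `as.data.frame(mat)` should convert it, at the cost of an extra copy.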

We could use mtabulate from the qdapTools package:

library(qdapTools)
mtabulate(list1) != 0
#          a     b     c     d     e     f     g     h     i     j     k     l     m     o
# [1,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
# [2,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
# [3,]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
# [5,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
# [6,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
# [8,]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
# [9,] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[10,] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
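
If adding a package dependency is not an option, a similar cross-tabulation can probably be done with base R's `table()`. The sketch below is an alternative, not what `mtabulate` does internally; `lst` is a hypothetical stand-in for `list1`, and `idx`, `tab` and `res` are names introduced here for illustration.

```r
# Hypothetical small input standing in for list1
lst <- list(c("a", "c"), "b", c("c", "a", "d"))

# Which list element each string came from; the explicit factor levels keep
# a row for every element, even one that contributes no strings
idx <- factor(rep(seq_along(lst), lengths(lst)), levels = seq_along(lst))

# Cross-tabulate element index against string, then turn counts into TRUE/FALSE
tab <- table(idx, unlist(lst, use.names = FALSE))
res <- unclass(tab) > 0
res
```

The result is a logical matrix whose row names are the list positions and whose column names are the distinct strings.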
