
Split dataframe array column into multiple binary columns [R]

[Image: the array column is the current state; the other columns are the goal.]

I have a column of arrays and I would like to split it out into multiple binary columns. I have created all the columns by using:

dat[, unique(unlist(dat$array_column))] = 0

I tried to use an ifelse statement to then set the columns to 1 as needed; however, using %in% does not work with ifelse. I could write a nested for loop, but I have millions of rows and am looking for a faster solution than that.

testdf = data.frame('a' = c(1, 2, 3, 4, 5), 'array_column' = c('a-b-c', 'b-a', 'c-d', 'd-e-e', 'e-a'), stringsAsFactors = FALSE)
testdf$array_column = strsplit(testdf$array_column, '-')
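On the ifelse/%in% attempt above: applied directly to a list column, `%in%` compares each value against the list elements as wholes rather than against the values inside each element, so it cannot tell which rows contain a given value. A minimal sketch that keeps the original plan (pre-created 0 columns, then set the 1s) flattens the list once with `unlist()`/`lengths()` and loops over the distinct values instead of over rows. The helper name `fill_binary_columns` and the `id` column are hypothetical; `id` is used instead of `a` so that the new value column `a` does not overwrite an existing column (which would happen with `testdf`).

```r
# Hypothetical helper: adds one 0/1 integer column per distinct value in the
# list column `col`, looping over distinct values rather than over rows.
fill_binary_columns <- function(df, col = "array_column") {
  lst <- df[[col]]
  flat <- unlist(lst)                          # all values, row after row
  row_id <- rep(seq_along(lst), lengths(lst))  # row each value came from
  for (v in unique(flat)) {
    new_col <- integer(nrow(df))               # start every row at 0
    new_col[row_id[flat == v]] <- 1L           # rows containing v become 1
    df[[v]] <- new_col                         # NOTE: overwrites a same-named column
  }
  df
}

dat <- data.frame(id = 1:5,
                  array_column = c("a-b-c", "b-a", "c-d", "d-e-e", "e-a"),
                  stringsAsFactors = FALSE)
dat$array_column <- strsplit(dat$array_column, "-")

dat <- fill_binary_columns(dat)
dat[, c("id", "a", "b", "c", "d", "e")]
```

Each pass is vectorized over all rows, so the work scales with the number of distinct values instead of running an R-level loop per row; this has not been benchmarked here.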

I think the question is really how to convert a list of vectors into a binary matrix/data.frame.
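Under that framing, one way to build the binary matrix without creating a data.frame per row is matrix indexing: pair each row number with the column position of each of its values, then assign 1 at all those (row, column) cells in a single step. This is a base-R sketch, not the answerer's method; the alphabetical column order produced by `sort()` is an assumption about the desired layout.

```r
# Build a 0/1 matrix from a list of character vectors in one vectorized step.
lst <- strsplit(c("a-b-c", "b-a", "c-d", "d-e-e", "e-a"), "-")

vals <- sort(unique(unlist(lst)))              # one column per distinct value
m <- matrix(0L, nrow = length(lst), ncol = length(vals),
            dimnames = list(NULL, vals))

# Two-column index matrix: (row i, column of each value in row i).
idx <- cbind(rep(seq_along(lst), lengths(lst)), match(unlist(lst), vals))
m[idx] <- 1L                                   # a repeated value just sets 1 again

binary_df <- as.data.frame(m)
binary_df
```

Duplicated values within a row (such as `d-e-e`) simply write 1 into the same cell twice.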

Here is a solution:

testdf = data.frame('a' = c(1, 2, 3, 4, 5), 'array_column' = c('a-b-c', 'b-a', 'c-d', 'd-e-e', 'e-a'), stringsAsFactors = FALSE)
testdf$array_column = strsplit(testdf$array_column, '-')

library('plyr')  # provides rbind.fill()

# Creates a list of one-row data.frames with a 1 for each distinct value observed
binary <- lapply(testdf$array_column, function(x) {
  vals <- unique(x)
  x <- setNames(rep(1, length(vals)), vals)
  do.call(data.frame, as.list(x))
})

# Joins into a single data.frame; rbind.fill() fills columns a row lacks with NA
result <- do.call(rbind.fill, binary)
result[is.na(result)] <- 0

result                                                                                                                              
#   a b c d e                                                                                                                       
# 1 1 1 1 0 0                                                                                                                       
# 2 1 1 0 0 0                                                                                                                       
# 3 0 0 1 1 0                                                                                                                       
# 4 0 0 0 1 1                                                                                                                       
# 5 1 0 0 0 1  
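The answer above builds one data.frame per row and relies on plyr; with millions of rows that is likely to be slow, although this has not been measured here. A base-R sketch of the same result cross-tabulates each value against the row it came from with `table()`, then clamps the counts to 0/1, because `table()` counts repeats (row 4, `d-e-e`, would otherwise get `e = 2`).

```r
# Base-R alternative: cross-tabulate row index against value, then clamp to 0/1.
testdf <- data.frame(a = c(1, 2, 3, 4, 5),
                     array_column = c("a-b-c", "b-a", "c-d", "d-e-e", "e-a"),
                     stringsAsFactors = FALSE)
testdf$array_column <- strsplit(testdf$array_column, "-")

lst <- testdf$array_column
# Explicit factor levels keep rows whose vector is empty (they become all 0).
row_id <- factor(rep(seq_along(lst), lengths(lst)), levels = seq_along(lst))
tab <- table(row_id, unlist(lst))

# Counts > 0 become 1L, everything else 0L; columns are the sorted values.
result <- as.data.frame.matrix((tab > 0) * 1L)
result
```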

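Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.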

 