
Create new columns from multiple existing columns in R

Can someone guide me on how to create 4 new variables in R? I want to create the following 4 new variables from the data below:

data$varApple = var1 through var6 = "apple"
data$varBerry = var1 through var6 = "berry"
data$varPear = var1 through var6 = "pear"
data$varBanana = var1 through var6 = "banana"


data = data.frame(var1 = c("apple","pear","berry","apple","pear","banana","berry"),
       var2 = c("banana","apple","berry","apple","banana","banana","berry"),
       var3 = c("berry","pear","pear","apple","berry","banana","apple"),
       var4 = c("apple","banana","apple","pear","berry","pear","berry"),
       var5 = c("banana","pear","pear","apple","apple","banana","berry"),
       var6 = c("pear","berry","apple","apple","banana","banana","apple"))

Do you want to create a 1/0 column for each unique value in data?

Something like this:

unique_vals <- unique(unlist(data))
cbind(data, sapply(unique_vals, function(x) +(rowSums(data == x) > 0)))


#    var1   var2   var3   var4   var5   var6 apple pear berry banana
#1  apple banana  berry  apple banana   pear     1    1     1      1
#2   pear  apple   pear banana   pear  berry     1    1     1      1
#3  berry  berry   pear  apple   pear  apple     1    1     1      0
#4  apple  apple  apple   pear  apple  apple     1    1     0      0
#5   pear banana  berry  berry  apple banana     1    1     1      1
#6 banana banana banana   pear banana banana     0    1     0      1
#7  berry  berry  apple  berry  berry  apple     1    0     1      0
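
If the new columns should carry the names from the question (varApple, varBerry, and so on) rather than the bare fruit names, the same rowSums idea can be applied one fruit at a time. This is only a minimal sketch: the new_names mapping and the src helper are hypothetical names introduced here for illustration, and the sketch assumes a row should get 1 when the fruit appears in any of var1 through var6.

```r
# Same example data as in the question.
data <- data.frame(
  var1 = c("apple","pear","berry","apple","pear","banana","berry"),
  var2 = c("banana","apple","berry","apple","banana","banana","berry"),
  var3 = c("berry","pear","pear","apple","berry","banana","apple"),
  var4 = c("apple","banana","apple","pear","berry","pear","berry"),
  var5 = c("banana","pear","pear","apple","apple","banana","berry"),
  var6 = c("pear","berry","apple","apple","banana","banana","apple")
)

# Hypothetical mapping: fruit value -> name of the column to create.
new_names <- c(apple = "varApple", berry = "varBerry",
               pear = "varPear", banana = "varBanana")

# Snapshot of the six source columns as a character matrix, so columns
# added inside the loop are never scanned.
src <- as.matrix(data[paste0("var", 1:6)])

for (fruit in names(new_names)) {
  # 1 if the fruit occurs anywhere in the row, 0 otherwise.
  data[[new_names[[fruit]]]] <- as.integer(rowSums(src == fruit) > 0)
}
```

Taking the snapshot before the loop keeps the comparison restricted to the original columns; comparing against data itself would also scan the indicator columns created in earlier iterations.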

We can use table to create a frequency count on the unlisted dataset and then cbind it with the original data

cbind(data, +(table(seq_len(nrow(data))[row(data)], unlist(data)) >0))
#      var1   var2   var3   var4   var5   var6 apple banana berry pear
#1  apple banana  berry  apple banana   pear     1      1     1    1
#2   pear  apple   pear banana   pear  berry     1      1     1    1
#3  berry  berry   pear  apple   pear  apple     1      0     1    1
#4  apple  apple  apple   pear  apple  apple     1      0     0    1
#5   pear banana  berry  berry  apple banana     1      1     1    1
#6 banana banana banana   pear banana banana     0      1     0    1
#7  berry  berry  apple  berry  berry  apple     1      0     1    0
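
To make the one-liner easier to follow, here is a rough step-by-step sketch of the same table idea. The intermediate names (row_id, values, counts, presence) are made up for this illustration, and rep() is used to build the row index instead of row(data); both should give the same index, since unlist walks the data frame column by column.

```r
# Same example data as in the question.
data <- data.frame(
  var1 = c("apple","pear","berry","apple","pear","banana","berry"),
  var2 = c("banana","apple","berry","apple","banana","banana","berry"),
  var3 = c("berry","pear","pear","apple","berry","banana","apple"),
  var4 = c("apple","banana","apple","pear","berry","pear","berry"),
  var5 = c("banana","pear","pear","apple","apple","banana","berry"),
  var6 = c("pear","berry","apple","apple","banana","banana","apple")
)

# Row number of every cell, in the same column-by-column order as unlist().
row_id <- rep(seq_len(nrow(data)), times = ncol(data))

# All cell values as one long vector.
values <- unlist(data, use.names = FALSE)

# Cross-tabulate: one row per original row, one column per distinct value.
counts <- table(row_id, values)

# Collapse the counts to presence/absence; unary + turns TRUE/FALSE into 1/0.
presence <- +(counts > 0)

result <- cbind(data, presence)
```

The counts object is useful on its own if the number of occurrences per row is wanted instead of a 1/0 flag; for example, row 4 contains "apple" five times.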

Or use mtabulate from qdapTools

library(qdapTools)
cbind(data, +(mtabulate(as.data.frame(t(data))) > 0))

Or a variant of the above

cbind(data, +(mtabulate(asplit(data, 1)) > 0))

Or an option with cSplit_e from splitstackshape

library(tidyr)
library(splitstackshape)
data %>%
  unite(newcol, everything(), remove = FALSE) %>%
  cSplit_e('newcol', '_', mode = 'binary', type = 'character', drop = TRUE)
