
R: How to 'aggregate' (or combine) character columns?

I have a df with three columns. Each cell holds either a character string or NA, and each row contains exactly one string. For example:

df <- data.frame(a=c("NA","NA","NA","NA","fruits","fruits","fruits","fruits","fruits","fruits"), 
                 b=c("NA","NA","veggies","veggies","NA","NA","NA","NA","NA","NA"),
                 c=c("nuts","nuts","NA","NA","NA","NA","NA","NA","NA","NA") )

I want to combine all three columns to get this:

1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits

With numeric values I would use aggregate with na.rm=TRUE, but I don't know how to handle characters. Any ideas? Thanks.

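After converting the string "NA" to a real NA, you can use max.col. We use max.col to find, for each row, the column that holds the non-NA value, pair it with the row index, extract the values, and wrap them in a data.frame. max.col breaks ties at random by default, but that does not matter here because each row has exactly one non-NA value.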

is.na(df) <- df=='NA'
data.frame(var=df[cbind(1:nrow(df),max.col(!is.na(df)))])
#      var
#1     nuts
#2     nuts
#3  veggies
#4  veggies
#5   fruits
#6   fruits
#7   fruits
#8   fruits
#9   fruits
#10  fruits
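
To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The intermediate names (present, col_idx, idx, result) are only illustrative, and stringsAsFactors = FALSE is an assumption added so the columns stay character on older R versions.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)

is.na(df) <- df == "NA"                              # string "NA" becomes a real NA
present <- !is.na(df)                                # logical matrix, TRUE where a value exists
col_idx <- max.col(present, ties.method = "first")   # column of the single TRUE in each row
idx <- cbind(seq_len(nrow(df)), col_idx)             # two-column matrix of (row, column) pairs
result <- data.frame(var = df[idx])                  # matrix indexing picks one cell per row
```

Indexing a data frame with a two-column numeric matrix returns one value per (row, column) pair, which is what turns the column positions into the combined column.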

Otherwise, another option is

data.frame(var= df[cbind(1:nrow(df),(+!is.na(df)) %*% seq_along(df))])
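
The matrix product above relies on each row having exactly one non-NA value: multiplying the 0/1 indicator matrix by 1:ncol(df) then yields the column number of that value. A rough sketch of the pieces follows. The names ind and col_idx and the stopifnot guard are additions for illustration, not part of the original answer.

```r
# Rebuild the example data and convert the string "NA" to a real NA
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)
is.na(df) <- df == "NA"

ind <- +!is.na(df)                  # unary plus turns TRUE/FALSE into 1/0
stopifnot(all(rowSums(ind) == 1))   # the trick only works with one value per row
col_idx <- ind %*% seq_along(df)    # per row: sum of the column numbers of non-NA cells
result <- data.frame(var = df[cbind(seq_len(nrow(df)), col_idx)])
```

If a row could hold two values, the product would add their column numbers together and point at the wrong cell, so the guard is worth keeping on real data.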

To refine the idea given in the comments, you could do the following:

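# Works on the original df, where "NA" is still a string; the anchors ^ and $
# make sure only cells that are exactly "NA" are blanked
data.frame(var = apply(df, 1, function(x) paste(gsub("^NA$", "", x), collapse = "")) )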

      var
1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits
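
One caveat when blanking "NA" with gsub: a pattern without anchors also removes the letters NA inside real values. The small sketch below compares the two on a hypothetical row containing "BANANA", a value that is not in the original data.

```r
# Hypothetical row: one real value that happens to contain the letters "NA"
x <- c("NA", "BANANA", "NA")

unanchored <- paste(gsub("NA", "", x), collapse = "")     # strips every "NA" substring
anchored   <- paste(gsub("^NA$", "", x), collapse = "")   # strips only cells equal to "NA"

print(unanchored)   # "BA"
print(anchored)     # "BANANA"
```

For the example data in the question both patterns give the same result, so this only matters when real values can contain the letters NA.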

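The actual data may determine whether this is better or worse than a row-by-row approach. Here is one way to get the specified printed output: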

> as.matrix( df[df!="NA"] )

Perhaps better:

> cat( paste( "\n", df[ df!="NA" ] ) )

 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 veggies 
 veggies 
 nuts 
 nuts 
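
Note that indexing a data frame with a logical matrix extracts values column by column, which is why the output above lists fruits first rather than following the row order asked for in the question. If the original row order matters, one possible sketch is to transpose before extracting. The names by_col, m, and by_row are illustrative only.

```r
# Example data with "NA" kept as a string, as in the question
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)

by_col <- df[df != "NA"]     # column-major: values of a, then b, then c

m <- t(as.matrix(df))        # transpose so each original row becomes a column
by_row <- m[m != "NA"]       # column-major on the transpose = row order of df

data.frame(var = by_row)
```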
