
R: How to 'aggregate' (or combine) character columns?

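I have a df with three columns. Each column holds either a character value or NA, and each row contains exactly one character value. For example: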

df <- data.frame(a=c("NA","NA","NA","NA","fruits","fruits","fruits","fruits","fruits","fruits"), 
                 b=c("NA","NA","veggies","veggies","NA","NA","NA","NA","NA","NA"),
                 c=c("nuts","nuts","NA","NA","NA","NA","NA","NA","NA","NA") )

I would like to combine all three columns to get this:

1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits

With numeric values I would use aggregate with na.rm=TRUE, but I do not know how to handle characters. Any ideas? Thanks.

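After converting the string "NA" to a real NA, you can use max.col. We use max.col to build the row/column index, extract the values at those positions, and then convert the result to a data.frame.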

is.na(df) <- df=='NA'
data.frame(var=df[cbind(1:nrow(df),max.col(!is.na(df)))])
#      var
#1     nuts
#2     nuts
#3  veggies
#4  veggies
#5   fruits
#6   fruits
#7   fruits
#8   fruits
#9   fruits
#10  fruits
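
To make the max.col approach easier to follow, here is the same idea broken into steps. This is only a sketch: the intermediate names (present, col_idx, idx, result) are made up for illustration, and ties.method = "first" is an addition so that the result stays deterministic if a row ever held more than one value (the default tie-breaking in max.col is random).

```r
# Same example data as in the question, with "NA" stored as a string
df <- data.frame(
  a = c(rep("NA", 4), rep("fruits", 6)),
  b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
  c = c("nuts", "nuts", rep("NA", 8)),
  stringsAsFactors = FALSE
)

# Step 1: turn the string "NA" into a real missing value
is.na(df) <- df == "NA"

# Step 2: logical matrix, TRUE wherever a value is present
present <- !is.na(df)

# Step 3: for each row, the column position of the TRUE entry
col_idx <- max.col(present, ties.method = "first")

# Step 4: pair each row number with its column position and extract
idx <- cbind(seq_len(nrow(df)), col_idx)
result <- data.frame(var = as.matrix(df)[idx])
```

Indexing with a two-column matrix picks one cell per row, which is why no loop over rows is needed.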

Another option is

data.frame(var= df[cbind(1:nrow(df),(+!is.na(df)) %*% seq_along(df))])
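
The matrix-product version works because +!is.na(df) is a 0/1 indicator matrix, so multiplying it by 1:ncol(df) adds up the positions of the non-missing columns in each row. With exactly one value per row, that sum is the column position itself. A minimal sketch, assuming the "NA" strings have already been turned into real NA values; the names indicator, col_idx and result, and the stopifnot guard, are illustrative additions:

```r
# Example data with real NA values (i.e. after the conversion step)
df <- data.frame(
  a = c(rep(NA, 4), rep("fruits", 6)),
  b = c(NA, NA, "veggies", "veggies", rep(NA, 6)),
  c = c("nuts", "nuts", rep(NA, 8)),
  stringsAsFactors = FALSE
)

# 0/1 matrix: 1 where a value is present, 0 where it is missing
indicator <- +!is.na(df)

# The trick only yields a valid column position when every row
# has exactly one value, so check that assumption first
stopifnot(all(rowSums(indicator) == 1))

# Row-wise sum of (indicator * column number) = position of the value
col_idx <- indicator %*% seq_along(df)

result <- data.frame(var = as.matrix(df)[cbind(seq_len(nrow(df)), col_idx)])
```

If a row could hold two values, the positions would add up to something that is not the position of either, so the guard is worth keeping on real data.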

To flesh out the idea offered in the comments, you could do the following:

data.frame(var = apply(df, 1, function(x) paste(gsub("NA", "", x), collapse = "")) )

      var
1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits
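
One thing to watch with a gsub-based approach: the pattern "NA" without anchors removes those two letters wherever they occur, not only in cells that are exactly "NA". The sketch below shows the difference on a made-up value ("BANANA" is not in the original data) and then applies an anchored pattern to the example; combine_row is a hypothetical helper name.

```r
# Unanchored vs. anchored pattern on a made-up row
x <- c("NA", "BANANA", "NA")
unanchored <- paste(gsub("NA", "", x), collapse = "")    # strips "NA" inside "BANANA" too
anchored   <- paste(gsub("^NA$", "", x), collapse = "")  # only blanks cells equal to "NA"

# Same data as in the question, with "NA" stored as a string
df <- data.frame(
  a = c(rep("NA", 4), rep("fruits", 6)),
  b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
  c = c("nuts", "nuts", rep("NA", 8)),
  stringsAsFactors = FALSE
)

# Blank out cells that are exactly "NA", then glue each row together
combine_row <- function(x) paste(sub("^NA$", "", x), collapse = "")
result <- data.frame(var = apply(df, 1, combine_row))
```

For the example data both patterns give the same answer, since none of the real values contains "NA".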

Whether this is better or worse than a row-by-row approach may depend on the actual data. Here is one way to get the specified printout:

> as.matrix( df[df!="NA"] )

Perhaps better:

> cat( paste( "\n", df[ df!="NA" ] ) )

 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 veggies 
 veggies 
 nuts 
 nuts 
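
Note that subsetting with df != "NA" walks the data column by column, which is why the fruits come out first above. If the original row order matters, one possible variation (not part of the answer above) is to transpose first, so that the column-wise walk over the transposed matrix follows the rows of df; vals is an illustrative name.

```r
# Same data as in the question, with "NA" stored as a string
df <- data.frame(
  a = c(rep("NA", 4), rep("fruits", 6)),
  b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
  c = c("nuts", "nuts", rep("NA", 8)),
  stringsAsFactors = FALSE
)

# t(df) is a 3 x 10 character matrix; reading it column by column
# visits row 1 of df, then row 2, and so on
vals <- t(df)[t(df) != "NA"]

# Print one value per line
cat(paste0(vals, "\n"), sep = "")
```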
