簡體   English   中英

r將來自多個列的數據折疊為一個

[英]r collapsing data from multiple columns into one

我知道關於這個主題有很多問題,所以如果這是一個重復的問題我會道歉。 我正在嘗試將數據集中的多個列折疊為一列:

假設這是我正在使用的數據集的結構,

df <- data.frame(
      cbind(
      variable_1 = c('Var1', NA, NA,'Var1'),
      variable_2 = c('Var2', 'No', NA, NA),
      variable_3 = c(NA, NA, 'Var3', NA),
      variable_4 = c(NA, 'Var4', NA, NA),
      variable_5 = c(NA, 'No', 'Var5', NA),
      variable_6 = c(NA, NA, 'Var6', NA)

    ))

 variable_1  variable_2  variable_3  variable_4  variable_5  variable_6 
 Var1        Var2        NA          NA          NA          NA
 NA          No          NA          Var4        No          NA
 NA          NA          Var3        NA          Var5        Var6
 Var1        NA          NA          NA          NA          NA

我期待的是像這樣的一列variable_7

 variable_1  variable_2  variable_3  variable_4  variable_5  variable_6  variable_7
 Var1        Var2        NA          NA          NA          NA          Var1, Var2
 NA          No          NA          Var4        No          NA          Var4
 NA          NA          Var3        NA          Var5        Var6        Var3, Var5, Var6
 Var1        NA          NA          NA          NA          NA          Var1

任何幫助實現這一點非常感謝。

df$variable_7 <- apply(df, 1, function(x) paste(x[!is.na(x) & x != "No"], collapse = ", "));
df;
#  variable_1 variable_2 variable_3 variable_4 variable_5 variable_6
#1       Var1       Var2       <NA>       <NA>       <NA>       <NA>
#2       <NA>         No       <NA>       Var4         No       <NA>
#3       <NA>       <NA>       Var3       <NA>       Var5       Var6
#4       Var1       <NA>       <NA>       <NA>       <NA>       <NA>
#        variable_7
#1       Var1, Var2
#2             Var4
#3 Var3, Var5, Var6
#4             Var1

說明:使用applypaste(..., collapse = ", ")連接所有行條目( NA"No"除外)並存儲在新列variable_7


樣本數據

df <- data.frame(
      cbind(
      variable_1 = c('Var1', NA, NA,'Var1'),
      variable_2 = c('Var2', 'No', NA, NA),
      variable_3 = c(NA, NA, 'Var3', NA),
      variable_4 = c(NA, 'Var4', NA, NA),
      variable_5 = c(NA, 'No', 'Var5', NA),
      variable_6 = c(NA, NA, 'Var6', NA)

    ))

我想,如果有n行,那么objective就是在每行中創建一個包含字符Var的逗號分隔字符串的n向量。 (如果您打算使用其他標准來分隔所需和不需要的值,則相應地更改grep 。)

apply(df, 1, function(x) toString(grep("Var", x, value = TRUE)))
## [1] "Var1, Var2"       "Var4"             "Var3, Var5, Var6" "Var1"         

使用data.table '重新data.table '方法而不是循環/應用

library(data.table)
setDT(df)

df[, id := .I][
    melt(df, id.vars = "id")[grepl("Var", value), .(variable_7 = paste0(value, collapse = ",")), by = .(id)]
    , on = "id"
    , nomatch = 0
    ][order(id)]


#    variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 id     variable_7
# 1:       Var1       Var2         NA         NA         NA         NA  1      Var1,Var2
# 2:         NA         No         NA       Var4         No         NA  2           Var4
# 3:         NA         NA       Var3         NA       Var5       Var6  3 Var3,Var5,Var6
# 4:       Var1         NA         NA         NA         NA         NA  4           Var1

使用dplyr的解決方案。 df4是最終輸出。 請看我是如何創建數據框df cbind不是必需的,添加stringsAsFactors = FALSE以防止創建因子列會很棒。

library(dplyr)
library(tidyr)

df2 <- df %>% mutate(ID = 1:n()) 

df3 <- df2 %>%
  gather(Variable, Value, -ID, na.rm = TRUE) %>%
  filter(!Value %in% "No") %>%
  group_by(ID) %>%
  summarise(variable_7 = toString(Value))

df4 <- df2 %>% 
  left_join(df3, by = "ID") %>%
  select(-ID) 

df4
#   variable_1 variable_2 variable_3 variable_4 variable_5 variable_6       variable_7
# 1       Var1       Var2       <NA>       <NA>       <NA>       <NA>       Var1, Var2
# 2       <NA>         No       <NA>       Var4         No       <NA>             Var4
# 3       <NA>       <NA>       Var3       <NA>       Var5       Var6 Var3, Var5, Var6
# 4       Var1       <NA>       <NA>       <NA>       <NA>       <NA>             Var1

數據

df <- data.frame(
    variable_1 = c('Var1', NA, NA,'Var1'),
    variable_2 = c('Var2', 'No', NA, NA),
    variable_3 = c(NA, NA, 'Var3', NA),
    variable_4 = c(NA, 'Var4', NA, NA),
    variable_5 = c(NA, 'No', 'Var5', NA),
    variable_6 = c(NA, NA, 'Var6', NA),
    stringsAsFactors = FALSE
  )

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM