
Merging the content of several columns into one

I have a data frame with 5 columns and 6 rows (in reality there are more; this is just to simplify the problem):

One    Two  Three  Four   Five
Cat    NA   NA     NA     NA
NA     Dog  NA     NA     NA
NA     NA   NA     Mouse  NA
Cat    NA   Rat    NA     NA
Horse  NA   NA     NA     NA
NA     NA   NA     NA     NA

Now I would like to merge all the information into a single new column ("Summary"), as follows:

Summary
Cat
Dog
Mouse
Error
Horse
NA

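Note the "Error" reported in the fourth summary row, because two different values were found during the merge. I tried looking at the 'coalesce' function from the dplyr package, but it doesn't really seem to do what I need. Thanks in advance.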

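EDIT: I added the sixth row to show that if every value in a row is NA, I would like to get NA in the "Summary" column rather than "Error". Sorry if this wasn't clear in my first post.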

Here is an idea via apply:

apply(df, 1, function(i){i1 <- i[!is.na(i)]; if(length(i1) > 1){'Error'}else{i1}})
#[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"
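The one-liner above was written against the five-row example. For the all-NA row added in the edit, `i1` is empty, so that row yields a zero-length vector and apply() can no longer simplify the result to a character vector. A minimal base R sketch that covers all three cases might look like the following; `summarise_row` and `df6` are hypothetical names introduced for illustration, and treating repeated identical values (e.g. "Cat" twice) as a single value is an assumption, not something the question specifies.

```r
# Hypothetical helper (not from the original answers):
# collapse one row to a single value, "Error", or NA.
summarise_row <- function(row) {
  vals <- unique(row[!is.na(row)])   # distinct non-missing values (names are dropped)
  if (length(vals) == 0) {
    NA_character_                    # every entry is NA
  } else if (length(vals) == 1) {
    vals                             # exactly one distinct value
  } else {
    "Error"                          # conflicting values in the same row
  }
}

# The six-row example from the question, stored as character columns.
df6 <- data.frame(
  One   = c("Cat", NA, NA, "Cat", "Horse", NA),
  Two   = c(NA, "Dog", NA, NA, NA, NA),
  Three = c(NA, NA, NA, "Rat", NA, NA),
  Four  = c(NA, NA, "Mouse", NA, NA, NA),
  Five  = NA_character_,
  stringsAsFactors = FALSE
)

# Every call returns a length-1 character, so apply() simplifies to a vector.
df6$Summary <- unname(apply(df6, 1, summarise_row))
print(df6$Summary)
```

Because each row always maps to exactly one string (or NA), the result can be assigned directly as a new column.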

I would use apply to solve this, since you need to handle specific cases. For example:

df <- structure(list(One = structure(c(1L, NA, NA, 1L, 2L), .Label = c("Cat", 
"Horse", "NA"), class = "factor"), Two = structure(c(NA, 1L, 
NA, NA, NA), .Label = c("Dog", "NA"), class = "factor"), Three = structure(c(NA, 
NA, NA, 2L, NA), .Label = c("NA", "Rat"), class = "factor"), 
    Four = structure(c(NA, NA, 1L, NA, NA), .Label = c("Mouse", 
    "NA"), class = "factor"), Five = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = "NA", class = "factor")), row.names = c(NA, 
-5L), class = "data.frame")


apply(df, 1, function(row) if(sum(!is.na(row)) == 1) na.omit(row)[[1]] else "Error")
#> [1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

Created on 2020-01-14 by the reprex package (v0.3.0)

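Another approach is to use the new pivot_ functions (pivot_longer() is provided by tidyr):

library(dplyr)   # %>% and distinct()
library(tidyr)   # pivot_longer()
library(tibble)  # tribble()

df <- tribble(~One, ~Two, ~Three,   ~Four,    ~Five,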
              "Cat", NA,  NA,  NA,  NA,
              NA,  "Dog", NA,  NA,  NA,
              NA,  NA,  NA,  "Mouse",   NA,
              "Cat", NA,  "Rat", NA,  NA,
              "Horse",   NA,  NA,  NA,  NA)

df %>% 
  pivot_longer(names_to = "variable", values_to = "Summary", 
               values_drop_na = TRUE, cols = One:Five) %>% 
  distinct(Summary)
# # A tibble: 5 x 1
# Summary
# <chr>  
# 1 Cat    
# 2 Dog    
# 3 Mouse  
# 4 Rat    
# 5 Horse  
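As the output shows, distinct(Summary) lists the distinct values across the whole table, so the fourth row contributes "Rat" instead of "Error". One possible way to get a per-row summary with the same pivot_longer() idea is to tag each row before reshaping and then summarise per row. This is only a sketch: the `row_id` column and the `result` name are assumptions added for illustration, and it requires dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Six-row example from the question; NA_character_ keeps every column character.
df <- tribble(
  ~One,    ~Two,          ~Three,        ~Four,         ~Five,
  "Cat",   NA_character_, NA_character_, NA_character_, NA_character_,
  NA,      "Dog",         NA,            NA,            NA,
  NA,      NA,            NA,            "Mouse",       NA,
  "Cat",   NA,            "Rat",         NA,            NA,
  "Horse", NA,            NA,            NA,            NA,
  NA,      NA,            NA,            NA,            NA
)

result <- df %>%
  mutate(row_id = row_number()) %>%            # hypothetical id to remember the source row
  pivot_longer(cols = One:Five,
               names_to = "variable", values_to = "value") %>%
  group_by(row_id) %>%
  summarise(
    Summary = case_when(
      n_distinct(value, na.rm = TRUE) == 0 ~ NA_character_,       # all NA
      n_distinct(value, na.rm = TRUE) == 1 ~ first(na.omit(value)), # one distinct value
      TRUE ~ "Error"                                               # conflicting values
    ),
    .groups = "drop"
  )

print(result)
```

Keeping the NA values during the pivot (no values_drop_na) is deliberate here, since dropping them would remove the all-NA row from the result entirely.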
  • Here is another base R solution, using sapply() + ifelse()
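# stringsAsFactors = TRUE keeps the factor levels that levels(x) relies on
# (the default changed to FALSE in R 4.0.0)
r <- sapply(as.list(as.data.frame(t(df), stringsAsFactors = TRUE)),
            function(x) ifelse(length(levels(x))==1, na.omit(as.vector(x)),"Error"))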

which gives

> r
     V1      V2      V3      V4      V5 
  "Cat"   "Dog" "Mouse" "Error" "Horse"
  • Or you can use apply() + ifelse()
r <- apply(df, 1, function(x) ifelse(length(z <- unique(na.omit(x)))==1, z,"Error"))

which gives

> r
[1] "Cat"   "Dog"   "Mouse" "Error" "Horse"

Data

df <- structure(list(One = c("Cat", NA, NA, "Cat", "Horse"), Two = c(NA, 
"Dog", NA, NA, NA), Three = c(NA, NA, NA, "Rat", NA), Four = c(NA, 
NA, "Mouse", NA, NA), Five = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

You can also use coalesce:

df %>%
  mutate_all(as.character) %>% 
  mutate(coal = coalesce(!!!syms(names(.))),
         sum_na = rowSums(!is.na(.)),
         result = if_else(sum_na == 1,coal,"Error")) %>% 
  select(result)
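The coalesce approach boils down to two vectorised steps: take the first non-NA value in each row, then count the non-NA values to decide whether that value can be trusted. A rough base R analogue, extended so that an all-NA row gives NA rather than "Error", could look like this. The names `df6`, `m`, `n_values`, `first_value` and `summary_col` are hypothetical, and, like the coalesce answer, this counts non-NA entries rather than distinct values, so a row holding the same value twice would be reported as "Error".

```r
# Six-row example from the question, stored as character columns.
df6 <- data.frame(
  One   = c("Cat", NA, NA, "Cat", "Horse", NA),
  Two   = c(NA, "Dog", NA, NA, NA, NA),
  Three = c(NA, NA, NA, "Rat", NA, NA),
  Four  = c(NA, NA, "Mouse", NA, NA, NA),
  Five  = NA_character_,
  stringsAsFactors = FALSE
)

m <- as.matrix(df6)                  # character matrix, NA preserved
present <- 1 * (!is.na(m))           # 1 where a value exists, 0 where NA
n_values <- rowSums(present)         # number of non-NA entries per row

# Column index of the first non-NA entry in each row (column 1 if the row is all NA).
first_col <- max.col(present, ties.method = "first")
first_value <- m[cbind(seq_len(nrow(m)), first_col)]

summary_col <- unname(
  ifelse(n_values == 0, NA_character_,
         ifelse(n_values == 1, first_value, "Error"))
)
print(summary_col)
```

This avoids a per-row function call, which may matter on data frames with many rows.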

