Using the unite function in R and removing duplicate values

Question

I'm trying to use the unite function in R to concatenate values across columns while also deduplicating them. How can I accomplish this?

Here is the input data:

input <- tibble(
  id = c('aa', 'ss', 'dd', 'qq'),
  '2017' = c('tv', NA, NA, 'web'),
  '2018' = c('tv', 'web', NA, NA),
  '2019' = c(NA, 'web', 'book', 'tv')
)

# A tibble: 4 x 4
  id    `2017` `2018` `2019`
  <chr> <chr>  <chr>  <chr> 
1 aa    tv     tv     NA    
2 ss    NA     web    web    
3 dd    NA     NA     book  
4 qq    web    NA     tv

The desired output, with the ALL column, is:

> output
# A tibble: 4 x 5
  id    `2017` `2018` `2019` ALL   
  <chr> <chr>  <chr>  <chr>  <chr> 
1 aa    tv     tv     NA     tv    
2 ss    NA     web    web    web   
3 dd    NA     NA     book   book  
4 qq    web    NA     tv     web, tv

Answer 1

Similar questions exist here on SO, but since you are after a unite solution and I couldn't find any that specifically use unite, here we go:

Using unite

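input %>% unite(ALL, -id, sep = ", ", remove = FALSE, na.rm = TRUE)
## A tibble: 4 x 5
#  id    ALL      `2017` `2018` `2019`
#  <chr> <chr>    <chr>  <chr>  <chr>
#1 aa    tv, tv   tv     tv     NA
#2 ss    web, web NA     web    web
#3 dd    book     NA     NA     book
#4 qq    web, tv  web    NA     tv
# Note: unite drops the NAs (na.rm = TRUE) but keeps repeated values,
# so rows aa and ss still contain duplicates.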

To recover the exact column order of your expected output, you can add %>% select(names(input), ALL).
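Since unite only concatenates, repeated values survive in ALL. A minimal sketch of one way to deduplicate afterwards is to split the united string on the separator and keep the unique pieces. This assumes no value itself contains ", "; the name result is only illustrative.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

input <- tibble(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c("tv", "web", NA, NA),
  `2019` = c(NA, "web", "book", "tv")
)

result <- input %>%
  # concatenate the non-NA values of every column except id
  unite(ALL, -id, sep = ", ", remove = FALSE, na.rm = TRUE) %>%
  # split on the separator, drop repeats, and paste back together
  mutate(ALL = map_chr(strsplit(ALL, ", ", fixed = TRUE),
                       ~ toString(unique(.x)))) %>%
  # restore the original column order with ALL last
  select(all_of(names(input)), ALL)

result$ALL
# "tv" "web" "book" "web, tv"
```

If the values may contain the separator, deduplicating before concatenating (as in the row-wise approaches) is the safer choice.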

Alternatively, using nest

input %>%
    group_by(id) %>%
    nest() %>%
    # unique() drops the repeated values left after removing the NAs
    mutate(ALL = map_chr(data, ~toString(unique(unlist(.x[!is.na(unlist(.x))]))))) %>%
    unnest(data)
## A tibble: 4 x 5
## Groups:   id [4]
#  id    `2017` `2018` `2019` ALL
#  <chr> <chr>  <chr>  <chr>  <chr>
#1 aa    tv     tv     NA     tv
#2 ss    NA     web    web    web
#3 dd    NA     NA     book   book
#4 qq    web    NA     tv     web, tv

Or the base R way (as in How to create new column with all non-NA values from multiple other columns?):

# unique() removes the repeated values once the NAs are dropped
input$ALL <- apply(input[, -1], 1, function(x) toString(unique(x[!is.na(x)])))
input
# A tibble: 4 x 5
#  id    `2017` `2018` `2019` ALL
#  <chr> <chr>  <chr>  <chr>  <chr>
#1 aa    tv     tv     NA     tv
#2 ss    NA     web    web    web
#3 dd    NA     NA     book   book
#4 qq    web    NA     tv     web, tv

Answer 2

I am not sure if deduplicating is possible with unite; however, you can use apply row-wise.

input$ALL <- apply(input[-1], 1, function(x) toString(na.omit(unique(x))))

Or a tidyverse way could be using pmap:

library(tidyverse)

input %>%
  mutate(ALL = pmap_chr(select(., -id), ~toString(unique(na.omit(c(...))))))

#  id    `2017` `2018` `2019` ALL    
#  <chr> <chr>  <chr>  <chr>  <chr>  
#1 aa    tv     tv     NA     tv     
#2 ss    NA     web    web    web    
#3 dd    NA     NA     book   book   
#4 qq    web    NA     tv     web, tv

Or get the data into long format and then join:

input %>%
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  group_by(id) %>%
  summarise(ALL = toString(unique(value))) %>%
  # join on id explicitly; ALL ends up right after id and rows are sorted by id
  left_join(input, by = "id")
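Joining the summary onto input, rather than the other way round, is one way to keep the original row order and place ALL last, as in the desired output. The sketch below assumes that layout is wanted; all_by_id and result are illustrative names. Note that an id whose columns are all NA has no rows in the long data, so its ALL would be NA after the join.

```r
library(tibble)
library(dplyr)
library(tidyr)

input <- tibble(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c("tv", "web", NA, NA),
  `2019` = c(NA, "web", "book", "tv")
)

# one row per id with the unique non-NA values collapsed into a string
all_by_id <- input %>%
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  group_by(id) %>%
  summarise(ALL = toString(unique(value)))

# input is the left table, so its row and column order are preserved
result <- left_join(input, all_by_id, by = "id")

result$ALL
# "tv" "web" "book" "web, tv"
```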

Answer 1
2 2020-02-17 04:09:41

Answer 2
1 accepted 2020-02-17 05:35:09
