简体   繁体   中英

R: Combine rows based on equal values in several columns

I have a dataframe that takes similar form as the toy dataframe below. I would like to merge the rows if var1 , var2 , and var3 are all equal values, creating a combination of data in the merged rows. For rows 4 - 6, where there are different values in the rows, I was wondering if there is a way to put them in the same column with a seperator in between.

df <- data.frame(var1 = c("1635", "1635", "1729", "1847", "1847", "1847"),
                 var2 = c("Aa", "Aa", "Bb", "Cc", "Cc", "Cc"), 
                 var3 = c("28", "28", "85", "27", "27", "27"), 
                 var4 = c("apple", NA, "orange", "pear", NA, NA),
                 var5 = c(NA, "tree", NA, NA, "ground", "desk")
                 )

So the output would look something like this:

在此处输入图像描述

in base R you would do:

aggregate(.~var1+var2+var3, df, \(x)toString(unique(na.omit(x))), na.action = identity)

  var1 var2 var3   var4         var5
1 1847   Cc   27   pear ground, desk
2 1635   Aa   28  apple         tree
3 1729   Bb   85 orange 

in tidyverse:

library(tidyverse)
df %>%
  group_by(var1,var2,var3) %>%
  summarize(across(var4:var5, ~toString(unique(na.omit(.x)))),.groups = 'drop')

# Groups:   var1, var2 [3]
  var1  var2  var3  var4   var5          
  <chr> <chr> <chr> <chr>  <chr>         
1 1635  Aa    28    apple  "tree"        
2 1729  Bb    85    orange ""            
3 1847  Cc    27    pear   "ground, desk"

With dplyr , you can group_by the three columns, then use summarize to concatenate the strings if they are not NA .

library(dplyr)

df %>% 
  group_by(var1, var2, var3) %>% 
  summarize(across(var4:var5, ~ifelse(all(is.na(.x)), NA, paste0(na.omit(.x), collapse = ","))), .groups = "drop")

# A tibble: 3 × 5
  var1  var2  var3  var4   var5       
  <chr> <chr> <chr> <chr>  <chr>      
1 1635  Aa    28    apple  tree       
2 1729  Bb    85    orange NA         
3 1847  Cc    27    pear   ground,desk

With data.table :

setDT(df)
df[, 
  lapply(.SD, \(x) if (all(is.na(x))) NA_character_ else paste(na.omit(x), collapse = "; ")), 
  by = var1:var3]
setDF(df)
#      var1   var2   var3   var4         var5
#    <char> <char> <char> <char>       <char>
# 1:   1635     Aa     28  apple         tree
# 2:   1729     Bb     85 orange         <NA>
# 3:   1847     Cc     27   pear ground; desk

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM