Merging data in duplicated records in R

I have a data.frame as follows.

DF <- structure(list(ID = c("k1", "k1", "k2", "k2", "k3", "k3", "k3", 
"k4", "k4", "k5", "k5", "k5"), g1 = c(NA, NA, NA, NA, "robin", 
"robin", "robin", "norse", "norse", "spidey", "spidey", "spidey"
), g2 = c("olsen", "olsen", "lane", "lang", "damien", "jason", 
"dick", NA, NA, "peter", "miles", "ben"), g3 = c(NA, NA, NA, 
NA, "wayne", "todd", "grayson", "Masterson", "odinson", "616", 
"ultimate", "clone")), .Names = c("ID", "g1", "g2", "g3"), row.names = c(NA, 
12L), class = "data.frame")

DF
    ID     g1     g2        g3
 1: k1     NA  olsen        NA
 2: k1     NA  olsen        NA
 3: k2     NA   lane        NA
 4: k2     NA   lang        NA
 5: k3  robin damien     wayne
 6: k3  robin  jason      todd
 7: k3  robin   dick   grayson
 8: k4  norse     NA Masterson
 9: k4  norse     NA   odinson
10: k5 spidey  peter       616
11: k5 spidey  miles  ultimate
12: k5 spidey    ben     clone

How can I merge the duplicated records by the key column ID, concatenating a column's values only where they differ between records, to get the following result?

out <- structure(list(ID = c("k1", "k2", "k3", "k4", "k5"), g1 = c("NA", 
"NA", "robin", "norse", "spidey"), g2 = c("olsen", "lane:lang", 
"damien:jason:dick", "NA", "peter:miles:ben"), g3 = c("NA", "NA", 
"wayne:todd:grayson", "Masterson:odinson", "616:ultimate:clone"
)), row.names = c(NA, -5L), class = "data.frame", .Names = c("ID", 
"g1", "g2", "g3"))
out
  ID     g1                g2                 g3
1 k1     NA             olsen                 NA
2 k2     NA         lane:lang                 NA
3 k3  robin damien:jason:dick wayne:todd:grayson
4 k4  norse                NA  Masterson:odinson
5 k5 spidey   peter:miles:ben 616:ultimate:clone

A solution using data.table.

library(data.table)

Create data.table

DT <- as.data.table(DF)

Merge the duplicated records

DT[, lapply(.SD, function(x) paste(unique(x), collapse = ":")), by = ID]
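To see what happens inside each group, here is a minimal sketch of the function applied to each column, run on plain vectors taken from the question's data. The name collapse_unique is hypothetical and used only for illustration; its body is the same as the anonymous function above.

```r
# Hypothetical helper name; same body as the anonymous function used per column.
collapse_unique <- function(x) paste(unique(x), collapse = ":")

collapse_unique(c("olsen", "olsen"))  # identical values collapse to "olsen"
collapse_unique(c("lane", "lang"))    # differing values are joined: "lane:lang"
collapse_unique(c(NA, NA))            # NA is pasted as the string "NA"
```

Note that missing values come out as the character string "NA" rather than a real NA, which matches the desired out shown in the question.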

Using dplyr

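library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) is the replacement.
DF %>%
   group_by(ID) %>%
   summarise(across(everything(), ~ paste(unique(.x), collapse = ":")))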
#  ID     g1                g2                 g3
#1 k1     NA             olsen                 NA
#2 k2     NA         lane:lang                 NA
#3 k3  robin damien:jason:dick wayne:todd:grayson
#4 k4  norse                NA  Masterson:odinson
#5 k5 spidey   peter:miles:ben 616:ultimate:clone
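If neither package is available, the same grouping can probably be done in base R as well. The following is a sketch, not part of the original answers: it rebuilds DF with data.frame() so that it runs on its own, then uses aggregate() with the same paste(unique(x), collapse = ":") function. The name res is an arbitrary choice.

```r
# Same data as DF in the question, rebuilt with data.frame() for brevity.
DF <- data.frame(
  ID = c("k1", "k1", "k2", "k2", "k3", "k3", "k3", "k4", "k4", "k5", "k5", "k5"),
  g1 = c(NA, NA, NA, NA, "robin", "robin", "robin", "norse", "norse",
         "spidey", "spidey", "spidey"),
  g2 = c("olsen", "olsen", "lane", "lang", "damien", "jason", "dick", NA, NA,
         "peter", "miles", "ben"),
  g3 = c(NA, NA, NA, NA, "wayne", "todd", "grayson", "Masterson", "odinson",
         "616", "ultimate", "clone"),
  stringsAsFactors = FALSE
)

# Collapse the unique values of every non-key column within each ID.
# The data.frame method of aggregate() only drops rows with NA in the
# grouping column, so NA values in g1..g3 are passed on to FUN.
res <- aggregate(DF[c("g1", "g2", "g3")], by = DF["ID"],
                 FUN = function(x) paste(unique(x), collapse = ":"))
res
```

The formula interface of aggregate() is avoided here on purpose, since its default na.action would drop rows containing NA before grouping.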

