
Delete duplicated rows in R with conditions in other columns

Here is a small subset of the data. I have:

Id var1 var2
1   POS NA
1   NA  NEG
2   NEG NA
2   NA  NEG
3   POS NA
3   NA  NEG
4   POS POS
5   POS NA

My ideal output:

Id var1 var2
1   POS  NEG
2   NEG  NEG
3   POS  NEG
4   POS  POS
5   POS  NA
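To make the answers below reproducible, the sample data can be rebuilt as a data frame. This is a sketch: the name df (used by some answers; others call it mydata or dt) and the choice of character columns for var1/var2 are assumptions, not something stated in the question.

```r
# Sample data from the question (character columns are an assumption)
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# Desired result: one row per Id, keeping the non-NA value of each column
expected <- data.frame(
  Id   = c(1, 2, 3, 4, 5),
  var1 = c("POS", "NEG", "POS", "POS", "POS"),
  var2 = c("NEG", "NEG", "NEG", "POS", NA),
  stringsAsFactors = FALSE
)
```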

I would like to remove the duplicated Ids and keep one row per unique Id, with the non-NA value in var1 and var2. Does anyone know how to do this? Any help would be greatly appreciated. Thank you!

You can use dplyr:

library(dplyr)

mydata %>%
  group_by(Id) %>%
  summarise(
    var1 = var1[!is.na(var1)][1],
    var2 = var2[!is.na(var2)][1]
  )
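The core idiom in this answer is var1[!is.na(var1)][1]: drop the NAs, then take the first element. Because indexing past the end of an R vector returns NA, a group with no non-NA value (var2 for Id 5) gives NA rather than an empty vector. A minimal base R check, where first_non_na is a hypothetical helper name:

```r
# Hypothetical helper: first non-NA element; indexing past the end gives NA
first_non_na <- function(x) x[!is.na(x)][1]

first_non_na(c(NA, "NEG"))          # "NEG"
first_non_na(c("POS", NA))          # "POS"
first_non_na(c(NA_character_, NA))  # NA (not character(0))

# Applied per group with base R's tapply (Ids and var2 values from the question)
ids  <- c(1, 1, 2, 2, 3, 3, 4, 5)
var2 <- c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA)
tapply(var2, ids, first_non_na)
```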

The solution already posted is much more compact, but I was working on this anyway, so I am posting it for additional info: a for loop solution.

library(data.table)

#convert dt to a data.table (by reference)
setDT(dt)

#create a list to collect the results of the for loop
result <- list()

#loop over each unique Id
for (i in unique(dt$Id)) {
  #subset the rows for this Id and store them in dt1
  dt1 <- dt[Id == i]

  #build a one-row data.table for this Id with the first non-NA value
  #of var1 and var2 ([1] gives NA when there is none, e.g. var2 for Id 5)
  result[[as.character(i)]] <- data.table(
    Id   = i,
    var1 = dt1[!is.na(var1), var1][1],
    var2 = dt1[!is.na(var2), var2][1]
  )
}

#rbind the list into a single data.table
result <- rbindlist(result)

You could try a solution with na.omit, which removes the NA values within each group. Assuming your data frame is df:

In base R:

aggregate(. ~ Id,
          data = df,
          FUN = function(x) {
            y <- na.omit(x)
            y[length(y) == 0] <- NA
            y
          },
          na.action = na.pass)

Note that y[length(y) == 0] <- NA is included so that groups with no non-NA value (such as var2 for Id 5) return NA rather than character(0).
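The y[length(y) == 0] <- NA line relies on how R treats logical subscripts in assignment: when y is empty, y[TRUE] <- NA extends it to a single NA; when y is non-empty, y[FALSE] <- NA changes nothing. A small sketch of the FUN above, pulled out under the hypothetical name fill_empty:

```r
# Same logic as the anonymous FUN passed to aggregate (name is hypothetical)
fill_empty <- function(x) {
  y <- na.omit(x)          # drop NAs (na.omit also attaches an "na.action" attribute)
  y[length(y) == 0] <- NA  # empty: y[TRUE] <- NA gives one NA; otherwise a no-op
  y
}

fill_empty(c("POS", NA))          # "POS", carrying the "na.action" attribute
fill_empty(c(NA_character_, NA))  # a single NA
```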


With dplyr:

library(dplyr)

df %>% 
  group_by(Id) %>%
  summarise(across(everything(), ~ first(na.omit(.))))

first() returns the first value in each group after the NAs have been removed (or NA if none remain). across(everything()) applies this to every column except the grouping column.
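For comparison, the same "first non-NA value of every column, per group" pattern can be sketched in base R with split() and lapply(), without dplyr. collapse_first is a hypothetical helper, and like the answers here it assumes each Id has at most one non-NA value per column:

```r
# Hypothetical helper: one row per group, first non-NA value of each column
collapse_first <- function(data, by) {
  first_non_na <- function(x) x[!is.na(x)][1]
  groups <- split(data, data[[by]])          # one data frame per group
  rows <- lapply(groups, function(g) {
    out <- lapply(g, first_non_na)           # first non-NA of every column
    as.data.frame(out, stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)                # stack the one-row data frames
  rownames(res) <- NULL
  res
}

# Usage with the question's sample data
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)
res <- collapse_first(df, "Id")
print(res)
```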


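With data.table:

library(data.table)

setDT(df)[, lapply(.SD, function(x) na.omit(x)[1]), by = Id]

Taking [1] of the na.omit result gives exactly one value per column and group, and NA for groups with no non-NA value (var2 for Id 5), instead of a zero-length result.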
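All of the answers above assume each Id has at most one non-NA value per column; if an Id had, say, both POS and NEG in var1, a first-value approach would silently keep one of them. A quick base R check for that assumption, where count_non_na is a hypothetical helper:

```r
# Hypothetical check: number of non-NA values per Id (rows) and column (cols)
count_non_na <- function(data, by) {
  vals <- setdiff(names(data), by)
  sapply(vals, function(col) tapply(!is.na(data[[col]]), data[[by]], sum))
}

df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)
counts <- count_non_na(df, "Id")
any(counts > 1)  # TRUE would mean some Id has conflicting values
```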
