How to find duplicates in one column and tag them in a new column depending on date order in a third column using R

Question

I have a data frame that looks like this:

idnr       date
a43        2011-12-19
a4945      2012-09-11
a43        2013-10-01
a231       2012-05-09
a231       2009-09-10
a6901      2015-06-12

I want the following (duplicates are defined only within idnr):

the lower date of duplicates to be marked with 2 in newcolumn
the higher date of duplicates to be marked with 1
the non-duplicates to be marked with 2

Again, duplicates are only defined within the idnr column. The desired result:

   idnr       date          newcolumn
   a43        2011-12-19    2
   a4945      2012-09-11    2
   a43        2013-10-01    1
   a231       2012-05-09    1
   a231       2009-09-10    2
   a6901      2015-06-12    2

Or, if you like, I want the second reported duplicate to be tagged as 1 in newcolumn and the rest to be a 2.

Answer 1

library(dplyr)

Let's first create your table:

table <- data.frame(idnr = c('a43', 'a4945', 'a43', 'a231', 'a231', 'a6901'),
                    date = as.Date(c('2011-12-19', '2012-09-11', '2013-10-01',
                                     '2012-05-09', '2009-09-10', '2015-06-12')))

Then we identify the idnr values that occur more than once:

duplicates <- unique(as.character(table$idnr[duplicated(table$idnr)]))

Afterwards we assign a 2 to each idnr that occurs only once:

table$newcolumn[!table$idnr %in% duplicates] <- 2

And finally you can use a for loop to assign the remaining values based on the latest (max) date within each duplicated idnr:

for (i in duplicates) {
  # all rows belonging to the current duplicated idnr
  temp_table <- filter(table, idnr == i)
  high_date <- max(temp_table$date)
  # the row with the latest date gets 1, every other row for this idnr gets 2
  table$newcolumn[table$idnr == i & table$date == high_date] <- 1
  table$newcolumn[table$idnr == i & table$date != high_date] <- 2
}

And that should do it
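
If you would rather avoid the loop, the same tagging can probably be done in one vectorized pass. The following is only a sketch, not part of the original answer: it swaps the dplyr loop for base R's ave(), and the names df, latest, group_size and is_latest are made up for illustration. It assumes that a tie on the latest date within one idnr should tag every tied row with 1.

```r
# Rebuild the example data from the question (df is a hypothetical name).
df <- data.frame(
  idnr = c("a43", "a4945", "a43", "a231", "a231", "a6901"),
  date = as.Date(c("2011-12-19", "2012-09-11", "2013-10-01",
                   "2012-05-09", "2009-09-10", "2015-06-12")),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each idnr group and returns a vector
# aligned with the original rows. Dates are compared as numbers.
date_num   <- as.numeric(df$date)
latest     <- ave(date_num, df$idnr, FUN = max)
group_size <- ave(seq_along(df$idnr), df$idnr, FUN = length)

# A row gets 1 only if its idnr is duplicated AND it carries the latest date;
# everything else (older duplicates and non-duplicates) gets 2.
is_latest    <- date_num == latest
df$newcolumn <- ifelse(group_size > 1 & is_latest, 1, 2)

print(df)
```

Because ave() keeps the original row order, newcolumn lines up with the input without any sorting or merging.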

solution1
1 ACCEPTED 2019-06-27 20:28:39
