I have a data frame that looks like this:
idnr date
a43 2011-12-19
a4945 2012-09-11
a43 2013-10-01
a231 2012-05-09
a231 2009-09-10
a6901 2015-06-12
I want the following output (duplicates are defined only within the idnr column):
idnr date newcolumn
a43 2011-12-19 2
a4945 2012-09-11 2
a43 2013-10-01 1
a231 2012-05-09 1
a231 2009-09-10 2
a6901 2015-06-12 2
In other words, for every idnr that occurs more than once, I want the row with the latest date to be tagged as 1 in newcolumn and all other rows to be tagged as 2.
library(dplyr)
Let's first create your table:
table <- data.frame(idnr = c('a43', 'a4945', 'a43', 'a231', 'a231', 'a6901'),
                    date = as.Date(c('2011-12-19', '2012-09-11', '2013-10-01',
                                     '2012-05-09', '2009-09-10', '2015-06-12')))
Then we identify the idnr values that occur more than once:
duplicates <- unique(as.character(table$idnr[duplicated(table$idnr)]))
Afterwards we assign a 2 to every row whose idnr is unique:
table$newcolumn[!table$idnr %in% duplicates] <- 2
Finally, a for loop fills in the remaining rows: within each duplicated idnr, the row with the latest date gets a 1 and the others get a 2:
for (i in duplicates) {
  # rows belonging to the current idnr
  temp_table <- filter(table, idnr == i)
  high_date <- max(temp_table$date)
  # now we assign the values
  table$newcolumn[table$idnr == i & table$date == high_date] <- 1
  table$newcolumn[table$idnr == i & table$date != high_date] <- 2
}
And that should do it.
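If you would rather avoid the loop, the same tagging can probably be written as a single grouped mutate. The following is only a minimal sketch of that idea, not a replacement for the steps above: the names df and result are made up for illustration, and it assumes that "duplicate" means an idnr that appears more than once and that the row to tag with 1 is the one with the latest date.

```r
library(dplyr)

# Same sample data as in the question (df is an illustrative name)
df <- data.frame(
  idnr = c("a43", "a4945", "a43", "a231", "a231", "a6901"),
  date = as.Date(c("2011-12-19", "2012-09-11", "2013-10-01",
                   "2012-05-09", "2009-09-10", "2015-06-12"))
)

result <- df %>%
  group_by(idnr) %>%
  # 1 for the latest date of an idnr that occurs more than once, otherwise 2
  mutate(newcolumn = ifelse(n() > 1 & date == max(date), 1, 2)) %>%
  ungroup()

print(result)
```

Note that if two rows of the same idnr share the latest date, both are tagged 1, which is also how the loop version behaves.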