R: Uniques (or dplyr distinct) + most recent date

Question

I have a dataframe whose rows include repeats of the same Name from different dates. I'd like to filter this df down to one row per unique Name, keeping the most recent occurrence of each. I am a big fan of dplyr and have used combinations of distinct and select before, but the documentation makes it seem that this cannot be done with distinct alone:

"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."

This seems like a problem that would occur commonly, so I was wondering if anyone had any advice. An example df is below, which reflects that my real data has Names as a character class and the Date as POSIXct that I generated using the lubridate package.

structure(list(Name = c("John", "John", "Mary", "John", "Mary", 
"Chad"), Date = structure(c(1430438400, 1433116800, 1335830400, 
1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("Name", "Date"), row.names = c(NA, -6L
), class = "data.frame")

The desired result is:

structure(list(Name = c("John", "Mary", "Chad"), Date = structure(c(1433116800, 
1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Name", 
"Date"), row.names = c(2L, 5L, 6L), class = "data.frame")

Thank you for your help.

Answer 1

The simplest way would be

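# .keep_all = TRUE keeps the Date column; without it, current dplyr returns only Name
DF %>% arrange(desc(Date)) %>% distinct(Name, .keep_all = TRUE)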

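If you really want the names to be kept in the same order, these also work without sorting first (thanks to @akrun). In current dplyr, the filter version keeps the surviving rows in their original order, while the slice version returns one row per group in group order. Note also that filter returns every tied row if a Name's latest Date occurs more than once, whereas slice(which.max(Date)) returns exactly one: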

DF %>% group_by(Name) %>% slice(which.max(Date))  # @akrun's better idea
DF %>% group_by(Name) %>% filter(Date==max(Date)) # my idea
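
For anyone who wants to try this against the example data from the question, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where slice_max() is available as an alternative to slice(which.max(Date)); the object names newest_distinct and newest_slice are only illustrative and do not come from the original post.

```r
library(dplyr)

# Example data rebuilt from the dput() output in the question
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(
    c(1430438400, 1433116800, 1335830400, 1422748800, 1435708800, 1427846400),
    origin = "1970-01-01", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Approach 1: sort newest first, then keep the first row seen for each Name.
# .keep_all = TRUE retains the Date column alongside Name.
newest_distinct <- DF %>%
  arrange(desc(Date)) %>%
  distinct(Name, .keep_all = TRUE)

# Approach 2 (assumes dplyr >= 1.0.0): take the row with the largest Date
# in each group; with_ties = FALSE guarantees exactly one row per Name.
newest_slice <- DF %>%
  group_by(Name) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(newest_distinct)
print(newest_slice)
```

Both results should contain one row each for John, Mary, and Chad with their latest dates; only the row order differs, since the first is ordered by Date and the second by group.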

solution1
7 ACCEPTED 2015-07-21 21:38:07
