
R: Uniques (or dplyr distinct) + most recent date

I have a data frame in which the same Name appears in several rows, each with a different date. I'd like to reduce this df to one row per unique Name, keeping the most recent occurrence of each. I am a big fan of dplyr and have used combinations of distinct and select before, but the documentation makes it seem that this cannot be done with distinct alone:

"Variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved."

This seems like a problem that would come up often, so I was wondering if anyone had any advice. An example df is below; as in my real data, Name is of class character and Date is POSIXct, which I generated using the lubridate package.

structure(list(Name = c("John", "John", "Mary", "John", "Mary", 
"Chad"), Date = structure(c(1430438400, 1433116800, 1335830400, 
1422748800, 1435708800, 1427846400), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("Name", "Date"), row.names = c(NA, -6L
), class = "data.frame")

The desired result is:

structure(list(Name = c("John", "Mary", "Chad"), Date = structure(c(1433116800, 
1435708800, 1427846400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Name", 
"Date"), row.names = c(2L, 5L, 6L), class = "data.frame")

Thank you for your help.

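The simplest way would be to sort by Date in descending order and then keep the first row for each Name. Note that in dplyr 0.5.0 and later, distinct() drops the other columns unless .keep_all = TRUE is given:

DF %>% arrange(desc(Date)) %>% distinct(Name, .keep_all = TRUE)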
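For reference, here is a minimal self-contained sketch of the sort-then-deduplicate approach on the example data from the question. It assumes dplyr 0.5.0 or later, where `.keep_all = TRUE` is needed to retain the Date column. The names `DF` and `latest` are only illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt without dput()
DF <- data.frame(
  Name = c("John", "John", "Mary", "John", "Mary", "Chad"),
  Date = as.POSIXct(c(1430438400, 1433116800, 1335830400,
                      1422748800, 1435708800, 1427846400),
                    origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Sort newest first, then keep the first row seen for each Name
latest <- DF %>%
  arrange(desc(Date)) %>%
  distinct(Name, .keep_all = TRUE)

print(latest)
```

The result has one row per Name, ordered from the newest date to the oldest (Mary, John, Chad) rather than in the original row order.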

If you really want the names to be kept in the same order, these also work (thanks to @akrun):

DF %>% group_by(Name) %>% slice(which.max(Date))  # @akrun's better idea; always one row per Name
DF %>% group_by(Name) %>% filter(Date==max(Date)) # my idea; keeps every row tied for the latest Date
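The two grouped variants differ when a Name has several rows sharing the same latest Date. The sketch below illustrates this with made-up data; the `ties` data frame and its `Note` column are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical data: John has two rows with the same latest Date
ties <- data.frame(
  Name = c("John", "John", "Mary"),
  Date = as.POSIXct(c("2015-06-01", "2015-06-01", "2015-07-01"), tz = "UTC"),
  Note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# which.max() returns only the first position of the maximum,
# so slice() yields exactly one row per Name
one_per_name <- ties %>%
  group_by(Name) %>%
  slice(which.max(Date)) %>%
  ungroup()

# filter() keeps every row whose Date equals the group maximum,
# so both of John's tied rows survive
all_ties <- ties %>%
  group_by(Name) %>%
  filter(Date == max(Date)) %>%
  ungroup()

nrow(one_per_name)  # 2
nrow(all_ties)      # 3
```

If exactly one row per Name is required, the slice() form (or the arrange() and distinct() form) is the safer choice; the filter() form is useful when tied rows should all be kept.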
