简体   繁体   中英

How to select 'x' most recent values in each group in R?

I'm trying to select/filter the most recent values in each group in a data frame in R. For example, I would like to select the 3 most recent (ie the dates closest to today) from each Name group in the following data frame:

Player  Date    Result
 Sam    03/15/2015  1
 Sam    03/22/2015  0
 Sam    04/04/2015  2
 Sam    04/12/2015  1
 Sam    04/18/2015  1
 Sam    04/26/2015  0
 Sam    08/08/2015  3
Steve   02/17/2015  0
Steve   02/21/2015  0
Steve   03/04/2015  4
Steve   03/11/2015  2
Steve   03/15/2015  1
Steve   03/22/2015  0
Steve   04/12/2015  0
Steve   04/18/2015  2
Steve   04/26/2015  1
Steve   04/29/2015  2
Steve   08/16/2015  4
Jasper  03/15/2015  3
Jasper  03/22/2015  3.5
Jasper  04/04/2015  4
Jasper  04/12/2015  4
Jasper  04/18/2015  5
Jasper  04/26/2015  0

I have written as.date() code so R now understands the date format but what code can I now use to only select the 3 (say) most recent values from each group?

We can use dplyr . We convert the 'Date' to Date class by using as.Date . After grouping by 'Player', we arrange the 'Date' column descendingly and use slice to get the most recent 3 values. If we don't want to change the 'Date' class, we can remove the mutate step and do the conversion within the arrange ie arrange(desc(as.Date(Date, '%m/%d/%Y')))

library(dplyr)
df1 %>%
    mutate(Date=as.Date(Date, '%m/%d/%Y')) %>% 
    group_by(Player) %>% 
    arrange(desc(Date)) %>% 
    slice(1:3)
#    Player       Date Result
#1 Jasper 2015-04-26      0
#2 Jasper 2015-04-18      5
#3 Jasper 2015-04-12      4
#4    Sam 2015-08-08      3
#5    Sam 2015-04-26      0
#6    Sam 2015-04-18      1
#7  Steve 2015-08-16      4
#8  Steve 2015-04-29      2
#9  Steve 2015-04-26      1

Or after we group by the 'Player', we can use top_n by specifying the 'n' and the 'wt' variable for ordering.

 df1 %>% 
   mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%
   group_by(Player)  %>%
   top_n(n = 3, Date)
#  Player       Date Result
#1    Sam 2015-04-18      1
#2    Sam 2015-04-26      0
#3    Sam 2015-08-08      3
#4  Steve 2015-04-26      1
#5  Steve 2015-04-29      2
#6  Steve 2015-08-16      4
#7 Jasper 2015-04-12      4
#8 Jasper 2015-04-18      5
#9 Jasper 2015-04-26      0

Using data.table , we convert the 'data.frame' to 'data.table' ( setDT(df1) ). Grouped by the 'Player, we order the 'Date' after converting to Date class, and using the head we can get the first 3 rows of each group.

library(data.table)
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')),head(.SD, 3) , by = Player]
#   Player       Date Result
#1:  Steve 08/16/2015      4
#2:  Steve 04/29/2015      2
#3:  Steve 04/26/2015      1
#4:    Sam 08/08/2015      3
#5:    Sam 04/26/2015      0
#6:    Sam 04/18/2015      1
#7: Jasper 04/26/2015      0
#8: Jasper 04/18/2015      5
#9: Jasper 04/12/2015      4

data

df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam", 
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve", 
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper", 
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015", 
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015", 
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015", 
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2, 
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5, 
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA,  -24L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM