简体   繁体   中英

R How to transform values to most recent value

I have df which looks like this:

Customer| Code  |Visit|
    A   |  A    | 2018|
    A   |  B    | 2019| 
    B   |  C    | 2017|
    B   |  D    | 2018|

How do I transform Column Code to reflect the most recent Code on a customer lvl?

Expected Result:

Customer| Code  |Visit|
    A   |  B    | 2018|
    A   |  B    | 2019| 
    B   |  D    | 2017|
    B   |  D    | 2018|

I tried this but it doesn't work:

df %>% group_by(Customer)%>% mutate(Code = max(Visit, Code))

You can use which.max to get index of max value in Visit and extract the corresponding Code .

library(dplyr)

df %>% group_by(Customer) %>% mutate(Code = Code[which.max(Visit)])

# Customer Code  Visit
#  <chr>    <chr> <int>
#1 A        B      2018
#2 A        B      2019
#3 B        D      2017
#4 B        D      2018

More efficient solution because of pre-sorting the data.frame (so we subsetting always the last observation):

library(data.table)

result <- setDT(df)[order(Customer, Visit)][,.SD[.N],by=Customer]

It might be important for bigger datasets.

An option with base R

df$Code <-  with(df,  Code[rep(which(ave(Visit, Customer, FUN = max)
             == Visit), table(Customer))])
df$Code
#[1] "B" "B" "D" "D"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM