简体   繁体   中英

Mutate min date per group using filter conditions in dplyr

I wish to create a new column which contains the minimum date per group where some conditions are met.

My data looks as follows:

mbr <- c('A','A','A','A','B','B','B')
drg_typ <- c('TGT','TGT','TGT','Other','Other','TGT','TGT')
dt <- as.Date(c('2018-01-01','2019-06-30','2019-03-18','2017-01-01','2018-01-01','2016-01-01','2019-05-01'))

df <- data.frame(mbr,drg_typ,dt)

mbr drg_typ dt
A   TGT     2018-01-01
A   TGT     2019-06-30
A   TGT     2019-03-18
A   Other   2017-01-01
B   Other   2018-01-01
B   TGT     2016-01-01
B   TGT     2019-05-01

I wish to mutate a new column called min_dt which is performed at the group level of mbr using the following logic:

For each mbr where the drg_typ = 'TGT' and the dt is between 2019-01-01 and 2019-12-31 fill a new column called min_dt with the minimum value of dt which falls between the date values above.

I have tried:

df <- df %>% 
  group_by(mbr) %>%
  mutate(min_dt = if_else(drg_typ == 'TGT' & dt >= '2019-01-01' & dt <= '2019-12-31', min(dt),0))

but I receive the following error:

Error in as.Date.numeric(value) : 'origin' must be supplied

I have checked the structure of my data frame and dt is a date

> str(df)
'data.frame':   7 obs. of  3 variables:
 $ mbr    : Factor w/ 2 levels "A","B": 1 1 1 1 2 2 2
 $ drg_typ: Factor w/ 2 levels "Other","TGT": 2 2 2 1 1 2 2
 $ dt     : Date, format: "2018-01-01" "2019-06-30" "2019-03-18" "2017-01-01" ...

My final output should look as follows:

 mbr drg_typ    dt         min_dt
    A   TGT     2018-01-01 2019-03-18
    A   TGT     2019-06-30 2019-03-18
    A   TGT     2019-03-18 2019-03-18
    A   Other   2017-01-01 2019-03-18
    B   Other   2018-01-01 2019-05-01
    B   TGT     2016-01-01 2019-05-01
    B   TGT     2019-05-01 2019-05-01

The 0 is an issue and if_else checks the type. Here the true is of type Date class while false is numeric. Instead it can be NA and convert it to Date with as.Date

library(dplyr)
df %>% 
   group_by(mbr) %>%
   mutate(min_dt = if_else(drg_typ == 'TGT' & 
       dt >= '2019-01-01' & dt <= '2019-12-31', min(dt), as.Date(NA)))

Based on the expected output, we don't need if_else here. The min of 'dt' can be based on a logical expression

df %>%
     group_by(mbr) %>%
      mutate(min_dt = min(dt[drg_typ == 'TGT' & 
         between(dt, as.Date('2019-01-01'), as.Date('2019-12-31'))]))
# A tibble: 7 x 4
# Groups:   mbr [2]
#  mbr   drg_typ dt         min_dt    
#  <fct> <fct>   <date>     <date>    
#1 A     TGT     2018-01-01 2019-03-18
#2 A     TGT     2019-06-30 2019-03-18
#3 A     TGT     2019-03-18 2019-03-18
#4 A     Other   2017-01-01 2019-03-18
#5 B     Other   2018-01-01 2019-05-01
#6 B     TGT     2016-01-01 2019-05-01
#7 B     TGT     2019-05-01 2019-05-01

Or using data.table

library(data.table)
setDT(df)[, min_dt := min(dt[drg_typ == 'TGT' & 
         between(dt, as.Date('2019-01-01'), as.Date('2019-12-31'))]), mbr]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM