
R: Create a variable based on another variable which counts the rows since first appearance

I need to create a variable (V3) based on two pre-existing variables (V1 and V2). V1 is the year and V2 is a dummy variable. V3 should count the number of years since V2 first equals 1 in the dataset, starting at 1 in that first year and staying 0 before it. See the required output of V3 below. Notice that when V1 skips a year (from 2005 to 2007), V3 must jump accordingly (from 3 to 5), so this is a difference in years rather than a count of rows.

V1 V2 V3
2001 0 0
2002 0 0
2003 1 1
2004 1 2
2005 1 3
2007 1 5

Here's the data:

df<-data.frame(V1=c(2001, 2002, 2003, 2004, 2005, 2007), 
               V2=c(0, 0, 1, 1, 1, 1))

My failed attempt using dplyr:

df2 <- df %>%
  mutate(V3 = case_when(V2 == 1 ~ V1 - min(V1)))

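My attempt fails because min(V1) is taken over the whole column, so it returns 2001 (the first year overall) instead of 2003 (the first year in which V2 is 1). The rows where V2 is 0 also come out as NA rather than 0, since case_when has no fallback condition.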

Thanks for your help.

Using match and pmax:

library(dplyr)

df %>% mutate(V3 = pmax(V1 - V1[match(1, V2)] + 1, 0))

#    V1 V2 V3
#1 2001  0  0
#2 2002  0  0
#3 2003  1  1
#4 2004  1  2
#5 2005  1  3
#6 2007  1  5

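match(1, V2) returns the position of the first 1 in V2, so V1[match(1, V2)] is the V1 value (2003 here) where V2 was 1 for the first time. We subtract that value from each V1 and add 1 so that the first such year counts as 1. For years before that the result is zero or negative, and pmax(..., 0) replaces the negative values with 0.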
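
The same steps can also be sketched in base R, without dplyr. The helper name years_since_first is hypothetical and not part of the answer above. The sketch assumes the rows are already sorted by V1, and it assumes that a column with no 1 at all should give all zeros (in that case match returns NA, and the one-liner above would produce NA for every row).

```r
# Hypothetical helper: years since the flag first equals 1 (0 before that).
# Assumes `year` is sorted in ascending order.
years_since_first <- function(year, flag) {
  first_idx <- match(1, flag)        # position of the first 1, or NA if none
  if (is.na(first_idx)) {
    return(rep(0, length(year)))     # assumption: no 1 anywhere means all zeros
  }
  # Offset from the first flagged year, counting that year as 1;
  # pmax() clamps the negative values for earlier years to 0.
  pmax(year - year[first_idx] + 1, 0)
}

df <- data.frame(V1 = c(2001, 2002, 2003, 2004, 2005, 2007),
                 V2 = c(0, 0, 1, 1, 1, 1))
df$V3 <- years_since_first(df$V1, df$V2)
df$V3
# [1] 0 0 1 2 3 5
```

Wrapping the logic in a function is only a convenience here; the guard for the NA case is the one behavioural difference from the one-liner.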
