
Flagging data by date per group in R

Within each ID group, I want to flag only those years that have n years of experience (past data) and also have one future year. For example, 2020 would always get 0, because there is no 2021 in the data.

library(data.table)  # needed for as.data.table()

ID <- c(rep("A5", 15), rep("B2", 15))
product <- rep(rep(c("prod1","prod2","prod3", "prod55", "prod4", "prod9", "prod83"),3),2)
# start <- c(rep("01.01.2016", 3), rep("01.01.2015", 3), rep("01.01.2014",3),
#            rep("01.01.2013",3), rep("01.01.2012",3))
start <- rep(c(rep(2016, 3), rep(2017, 3), rep(2018, 3),
               rep(2019, 3), rep(2020, 3)), 2)
prodID <- rep(c(3,1,2,3,1,2,3,1,2,3,2,1,3,1,2),2)
# cbind() recycles product[1:15] to 30 rows and coerces every column to character
mydata <- cbind(ID, product[1:15], start, prodID)
mydata <- as.data.table(mydata)

So for n = 3 the result would look something like this:

    ID     V2 start result
 1: A5  prod1  2016      0
 2: A5  prod2  2016      0
 3: A5  prod3  2016      0
 4: A5 prod55  2017      0
 5: A5  prod4  2017      0
 6: A5  prod9  2017      0
 7: A5 prod83  2018      1
 8: A5  prod1  2018      1
 9: A5  prod2  2018      1
10: A5  prod3  2019      1
11: A5 prod55  2019      1
12: A5  prod4  2019      1
13: A5  prod9  2020      0
14: A5 prod83  2020      0
15: A5  prod1  2020      0
16: B2  prod1  2016      0
17: B2  prod2  2016      0
18: B2  prod3  2016      0
19: B2 prod55  2017      0
20: B2  prod4  2017      0
21: B2  prod9  2017      0
22: B2 prod83  2018      1
23: B2  prod1  2018      1
24: B2  prod2  2018      1
25: B2  prod3  2019      1
26: B2 prod55  2019      1
27: B2  prod4  2019      1
28: B2  prod9  2020      0
29: B2 prod83  2020      0
30: B2  prod1  2020      0

We can use between() from data.table:

library(data.table)
n <- 3

# start must be numeric here (see the data note at the end)
mydata[, result := +(between(start, min(start) + n - 1, max(start) - 1)), by = ID]

which returns

mydata
#    ID     V2 start result
# 1: A5  prod1  2016      0
# 2: A5  prod2  2016      0
# 3: A5  prod3  2016      0
# 4: A5 prod55  2017      0
# 5: A5  prod4  2017      0
# 6: A5  prod9  2017      0
# 7: A5 prod83  2018      1
# 8: A5  prod1  2018      1
# 9: A5  prod2  2018      1
#10: A5  prod3  2019      1
#11: A5 prod55  2019      1
#12: A5  prod4  2019      1
#13: A5  prod9  2020      0
#14: A5 prod83  2020      0
#15: A5  prod1  2020      0
#16: B2  prod1  2016      0
#17: B2  prod2  2016      0
#18: B2  prod3  2016      0
#19: B2 prod55  2017      0
#20: B2  prod4  2017      0
#21: B2  prod9  2017      0
#22: B2 prod83  2018      1
#23: B2  prod1  2018      1
#24: B2  prod2  2018      1
#25: B2  prod3  2019      1
#26: B2 prod55  2019      1
#27: B2  prod4  2019      1
#28: B2  prod9  2020      0
#29: B2 prod83  2020      0
#30: B2  prod1  2020      0
#    ID     V2 start result
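
Note that the bounds min(start) + n - 1 and max(start) - 1 only count years correctly when every ID has consecutive years, as in the sample data. If years can be missing, a minimal sketch along the following lines checks the years that are actually present instead. The helper flag_year and the table dt (with 2018 missing for B2) are hypothetical names and data made up for illustration; they are not part of the original answer.

```r
library(data.table)

# Hypothetical data: A5 has consecutive years, B2 is missing 2018
dt <- data.table(
  ID    = rep(c("A5", "B2"), each = 5),
  start = c(2016, 2017, 2018, 2019, 2020,
            2016, 2017, 2019, 2020, 2021)
)
n <- 3

# Hypothetical helper: flag a year only if the n - 1 preceding years
# and the following year all occur within the same group
flag_year <- function(start, n) {
  yrs <- unique(start)
  ok <- vapply(yrs, function(y) {
    all((y - seq_len(n - 1)) %in% yrs) && (y + 1) %in% yrs
  }, logical(1))
  +(start %in% yrs[ok])
}

dt[, result := flag_year(start, n), by = ID]
dt
```

With this data, A5 is flagged for 2018 and 2019, while B2 gets no flag at all because no year has two preceding years plus a following year. The between() version would flag 2019 and 2020 for B2, since it only looks at the minimum and maximum year.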

between() returns a logical (TRUE/FALSE) vector indicating whether each value lies in the range between the lower and upper bound; both bounds are inclusive by default. An equivalent way to write it is:

mydata[, result := +(start >= min(start) + n - 1 & start <= max(start) - 1), ID]

The unary + converts the logical values (TRUE/FALSE) to integer values (1/0).
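
To see the pieces in isolation, here is a small sketch on a plain vector of years rather than the full table. The variable names lower, upper, flag_between and flag_manual are chosen here for illustration only.

```r
library(data.table)

start <- c(2016, 2017, 2018, 2019, 2020)
n <- 3

lower <- min(start) + n - 1  # first year with n years of data, counting itself
upper <- max(start) - 1      # last year that still has a following year

flag_between <- between(start, lower, upper)       # inclusive bounds by default
flag_manual  <- start >= lower & start <= upper    # same test written out

identical(flag_between, flag_manual)  # both forms give the same logical vector

result <- +flag_between  # unary plus turns TRUE/FALSE into 1L/0L
result
```

Only 2018 and 2019 fall inside the range, so result is 0 0 1 1 0, matching one ID group of the output above.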

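Data

Don't use cbind() when creating the data. It builds a character matrix, so start ends up as a character column and arithmetic such as min(start) + n - 1 fails. Use data.frame() or data.table() directly: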

mydata <- data.table(ID, product[1:15], start)
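
The difference in column types can be checked with a short sketch. It uses a reduced, made-up version of the data (three years per ID) and the illustrative names via_cbind and direct, which do not appear in the original post.

```r
library(data.table)

ID    <- rep(c("A5", "B2"), each = 3)
start <- rep(c(2018, 2019, 2020), 2)

# cbind() creates a character matrix first, so start loses its numeric type
via_cbind <- as.data.table(cbind(ID, start))
class(via_cbind$start)

# data.table() keeps each column's own type
direct <- data.table(ID, start)
class(direct$start)

# The flagging step needs numeric years, so it is run on the direct table
n <- 2
direct[, result := +(between(start, min(start) + n - 1, max(start) - 1)), by = ID]
direct
```

Here via_cbind$start is character while direct$start is numeric, which is why the by-group arithmetic only works on the table built directly.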
