I am trying to calculate the lag difference between count variables. However, my data has a group variable. I want the lag to be calculated for each group separately.
until now i have the following:
dput(head(mydata,20))
structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014",
"2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000",
"AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000",
"ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001",
"BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001",
"HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82,
124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4,
166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
mydata <- mydata[ ,var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %")]
the desired output is:
mydata %>%
group_by(groupID) %>%
arrange(startYear, .by_group = TRUE) %>%
mutate(var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %"))
what is the .by_group = TRUE
alternative in data.table
?
How can i force positive values to have an +
?
You can do the following in data.table
, using by
to group and shift
instead of lag:
Edit: added function to get a nicer percent and plus sign output
library(data.table)
mydata <- structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014",
"2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000",
"AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000",
"ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001",
"BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001",
"HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82,
124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4,
166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
setDT(mydata)
addplus <- function(x, digits=3){
x <- setNames(x, round(100*x, digits = digits))
ifelse(is.na(x), x,
ifelse(sign(x) == 1, paste0("+", names(x), "%"), paste0(names(x), "%"))
)
}
mydata[ , var_calc := addplus(N/shift(N) - 1), by="groupID"][]
#> startYear groupID N var_calc
#> 1: 2014 AISAC-0000 82 <NA>
#> 2: 2015 AISAC-0000 124 +51.22%
#> 3: 2016 AISAC-0000 60 -51.613%
#> 4: 2017 AISAC-0000 164 +173.333%
#> 5: 2018 AISAC-0000 65 -60.366%
#> 6: 2014 ASSAT-0000 142 <NA>
#> 7: 2015 ASSAT-0000 183 +28.873%
#> 8: 2016 ASSAT-0000 142 -22.404%
#> 9: 2017 ASSAT-0000 75 -47.183%
#> 10: 2018 ASSAT-0000 185 +146.667%
#> 11: 2014 BAYER-0001 145 <NA>
#> 12: 2014 BAYSC-0002 22 <NA>
#> 13: 2016 GECER-0002 162 <NA>
#> 14: 2015 HANIN-0000 92 <NA>
#> 15: 2016 HANIN-0000 4 -95.652%
#> 16: 2014 HOCED-0001 166 <NA>
#> 17: 2014 HOCEN-0000 57 <NA>
#> 18: 2014 INDAL-0000 11 <NA>
#> 19: 2015 INDAL-0000 199 +1709.091%
#> 20: 2016 INDAL-0000 137 -31.156%
Created on 2020-05-12 by the reprex package (v0.3.0)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.