
Calculate the lagged percentage difference by group in data.table in R

I am trying to calculate the percentage change between consecutive values of a count variable. My data has a group variable, and I want the lag to be computed separately within each group.

So far I have the following:

dput(head(mydata,20))
structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014", 
"2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000", 
"AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000", 
"ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001", 
"BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001", 
"HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82, 
124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4, 
166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))
mydata <- mydata[ ,var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %")]

The desired output is what this dplyr code produces:

mydata %>%
  group_by(groupID) %>%
  arrange(startYear, .by_group = TRUE) %>%
  mutate(var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %"))

What is the data.table alternative to .by_group = TRUE?

How can I force positive values to show a + sign?

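You can do this in data.table by grouping with by and using shift() instead of lag(). Note that by keeps the rows of each group in their existing order, so the data must already be sorted by startYear within each group (as it is here):

Edit: added a function to produce nicer percent output with a plus sign.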

library(data.table)
mydata <- structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
  2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014", 
  "2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000", 
  "AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000", 
  "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001", 
  "BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001", 
  "HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82, 
  124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4, 
  166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table", 
  "data.frame"))

setDT(mydata)

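# Format a proportion as a percent string, prefixing positive values with "+"
addplus <- function(x, digits = 3) {
  # Carry the rounded percentage along as the names of x
  x <- setNames(x, round(100 * x, digits = digits))
  # NA stays NA; positive values get a leading "+"
  ifelse(is.na(x), x,
         ifelse(sign(x) == 1, paste0("+", names(x), "%"), paste0(names(x), "%"))
  )
}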

mydata[ , var_calc := addplus(N/shift(N) - 1), by="groupID"][]
#>     startYear    groupID   N   var_calc
#>  1:      2014 AISAC-0000  82       <NA>
#>  2:      2015 AISAC-0000 124    +51.22%
#>  3:      2016 AISAC-0000  60   -51.613%
#>  4:      2017 AISAC-0000 164  +173.333%
#>  5:      2018 AISAC-0000  65   -60.366%
#>  6:      2014 ASSAT-0000 142       <NA>
#>  7:      2015 ASSAT-0000 183   +28.873%
#>  8:      2016 ASSAT-0000 142   -22.404%
#>  9:      2017 ASSAT-0000  75   -47.183%
#> 10:      2018 ASSAT-0000 185  +146.667%
#> 11:      2014 BAYER-0001 145       <NA>
#> 12:      2014 BAYSC-0002  22       <NA>
#> 13:      2016 GECER-0002 162       <NA>
#> 14:      2015 HANIN-0000  92       <NA>
#> 15:      2016 HANIN-0000   4   -95.652%
#> 16:      2014 HOCED-0001 166       <NA>
#> 17:      2014 HOCEN-0000  57       <NA>
#> 18:      2014 INDAL-0000  11       <NA>
#> 19:      2015 INDAL-0000 199 +1709.091%
#> 20:      2016 INDAL-0000 137   -31.156%

Created on 2020-05-12 by the reprex package (v0.3.0)
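
The question also asks for the equivalent of arrange(startYear, .by_group = TRUE). The example data is already sorted, so the code above does not need it, but for unsorted data one option is to sort by reference with setorder() before computing the lag. The sketch below uses a small made-up table (not the data from the question) and a hypothetical helper, fmt_pct, that builds the signed string with sprintf()'s "+" flag instead of nested ifelse() calls. Unlike addplus, it pads to a fixed number of decimals, so 51.22 is printed as +51.220%.

```r
library(data.table)

# Toy data, deliberately out of year order within group A (made up for illustration)
dt <- data.table(
  startYear = factor(c("2016", "2014", "2015", "2015", "2014")),
  groupID   = c("A", "A", "A", "B", "B"),
  N         = c(60, 82, 124, 183, 142)
)

# Sort by group, then by year within each group; setorder() modifies dt in place
setorder(dt, groupID, startYear)

# Hypothetical helper: signed percent string with a fixed number of decimals
fmt_pct <- function(x, digits = 3) {
  fmt <- paste0("%+.", digits, "f%%")   # e.g. "%+.3f%%"
  out <- sprintf(fmt, 100 * x)
  out[is.na(x)] <- NA_character_        # keep missing values as NA, not a string
  out
}

# Lagged percentage change within each group
dt[, var_calc := fmt_pct(N / shift(N) - 1), by = groupID]
```

Because setorder() reorders the table itself, the result stays sorted by group and year afterwards, which is roughly what the dplyr pipeline in the question returns.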
