简体   繁体   中英

R: Efficient Rolling Calculations by Group

I have some asset data in the middle of a dplyr pipeline similar to this:

fcast <- data.frame(group = rep(c('a','b'),each=12),
                yr = rep(2018:2019,each=6,times=2),
                mo = rep(c(7:12,1:6),times=2),
                book_value = c(10000,rep(0,times=11),15000,rep(0,times=11)),
                accum_depr = c(200,rep(0,times=11),700,rep(0,times=11)),
                depr_rate = .02,
                depr_expense = c(10,rep(0,times=11),15,rep(0,times=11)),
                book_addn = c(0,0,0,0,80,0,0,40,0,0,0,0,0,0,100,70,0,0,0,0,0,0,0,0),
                book_growth = 1.01
                )

I need to apply some (ideally, tidy ) rolling function to each group like the one below, which does not work at the moment.

roll_depr <- function(.data) {
   r_d <- .data$depr_rate[1]
   r_g <- .data$book_growth[1]

   for(i in 2:length(.data$depreciation_rate)) {
      .data$book_value[i] <- .data$book_value[i-1]*r_g + .data$book_addn[i]
      .data$depr_expense[i] <- (.data$book_value[i] - .data$accum_depr[i-1])*r_d
      .data$accum_depr[i] <- .data$accum_depr[i-1]+.data$depr_expense[i]
   }
   return(.data)
}

To further complicate things, this calculation will be performed in a shiny dashboard repeatedly as users input new values for book_addn . The actual dataset is much larger, and for loops don't cut it.

I know a better solution must exist with data.table or apply , but I haven't been able to figure it out. Bonus points if this can be accomplished from within the pipeline!

EDIT: I'm expecting the code to output the following table. Basically, the book_value grows at 1% of the previous value, plus any additions in the period. The depr_expense takes the book_value net of the previous accum_depr , and multiplies by the depr_rate . Finally, accum_depr updates to account for the newly-calculated depr_expense .

group   yr  mo  book_value  accum_depr  depr_rate   depr_expense    book_addn   book_growth
a     2018  7   10000.00    200.00      0.02        10.00           0           1.01
a     2018  8   10100.00    398.00      0.02        198.00          0           1.01
a     2018  9   10201.00    594.06      0.02        196.06          0           1.01
a     2018  10  10303.01    788.24      0.02        194.18          0           1.01
a     2018  11  10486.04    982.20      0.02        193.96          80          1.01
a     2018  12  10590.90    1174.37     0.02        192.17          0           1.01
a     2019  1   10696.81    1364.82     0.02        190.45          0           1.01
a     2019  2   10843.78    1554.40     0.02        189.58          40          1.01
a     2019  3   10952.22    1742.35     0.02        187.96          0           1.01
a     2019  4   11061.74    1928.74     0.02        186.39          0           1.01
a     2019  5   11172.35    2113.61     0.02        184.87          0           1.01
a     2019  6   11284.08    2297.02     0.02        183.41          0           1.01
b     2018  7   15000.00    700.00      0.02        15.00           0           1.01
b     2018  8   15150.00    989.00      0.02        289.00          0           1.01
b     2018  9   15401.50    1277.25     0.02        288.25          100         1.01
b     2018  10  15625.52    1564.22     0.02        286.97          70          1.01
b     2018  11  15781.77    1848.57     0.02        284.35          0           1.01
b     2018  12  15939.59    2130.39     0.02        281.82          0           1.01
b     2019  1   16098.98    2409.76     0.02        279.37          0           1.01
b     2019  2   16259.97    2686.76     0.02        277.00          0           1.01
b     2019  3   16422.57    2961.48     0.02        274.72          0           1.01
b     2019  4   16586.80    3233.99     0.02        272.51          0           1.01
b     2019  5   16752.67    3504.36     0.02        270.37          0           1.01
b     2019  6   16920.19    3772.68     0.02        268.32          0           1.01

This can actually be done at decent speed with two simple functions that implement for loops, and using them within mutate .

The key is to recognize that book_value can be calculated independently in its own loop. Once that has been done, accum_depr[i] is only a function of accum_depr[i-1] and book_value[i] . The depr_expense can be extracted as the difference between accum_depr and its lag, but I don't need it for my purposes.

expn[i] = (book[i] - accum_depr[i-1])*depr_rate
accum_depr[i] = accum_depr[i-1] + expn[i]

Which implies

accum_depr[i] = accum_depr[i-1]*(1-depr_rate) + book_value[i]*depr_rate

The code:

roll_book <- function(book_val,addn,g_rate) {
  z <- rep(0,length(book_val))
  z[1] <- book_val[1]
  for(i in 2:length(book_val)) {
    z[i] <- z[i-1]*g_rate[1] + addn[i]
  }
  return(z)
}

roll_depr <- function(accum_depr,book_val,depr_rate) {
  r_d <- depr_rate[1]
  z <- rep(0, length(accum_depr))
  z[1] <- accum_depr[1]
  for(i in 2:length(accum_depr)) {
    z[i] <- book_val[i]*r_d + z[i-1]*(1-r_d)
  }
  return(z)
}

fcast <- fcast %>% 
  group_by(group) %>% 
  mutate(book_value = roll_book(book_value,book_addn,book_growth),
         accum_depr = roll_depr(accum_depr,book_value,depr_rate))

On my dataset with ~110,000 rows and ~450 groups:

Unit: milliseconds
      min       lq     mean   median       uq      max neval
 65.01492 67.14825 70.80178 69.85741 72.53611 98.75224   100

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM