Rolling cumsum in data.table

I'm trying to compute reverse cumulative sums by group in data.table, so that each row holds the sum of column a from that row to the end of its group. For example, from the following data I'd like to get the values in the "roll_cumsum" column:

library(data.table)

dt <- data.table(a = seq(1, 10, 1), group = rep(1:2, each = 5))
dt[, roll_cumsum := c(15, 14, 12, 9, 5, 40, 34, 27, 19, 10)]  # desired result

I got the results I wanted with the code below but it's quite slow for a large dataset:

partial_sum = function(x) { n <- seq_along(x); cumsum(x)[length(x)] - cumsum(x)[n] + x[n] }
dt[, partial_sum(a), by = group]

Any suggestions to make the calculation faster? Thank you so much!
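For reference, the helper in the question rests on the identity "sum of x[i..n] = total - running sum up to i + x[i]". The sketch below is a hypothetical variant (the name partial_sum_once is not from the original post) that calls cumsum() once instead of twice. It is only meant to illustrate the identity, and whether it is noticeably faster on real data is untested here.

```r
# Original helper from the question: calls cumsum(x) twice.
partial_sum <- function(x) {
  n <- seq_along(x)
  cumsum(x)[length(x)] - cumsum(x)[n] + x[n]
}

# Hypothetical variant: compute the running sum once and reuse it.
partial_sum_once <- function(x) {
  cs <- cumsum(x)
  # total - running sum up to i + x[i] equals the sum of x[i], ..., x[n]
  cs[length(x)] - cs + x
}

x <- c(1, 2, 3, 4, 5)
res_orig <- partial_sum(x)
res_once <- partial_sum_once(x)
res_once
```

Both functions should return the same vector for any numeric input, since the second only removes the repeated cumsum() call.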

There is a revcumsum function in the spatstat.utils package:

library(spatstat.utils)
dt[, roll_cumsum2 := revcumsum(a), group]

-output

dt
#     a group roll_cumsum roll_cumsum2
# 1:  1     1          15           15
# 2:  2     1          14           14
# 3:  3     1          12           12
# 4:  4     1           9            9
# 5:  5     1           5            5
# 6:  6     2          40           40
# 7:  7     2          34           34
# 8:  8     2          27           27
# 9:  9     2          19           19
#10: 10     2          10           10

Or just do the reverse with rev:

dt[, roll_cumsum2 := rev(cumsum(rev(a))), group]

-output

dt
#     a group roll_cumsum roll_cumsum2
# 1:  1     1          15           15
# 2:  2     1          14           14
# 3:  3     1          12           12
# 4:  4     1           9            9
# 5:  5     1           5            5
# 6:  6     2          40           40
# 7:  7     2          34           34
# 8:  8     2          27           27
# 9:  9     2          19           19
#10: 10     2          10           10

Or another way is

dt[, roll_cumsum2 := cumsum(a[.N:1])[.N:1], group]

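NOTE: Both are compact one-liners that do the same thing: reverse the vector, take its cumulative sum, and reverse the result back.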
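To see what the reverse-then-cumsum idea does without any extra packages, here is a minimal base R sketch. It assumes ave() as a stand-in for data.table's by-group evaluation, and the helper names rev_cumsum and rev_cumsum_idx are made up for illustration.

```r
# Reverse cumulative sum: element i holds x[i] + x[i+1] + ... + x[n].
rev_cumsum <- function(x) rev(cumsum(rev(x)))

# Same idea using reversed index vectors instead of rev().
# rev(seq_along(x)) is used rather than length(x):1 so that an
# empty vector yields an empty result.
rev_cumsum_idx <- function(x) {
  idx <- rev(seq_along(x))
  cumsum(x[idx])[idx]
}

a     <- 1:10
group <- rep(1:2, each = 5)

# ave() applies FUN within each group and keeps the original row order.
res     <- ave(a, group, FUN = rev_cumsum)
res_idx <- ave(a, group, FUN = rev_cumsum_idx)
res
```

For the example data this should reproduce the roll_cumsum column from the question (15, 14, 12, 9, 5 for group 1 and 40, 34, 27, 19, 10 for group 2).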

Benchmarks

# partial_sum() is the function defined in the question
dt1 <- data.table(a = 1:1e7, group = rep(1:1e6, length.out = 1e7))

system.time(dt1[, roll_cumsum := partial_sum(a), by = group])
#   user  system elapsed
#  2.073   0.037   2.094

system.time(dt1[, roll_cumsum2 := revcumsum(a), group])
#   user  system elapsed
#  2.623   0.029   2.637

system.time(dt1[, roll_cumsum3 := rev(cumsum(rev(a))), group])
#   user  system elapsed
#  4.275   0.051   4.276

system.time(dt1[, roll_cumsum4 := cumsum(a[.N:1])[.N:1], group])
#   user  system elapsed
#  1.703   0.028   1.722

system.time(dt1[, roll_cumsum5 := sum(a) - cumsum(shift(a, fill = 0)), group])
#   user  system elapsed
# 10.095   0.041  10.129

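Alternatively, you can take sum(a) within each group and subtract the cumulative sum of the preceding values, i.e. of a shifted down by one position with 0 filled in at the start.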

library(data.table)
dt[, roll_cumsum1 :=  sum(a) - cumsum(shift(a, fill = 0)), group]
dt

#     a group roll_cumsum roll_cumsum1
# 1:  1     1          15           15
# 2:  2     1          14           14
# 3:  3     1          12           12
# 4:  4     1           9            9
# 5:  5     1           5            5
# 6:  6     2          40           40
# 7:  7     2          34           34
# 8:  8     2          27           27
# 9:  9     2          19           19
#10: 10     2          10           10
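The reason the subtraction works can be checked on a single group in base R. This is a minimal sketch that assumes c(0, head(x, -1)) as a stand-in for shift(x, fill = 0); the variable names are illustrative only.

```r
x <- c(1, 2, 3, 4, 5)

# Base R stand-in for data.table::shift(x, fill = 0):
# drop the last element and put a 0 in front.
lagged <- c(0, head(x, -1))

# Total minus the sum of everything strictly before position i
# leaves x[i] + x[i+1] + ... + x[n].
via_total <- sum(x) - cumsum(lagged)

# Reference result from the reverse-cumsum approach.
via_rev <- rev(cumsum(rev(x)))

via_total
```

One thing to keep in mind: with non-integer doubles the two routes add and subtract in a different order, so results may differ by floating-point rounding, and all.equal() is a safer comparison than identical().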

