Rolling Sum Dplyr

If I have a data frame and I want a rolling sum over the previous n rows and the next n rows, what is the best way to do this? I'm aware of roll_sum, but I can't find a way to make it fit my use case. For example, let's say I have a vector n and I specify a window of 1. That means that for each record I want to sum it and the two adjacent records.

n   window1
1   NA
3   8
4   12
5   15
6   18
7   22
9   17
1   15
5   6

If I specified 2 as my window size then this would be the result:

n   window1 window2
1   NA  NA
3   8   NA
4   12  19
5   15  25
6   18  31
7   22  28
9   17  28
1   15  22
5   6   15

Is there an easy way to do this?

There are likely dedicated functions for this, but the following seems to work, and it gives you some control over how the edges behave. lag() pads with NA by default, so the first rows come out as NA, while default = 0 in lead() lets the sum run through to the last record even though there are no leading values there. My bet is that this is relatively slow and inefficient.

library(dplyr)
library(purrr)

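rolling_sum <- function(v, window = 1) {

  # Offsets 1..window; seq_len() is also safe when window = 0
  k <- seq_len(window)

  # One column per lag; lag() pads the start with NA
  vLag <- k %>%
    map_dfc(~lag(v, .))

  # One column per lead; the end is padded with 0 instead of NA
  vLead <- k %>%
    map_dfc(~lead(v, ., default = 0))

  # Add the lagged columns, the value itself, and the lead columns row by row
  rowSums(bind_cols(vLag, V = v, vLead))

}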

df <- data.frame(n = c(1,3,4,5,6,7,9,1,5))

df %>%
  mutate(window1 = rolling_sum(n, 1),
         window2 = rolling_sum(n, 2))
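
For comparison, here is a minimal base R sketch of the same idea that avoids building one column per offset. The name rolling_sum_base is made up for this illustration and is not part of dplyr or purrr. It assumes the same edge behavior as above: NA padding in front and 0 padding behind.

```r
# Hypothetical helper (not from any package): centered rolling sum with
# NA padding at the start and 0 padding at the end.
rolling_sum_base <- function(v, window = 1) {
  len <- length(v)
  # Pad so that every position has a full window to index into
  padded <- c(rep(NA_real_, window), v, rep(0, window))
  total <- numeric(len)
  # Add up the 2 * window + 1 shifted copies of the padded vector
  for (offset in 0:(2 * window)) {
    total <- total + padded[offset + seq_len(len)]
  }
  total
}

n <- c(1, 3, 4, 5, 6, 7, 9, 1, 5)

rolling_sum_base(n, 1)
# NA  8 12 15 18 22 17 15  6
rolling_sum_base(n, 2)
# NA NA 19 25 31 28 28 22 15
```

Both results match the window1 and window2 columns in the question. The loop runs once per offset rather than once per row, so it should stay reasonably fast on long vectors, though I have not benchmarked it.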

I think rollapply from the zoo package is your friend (rollapplyr is its right-aligned wrapper). With align = 'center' and a width of 2 * n + 1 you can sum the previous n rows and the next n rows.
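
A minimal sketch of how that could look, assuming the zoo package is installed. The wrapper name centered_sum is made up for this example. Note that fill = NA pads both ends, so unlike the table in the question the last rows come out as NA rather than as partial sums.

```r
library(zoo)

# Hypothetical wrapper: sum the previous k values, the current value,
# and the next k values. The window therefore has width 2 * k + 1.
centered_sum <- function(x, k) {
  rollapply(x, width = 2 * k + 1, FUN = sum, align = "center", fill = NA)
}

n <- c(1, 3, 4, 5, 6, 7, 9, 1, 5)

centered_sum(n, 1)
# NA  8 12 15 18 22 17 15 NA
centered_sum(n, 2)
# NA NA 19 25 31 28 28 NA NA
```

If partial sums at the edges are wanted, rollapply also has a partial argument, but as far as I can tell it applies to both ends, so the first row would no longer be NA either.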
