Calculate sum of n previous rows

I have a fairly large data frame, and I'm trying to add a new variable that holds the rolling sum of the three previous rows, computed separately for each ID. The first three rows of each ID should be 0. Here's what the result should look like:

ID   Var1  VarNew
1     2      0
1     2      0
1     3      0
1     0      7
1     4      5
1     1      7

Here's an example data frame:

ID <- c(1, 1, 1, 1, 1, 1)
Var1 <- c(2, 2, 3, 0, 4, 1)
df <- data.frame(ID, Var1)
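To pin the specification down, here is a minimal base-R sketch of the rule as a plain loop, applied per ID with ave(). The helper prev_n_sum_loop, its n argument and the second ID's data are hypothetical additions for illustration. The loop is easy to read but slow, so treat it as a reference for checking faster versions rather than something to run on a big data frame.

```r
# Hypothetical helper (not from the original post): for each row i, sum the
# n rows before it; the first n rows get 0.
prev_n_sum_loop <- function(x, n = 3) {
  out <- numeric(length(x))            # starts as all zeros
  for (i in seq_along(x)) {
    if (i > n) {
      out[i] <- sum(x[(i - n):(i - 1)])  # the n rows strictly before row i
    }
  }
  out
}

# Two IDs so the per-ID reset is visible (the second ID is made-up data).
df2 <- data.frame(ID   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                  Var1 = c(2, 2, 3, 0, 4, 1, 5, 1, 1, 2))

# ave() applies the function separately to each ID group.
df2$VarNew <- ave(df2$Var1, df2$ID, FUN = prev_n_sum_loop)
df2
```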

You can use any package that has a rolling-window function: compute the sum over a window of 3 rows and then lag the result by one row. For example, with zoo::rollsumr:

library(dplyr)

df %>%
  group_by(ID) %>%
  # right-aligned 3-row sum (current row + 2 previous), then shift down one row
  mutate(VarNew = lag(zoo::rollsumr(Var1, 3, fill = 0), default = 0)) %>%
  ungroup()

#     ID  Var1 VarNew
#  <dbl> <dbl>  <dbl>
#1     1     2      0
#2     1     2      0
#3     1     3      0
#4     1     0      7
#5     1     4      5
#6     1     1      7
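The lag() step is what excludes the current row: zoo::rollsumr is the right-aligned rolling sum, so with fill = 0 each element holds the current row plus the two before it, and lag(..., default = 0) moves that vector down one row. The sketch below reproduces the two steps in base R only, so the intermediate vector is visible; roll_sum_right and shift_down are hypothetical stand-ins, not zoo or dplyr code.

```r
# Hypothetical base-R stand-in for a right-aligned rolling sum with fill = 0:
# element i holds x[i-k+1] + ... + x[i], or 0 when fewer than k values exist.
roll_sum_right <- function(x, k) {
  vapply(seq_along(x), function(i) {
    if (i < k) 0 else sum(x[(i - k + 1):i])
  }, numeric(1))
}

# Hypothetical stand-in for lag(x, default = 0): shift everything down one row.
shift_down <- function(x, default = 0) {
  c(default, head(x, -1))
}

x <- c(2, 2, 3, 0, 4, 1)
rolled  <- roll_sum_right(x, 3)  # 0 0 7 5 7 5  (includes the current row)
shifted <- shift_down(rolled)    # 0 0 0 7 5 7  (only the three previous rows)
```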

You can also use stats::filter inside ave:

df$VarNew <- ave(df$Var1, df$ID, FUN = function(x) c(0, 0, 0,
  # use x (the current ID group), not df$Var1; stats:: avoids dplyr::filter
  stats::filter(head(x, -1), c(1, 1, 1), sides = 1)[-1:-2]))
df
#  ID Var1 VarNew
#1  1    2      0
#2  1    2      0
#3  1    3      0
#4  1    0      7
#5  1    4      5
#6  1    1      7
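stats::filter with the default method = "convolution" and sides = 1 returns, at each position i, f[1]*x[i] + f[2]*x[i-1] + f[3]*x[i-2], with NA where fewer than three values are available. Running it on head(x, -1) is what turns "current row plus two previous" into "three previous rows". A short sketch of the intermediate values for the example Var1 (the stats:: prefix matters because a bare filter() means dplyr::filter once dplyr is attached):

```r
x <- c(2, 2, 3, 0, 4, 1)

# sides = 1: each output is x[i] + x[i-1] + x[i-2] of the shortened vector;
# the first two positions have no complete window and are NA.
win <- stats::filter(head(x, -1), c(1, 1, 1), sides = 1)
as.numeric(win)   # NA NA 7 5 7

# Dropping the two leading NAs and prepending three zeros gives the answer.
res <- c(0, 0, 0, as.numeric(win)[-(1:2)])
res               # 0 0 0 7 5 7
```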

Or use cumsum in combination with head and tail:

df$VarNew <- ave(df$Var1, df$ID, FUN = function(x) {
  y <- cumsum(c(0, head(x, -1)))  # running total of the rows *before* each row
  c(0, 0, 0, tail(y, -3) - head(y, -3))
})
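The cumsum idea works because the difference of two running totals is a window sum: if y[i] is the total of all rows before row i, then y[i] - y[i - n] is the sum of the n rows before row i. Since the title asks about n previous rows, here is a sketch that generalises this to any window size n. The function name prev_n_sum, the min() guard for groups shorter than n and the extra data are hypothetical additions for illustration.

```r
# Hypothetical helper: sum of the n previous values, 0 for the first n rows.
prev_n_sum <- function(x, n = 3) {
  # y[i] = x[1] + ... + x[i-1], the running total *before* row i
  y <- cumsum(c(0, head(x, -1)))
  # y[i] - y[i-n] = x[i-n] + ... + x[i-1] for i > n;
  # min() keeps the output length right when a group has fewer than n rows
  c(rep(0, min(n, length(x))), tail(y, -n) - head(y, -n))
}

df3 <- data.frame(ID   = c(1, 1, 1, 1, 1, 1, 2, 2),
                  Var1 = c(2, 2, 3, 0, 4, 1, 5, 6))
df3$VarNew  <- ave(df3$Var1, df3$ID, FUN = prev_n_sum)                         # n = 3
df3$VarNew2 <- ave(df3$Var1, df3$ID, FUN = function(x) prev_n_sum(x, n = 2))  # n = 2
df3
```

This runs in linear time per group, which suits a large data frame. With non-integer values the subtraction of running totals can introduce tiny rounding differences, so compare results with all.equal() rather than ==.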

The runner package also works:

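library(runner)
# sum of the k = 3 rows ending one row back (lag = 1), per ID; incomplete windows are NA
df %>% group_by(ID) %>%
  mutate(var_new = sum_run(Var1, k = 3, lag = 1, na_pad = TRUE)) %>% ungroup()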

  ID Var1 var_new
1  1    2      NA
2  1    2      NA
3  1    3      NA
4  1    0       7
5  1    4       5
6  1    1       7

The NAs can easily be replaced with 0 if desired, for example with dplyr::coalesce(var_new, 0) inside mutate().
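Without another package, base R's replace() does the same in one step. The sketch builds an NA-padded vector of the same shape as the runner output above using stats::filter (an assumption of this sketch, not what runner does internally) and then swaps the NAs for 0. Note that this would also zero out any NA that came from missing values in Var1 itself.

```r
x <- c(2, 2, 3, 0, 4, 1)

# NA-padded "sum of the 3 previous rows": the leading NA shifts the
# 3-row sums down by one, so rows 1-3 have no complete window.
padded <- c(NA, as.numeric(stats::filter(head(x, -1), rep(1, 3), sides = 1)))
padded    # NA NA NA 7 5 7

# Replace the padding with 0
var_new <- replace(padded, is.na(padded), 0)
var_new   # 0 0 0 7 5 7
```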
