简体   繁体   中英

Sum over previous periods for each period for each subject - R

A MWE is as follows:

library(dplyr)

Period <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) 

Subject <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)

set.seed(1)
Values <- round(rnorm(n=9,mean=5,sd=1), digits = 2)

df <- bind_cols(Period = Period, Subject = Subject, Values = Values)
df
# A tibble: 9 x 3
  Period Subject Values
   <dbl>   <dbl>  <dbl>
1      1       1   4.37
2      1       2   5.18
3      1       3   4.16
4      2       1   6.6 
5      2       2   5.33
6      2       3   4.18
7      3       1   5.49
8      3       2   5.74
9      3       3   5.58

I would like to have a variable sum_values that sums up all the previous values for each subject in each period. For example, for subject 1 in period 2, sum_values = 4.37 , and it will be 4.37+6.6 = 10.97 in period 3.

I can do it with a for loop which would be very tedious that involves many if-else statements, but I believe that there's an easy way out in R. How can I achieve my purpose easier in R?

You may also use data.table to get this done.

library(data.table)

Period <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) 
Subject <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
set.seed(1)
Values <- round(rnorm(n=9,mean=5,sd=1), digits = 2)

# create data.table 
dt <- data.table(Period = Period, Subject = Subject, Values = Values)
# create cumsum by Subject
dt[,sum_values := shift(cumsum(Values),n = 1, fill = 0), by = Subject]

dt
#>    Period Subject Values sum_values
#> 1:      1       1   4.37       0.00
#> 2:      1       2   5.18       0.00
#> 3:      1       3   4.16       0.00
#> 4:      2       1   6.60       4.37
#> 5:      2       2   5.33       5.18
#> 6:      2       3   4.18       4.16
#> 7:      3       1   5.49      10.97
#> 8:      3       2   5.74      10.51
#> 9:      3       3   5.58       8.34

Arrange the data by Period and use cumsum to get cumulative sum of Values . Since you want to sum all the previous Values use lag .

library(dplyr)

df %>%
  arrange(Subject, Period) %>%
  group_by(Subject) %>%
  mutate(Values = lag(cumsum(Values), default = 0)) %>%
  ungroup


#  Period Subject Values
#1      1       1   0.00
#2      2       1   4.37
#3      3       1  10.97
#4      1       2   0.00
#5      2       2   5.18
#6      3       2  10.51
#7      1       3   0.00
#8      2       3   4.16
#9      3       3   8.34

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM