
R - cumulative product & sum by group

I have the following dataset and would like to add a new column, `colY`. How can I compute it? The table below shows how each value of colY is calculated:

GROUP   ID  colX   colY
1       1   0.8    =0.8*(1+0.7*(1+0.6))
1       2   0.7    =0.7*(1+0.6)
1       3   0.6    =0.6
2       1   1.3    =1.3*(1+1.2*(1+1.1*(1+1.0)))
2       2   1.2    =1.2*(1+1.1*(1+1.0))
2       3   1.1    =1.1*(1+1.0)
2       4   1.0    =1.0

Preferably in data.table syntax. Thank you!
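Reading each group from the bottom up, every colY value reuses the one in the row below it. Writing x_i for colX and y_i for colY in row i of a group of n rows (rows ordered by ID), the table follows this recurrence:

$$
y_n = x_n, \qquad y_i = x_i\,(1 + y_{i+1}), \quad i = n-1, \dots, 1
$$

For group 1: y_3 = 0.6, y_2 = 0.7(1 + 0.6) = 1.12, y_1 = 0.8(1 + 1.12) = 1.696.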

Try this:

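# For each row i, sum the running products colX[i], colX[i]*colX[i+1], ...
runsum <- function(x){
  len <- length(x)
  b <- numeric(len)                 # preallocate the result
  for(i in seq_len(len)){
    b[i] <- sum(cumprod(x[i:len]))
  }
  return(b)
}
dt[, colY := runsum(colX), by = GROUP]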

Result:

   GROUP ID colX  colY
1:     1  1  0.8 1.696
2:     1  2  0.7 1.120
3:     1  3  0.6 0.600
4:     2  1  1.3 6.292
5:     2  2  1.2 3.840
6:     2  3  1.1 2.200
7:     2  4  1.0 1.000
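Why runsum reproduces the table: expanding the nested product turns it into a sum of running products, which is exactly what `sum(cumprod(x[i:len]))` computes:

$$
y_i = x_i\bigl(1 + x_{i+1}(1 + \cdots (1 + x_n)\cdots)\bigr) = \sum_{j=i}^{n} \prod_{k=i}^{j} x_k
$$

For example, 0.8(1 + 0.7(1 + 0.6)) = 0.8 + 0.8·0.7 + 0.8·0.7·0.6 = 0.8 + 0.56 + 0.336 = 1.696. Because cumprod is recomputed for every row, this loop costs O(n²) per group.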

Data:

library(data.table)
dt <- fread("GROUP   ID  colX   
1       1   0.8    
1       2   0.7    
1       3   0.6    
2       1   1.3    
2       2   1.2    
2       3   1.1    
2       4   1.0    ")

There are probably better ways to replace the runsum function, but I haven't found one yet; here I just use a custom function to show the basic idea. Any improvements are welcome.
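One possible replacement, sketched here rather than offered as a definitive answer, is base R's `Reduce()` used as a right fold with `accumulate = TRUE`. It applies the recurrence colY[i] = colX[i] * (1 + colY[i+1]) once per row, so each group costs O(n) instead of O(n²). It assumes rows are ordered by ID within each GROUP (the `setorder()` call enforces that); the helper name `runsum_reduce` is hypothetical.

```r
library(data.table)

# Hypothetical helper: right fold that keeps every intermediate value.
# With right = TRUE, f is called as f(x[i], acc), where acc is colY of the next row,
# and the accumulated results come back in the same order as x.
runsum_reduce <- function(x) {
  Reduce(function(xi, acc) xi * (1 + acc), x, accumulate = TRUE, right = TRUE)
}

# Same data as in the question
dt <- data.table(
  GROUP = c(1, 1, 1, 2, 2, 2, 2),
  ID    = c(1, 2, 3, 1, 2, 3, 4),
  colX  = c(0.8, 0.7, 0.6, 1.3, 1.2, 1.1, 1.0)
)
setorder(dt, GROUP, ID)   # the fold relies on ID order within each group
dt[, colY := runsum_reduce(colX), by = GROUP]
print(dt)
```

Unlike the loop, no row's value is recomputed: each step reuses the result for the row below it.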

Here is an option using Rcpp with data.table:

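library(Rcpp)
library(data.table)

# Walk each group backwards: colY[i] = colX[i] * (1 + colY[i+1])
cppFunction('NumericVector fun(NumericVector v) {
    int n = v.size();
    NumericVector res(n);
    if (n == 0) return res;    // guard against empty input

    res[n-1] = v[n-1];         // last row of the group: colY = colX
    for(int i=n-2; i>=0; i--) {
        res[i] = v[i] * (1 + res[i+1]);
    }
    return res;
}')
dt[, colY := fun(colX), by = GROUP]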

Output:

   GROUP ID colX  colY
1:     1  1  0.8 1.696
2:     1  2  0.7 1.120
3:     1  3  0.6 0.600
4:     2  1  1.3 6.292
5:     2  2  1.2 3.840
6:     2  3  1.1 2.200
7:     2  4  1.0 1.000
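For comparison with the Rcpp loop, the same values can also be obtained without compiled code or an explicit loop. This is a different technique (a closed form, not either answer's method): since colY[i] is the sum over j >= i of the products colX[i] * ... * colX[j], dividing the suffix sums of P = cumprod(colX) by P[i-1] (with P[0] = 1) yields every row at once. It is a sketch that assumes colX contains no zeros (0/0 would give NaN) and that groups are short enough for cumprod() not to overflow or underflow; the helper name `runsum_vec` is hypothetical.

```r
library(data.table)

# Hypothetical helper: closed form colY[i] = sum(P[i:n]) / P[i-1], P = cumprod(x), P[0] = 1.
# Assumes no zeros in x; division can also lose precision for long groups.
runsum_vec <- function(x) {
  P <- cumprod(x)
  rev(cumsum(rev(P))) / c(1, head(P, -1))
}

dt <- data.table(
  GROUP = c(1, 1, 1, 2, 2, 2, 2),
  ID    = c(1, 2, 3, 1, 2, 3, 4),
  colX  = c(0.8, 0.7, 0.6, 1.3, 1.2, 1.1, 1.0)
)
setorder(dt, GROUP, ID)   # rows must be in ID order within each group
dt[, colY := runsum_vec(colX), by = GROUP]
print(dt)
```

The recursive versions (Rcpp or `Reduce()`) avoid the division entirely, so they are the safer choice when colX can be zero.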
