
In R - Calculate values in a vector using percentage returns

I have a data.table consisting of one row with a starting value and a series of percentage returns. I want to convert this into a series of values by applying the returns sequentially. Below is an example. It works, but I want to make it faster and more efficient. The example shows only four return periods; in practice, I have several hundred. I also need to run this conversion tens of thousands of times in parallel, so any speedup would help. Is there an R function or package that can do this more efficiently? Thanks for your time!

library(data.table)
library(tidyr)
a <- 2
b <- 4
x <- data.table(awq = c(0.1), R1 = c(0.15), R2 = c(-0.05), R3 = c(0.70), R4 = c(-0.1))
print(x)
#    awq   R1    R2  R3   R4
# 1: 0.1 0.15 -0.05 0.7 -0.1
tmp1 <- as.data.table(tidyr::gather(x, period, return, -awq, factor_key=F))
setnames(tmp1, old = c("return"), new = c("ret"))
tmp1[period == "R1", v := awq*ret + awq]
for(i in 2:4) {
  tmp1[i, v := tmp1[i, ret] * abs(tmp1[(i-1), v]) + tmp1[(i-1), v]]
}
tmp1[v < 0, v := 0]
tmp1 <- tmp1[, .(period, v)]
tmp1[, a := a]
tmp1[, b := b]
tmp1 <- as.data.table(pivot_wider(tmp1, names_from = period, values_from = c(v)))
print(tmp1)
#    a b    R1      R2       R3        R4
# 1: 2 4 0.115 0.10925 0.185725 0.1671525

@akrun's comment on cumprod is spot-on:

myfunc <- function(awq, rest) as.data.table(awq * t(apply(1 + rest, 1, cumprod)))

### Another row
x <- data.table(awq = c(0.1,0.2), R1 = c(0.15), R2 = c(-0.05), R3 = c(0.70), R4 = c(-0.1))

x[, myfunc(awq, .SD[,R1:R4])]
#         R1      R2       R3        R4
# [1,] 0.115 0.10925 0.185725 0.1671525
# [2,] 0.230 0.21850 0.371450 0.3343050
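To see why cumprod reproduces the loop from the question, here is a minimal base-R sketch. The helper name `step` is hypothetical; it encodes the question's update rule, v_i = ret_i * |v_{i-1}| + v_{i-1}. The sketch assumes awq is non-negative and every return is above -1, so the running value never goes negative. Outside that range the `abs()` in the loop and the final floor at zero may no longer agree with a plain cumulative product, so that case would need checking separately.

```r
# Update rule from the question: v_i = ret_i * |v_{i-1}| + v_{i-1}
# (`step` is a hypothetical helper name, not part of any package)
step <- function(v_prev, r) r * abs(v_prev) + v_prev

awq <- 0.1
ret <- c(0.15, -0.05, 0.70, -0.1)

# With `init`, Reduce returns the starting value first, so drop it
v_loop <- Reduce(step, ret, accumulate = TRUE, init = awq)[-1]
v_loop <- pmax(v_loop, 0)  # same effect as tmp1[v < 0, v := 0]

# Closed form: starting value times the running product of (1 + return)
v_cumprod <- awq * cumprod(1 + ret)

all.equal(v_loop, v_cumprod)
```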

You can augment the original frame with a couple of techniques:

cbind(x, x[, myfunc(awq, .SD[,R1:R4])])
#    awq   R1    R2  R3   R4    R1      R2       R3        R4
# 1: 0.1 0.15 -0.05 0.7 -0.1 0.115 0.10925 0.185725 0.1671525
# 2: 0.2 0.15 -0.05 0.7 -0.1 0.230 0.21850 0.371450 0.3343050

x[, c("S1","S2","S3","S4") := myfunc(awq, .SD[,R1:R4]) ][]
#    awq   R1    R2  R3   R4    S1      S2       S3        S4
# 1: 0.1 0.15 -0.05 0.7 -0.1 0.115 0.10925 0.185725 0.1671525
# 2: 0.2 0.15 -0.05 0.7 -0.1 0.230 0.21850 0.371450 0.3343050

The first has the disadvantage of having duplicate column names. The latter has the disadvantage of needing to know the number of columns involved a priori.
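One possible way around both drawbacks is to derive the new column names from the return columns instead of typing them out. This is only a sketch: the names `ret_cols` and `new_cols` and the "S" prefix convention are assumptions for illustration, and it presumes the return columns are the ones matching `^R` followed by digits.

```r
library(data.table)

x <- data.table(awq = c(0.1, 0.2), R1 = 0.15, R2 = -0.05, R3 = 0.70, R4 = -0.1)

# Find the return columns by pattern and build matching output names
ret_cols <- grep("^R[0-9]+$", names(x), value = TRUE)
new_cols <- sub("^R", "S", ret_cols)

# Same cumprod logic as above, restricted to the return columns via .SDcols
x[, (new_cols) := as.data.table(awq * t(apply(1 + as.matrix(.SD), 1, cumprod))),
  .SDcols = ret_cols]

print(x)
```

This keeps the original R columns, avoids duplicate names, and works for any number of return periods.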


myfunc is mostly veneer; this can be simplified further:

x[, awq*t(apply(1+.SD[,R1:R4],1,cumprod)) ]
#         R1      R2       R3        R4
# [1,] 0.115 0.10925 0.185725 0.1671525
# [2,] 0.230 0.21850 0.371450 0.3343050

The same "augmentation" steps mentioned previously apply here as well.

We could also use rowCumprods from matrixStats:

library(matrixStats)
library(data.table)
x[, names(x)[-1] := as.data.table(awq * rowCumprods(as.matrix(.SD + 1))),
  .SDcols = patterns("^R")][]
#  awq    R1      R2       R3        R4
#1: 0.1 0.115 0.10925 0.185725 0.1671525
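If adding matrixStats as a dependency is not an option, a rough base-R stand-in can loop over columns rather than rows, which stays vectorised across all rows. The helper `row_cumprod` below is hypothetical, written only for illustration; it is not the matrixStats implementation and no claim is made about its relative speed.

```r
# Hypothetical base-R stand-in for a row-wise cumulative product
row_cumprod <- function(m) {
  # Multiply each column by the running product held in the column to its left
  for (j in seq_len(ncol(m))[-1]) {
    m[, j] <- m[, j] * m[, j - 1]
  }
  m
}

awq <- c(0.1, 0.2)
ret <- matrix(c(0.15, -0.05, 0.70, -0.1), nrow = 2, ncol = 4, byrow = TRUE,
              dimnames = list(NULL, paste0("R", 1:4)))

# awq is recycled down the columns, so row i is scaled by awq[i]
v <- awq * row_cumprod(1 + ret)
print(v)
```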

In addition, in the original script shown by the OP, gather/pivot_wider could be replaced by melt/dcast from data.table:

tmp1 <- melt(x, id.vars = 'awq', variable.name = 'period', value.name = 'ret')
tmp1[period == "R1", v := awq*ret + awq]

The for loop can then be replaced with Reduce:

tmp1[, v := Reduce(`*`, ret + 1, accumulate = TRUE) * awq]
dcast(tmp1, awq ~ period, value.var = 'v')
#   awq    R1      R2       R3        R4
#1: 0.1 0.115 0.10925 0.185725 0.1671525
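The Reduce call above runs over the whole ret column, which is fine for a one-row x. With several rows, the cumulative product would presumably need to restart for each original row. A minimal sketch, assuming a hypothetical `id` column is added to identify rows, and using cumprod in place of the equivalent accumulating Reduce:

```r
library(data.table)

# `id` is a hypothetical row identifier added for grouping
x <- data.table(id = 1:2, awq = c(0.1, 0.2),
                R1 = 0.15, R2 = -0.05, R3 = 0.70, R4 = -0.1)

long <- melt(x, id.vars = c("id", "awq"),
             variable.name = "period", value.name = "ret")

# Restart the running product for each original row
long[, v := awq * cumprod(1 + ret), by = id]

wide <- dcast(long, id + awq ~ period, value.var = "v")
print(wide)
```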
