How to vectorize a function that depends on a previous calculation in R?

I have a random walk with some drift. My goal is to create a function that adds a column to this data.table labeling the "zone" it is in, based on its cumulative % gain and % drawdown.

library(data.table)

set.seed(1)
# generate random returns with drift
df <- data.table(
    "date" = 1:50,
    "ret" = rnorm(50, mean = .002, sd = .01)
)
# calculate the value of the random-walk over-time
df[, val := cumprod(1 + ret)]
df[, draw_down := val / cummax(val) - 1]

The first zone starts at the first row and continues until either a 5% cumulative gain or a 2% drawdown occurs.

The second zone starts one row after the first zone ends and continues until the same thing happens again: a 5% cumulative gain or a 2% drawdown.

This repeats until neither condition occurs, in which case the last zone continues to the last row.
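Written as formulas, the rule reads as follows. This is an editorial formalization that assumes "occurs" means the first row at which either threshold is crossed. Here r_t is the return ret on row t, n is the number of rows, and V and D mirror the val and draw_down columns recomputed from a zone's first row s:

$$V_t^{(s)} = \prod_{k=s}^{t} (1 + r_k), \qquad D_t^{(s)} = \frac{V_t^{(s)}}{\max_{s \le j \le t} V_j^{(s)}} - 1$$

$$e(s) = \min\left\{ t \ge s : V_t^{(s)} \ge 1.05 \ \text{or} \ D_t^{(s)} \le -0.02 \right\}$$

Zone k covers rows s_k through e(s_k), with s_1 = 1 and s_{k+1} = e(s_k) + 1. If the set is empty, the zone runs to row n.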

Here is a reproducible example:

# start with the first row and zone of 1
idx <- 1
count <- 1
res <- data.table()
while (idx <= nrow(df)) {

    # grab the start of the zone and all future rows
    tmp <- df[idx:.N]
    # recompute the value and drawdown from the start of this zone
    tmp[, val := cumprod(1 + ret)]
    tmp[, draw_down := val / cummax(val) - 1]

    # find out if we crossed our drawdown threshold
    loss_idx <- which(
        tmp$draw_down == min(tmp$draw_down[tmp$draw_down <= -.02])
    )
    # find out if we crossed gain threshold
    gain_idx <- which(tmp$val == min(tmp$val[tmp$val >= 1.05]))
    # if we have no thresholds, label the rest of the zones
    # and exit
    if (length(loss_idx) == 0 && length(gain_idx) == 0) {
        tmp[, zone := count]
        res <- rbind(res, tmp)
        break
    }
    # mark the zone
    tmp[1:min(gain_idx, loss_idx), zone := count]
    # increment our index
    idx <- tmp[min(gain_idx, loss_idx)]$date + 1
    print(idx)
    # increment our zone
    count <- count + 1
    res <- rbind(res, tmp[!is.na(zone)])
}

I have tried getting the indices where these zone boundaries would occur. But then I run into the problem of needing to recalculate val and the drawdown from the last zone's index, and I cannot figure out a way to vectorize that. Maybe a roll function would be effective here?

The problem boils down to needing the drawdown by zone, while the previous zone is needed in order to calculate that drawdown. The same goes for the cumulative return. Is it possible to vectorize this function if it depends on the previous value?

Any help in any direction would be greatly appreciated in trying to achieve the desired output below.

The desired output:

> res
date    ret val draw_down   zone
<int>   <dbl>   <dbl>   <dbl>   <dbl>
1   -0.0042645381   0.9957355   0.0000000000    1
2   0.0038364332    0.9995555   0.0000000000    1
3   -0.0063562861   0.9932021   -0.0063562861   1
4   0.0179528080    1.0110328   0.0000000000    1
5   0.0052950777    1.0163863   0.0000000000    1
6   -0.0062046838   1.0100800   -0.0062046838   1
7   0.0068742905    1.0170236   0.0000000000    1
8   0.0093832471    1.0265665   0.0000000000    1
9   0.0077578135    1.0345305   0.0000000000    1
10  -0.0010538839   1.0334402   -0.0010538839   1
11  0.0171178117    1.0511304   0.0000000000    1
12  0.0058984324    1.0058984   0.0000000000    2
13  -0.0042124058   1.0016612   -0.0042124058   2
14  -0.0201469989   0.9814807   -0.0242745373   2
15  0.0132493092    1.0132493   0.0000000000    3
16  0.0015506639    1.0148205   0.0000000000    3
17  0.0018380974    1.0166859   0.0000000000    3
18  0.0114383621    1.0283151   0.0000000000    3
19  0.0102122120    1.0388164   0.0000000000    3
20  0.0079390132    1.0470636   0.0000000000    3
21  0.0111897737    1.0587800   0.0000000000    3
22  0.0098213630    1.0691787   0.0000000000    3
23  0.0027456498    1.0721143   0.0000000000    3
24  -0.0178935170   1.0529304   -0.0178935170   3
25  0.0081982575    1.0615626   -0.0098419551   3
26  0.0014387126    1.0630899   -0.0084174023   3
27  0.0004420449    1.0635598   -0.0079790782   3
28  -0.0127075238   1.0500446   -0.0205852077   3
29  -0.0027815006   0.9972185   0.0000000000    4
30  0.0061794156    1.0033807   0.0000000000    4
31  0.0155867955    1.0190202   0.0000000000    4
32  0.0009721227    1.0200108   0.0000000000    4
33  0.0058767161    1.0260051   0.0000000000    4
34  0.0014619496    1.0275051   0.0000000000    4
35  -0.0117705956   1.0154108   -0.0117705956   4
36  -0.0021499456   1.0132277   -0.0138952351   4
37  -0.0019428995   1.0112591   -0.0158111376   4
38  0.0014068660    1.0126818   -0.0144265157   4
39  0.0130002537    1.0258469   -0.0016138103   4
40  0.0096317575    1.0357276   0.0000000000    4
41  0.0003547640    1.0360951   0.0000000000    4
42  -0.0005336168   1.0355422   -0.0005336168   4
43  0.0089696338    1.0448306   0.0000000000    4
44  0.0075666320    1.0527365   0.0000000000    4
45  -0.0048875569   0.9951124   0.0000000000    5
46  -0.0050749516   0.9900623   -0.0050749516   5
47  0.0056458196    0.9956520   0.0000000000    5
48  0.0096853292    1.0052952   0.0000000000    5
49  0.0008765379    1.0061764   0.0000000000    5
50  0.0108110773    1.0170543   0.0000000000    5
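A note on the loop above: min(tmp$val[tmp$val >= 1.05]) picks the smallest value at or above 1.05, not the first row that reaches it, and the draw_down version likewise picks the deepest drawdown rather than the first crossing. That is why zone 3 in the desired output runs to row 28, even though val first reaches 1.05 at row 21 (1.0587800). If the prose definition (first crossing) is what is intended, here is a minimal sketch under that assumption; zone_first_crossing is a hypothetical name, and it returns a zone vector instead of rbind-ing pieces:

```r
# Sketch of the zone rule read as "first crossing" (an assumption about intent).
# ret: per-row returns; gain/loss: thresholds (defaults taken from the question).
zone_first_crossing <- function(ret, gain = 1.05, loss = -0.02) {
  n <- length(ret)
  zone <- integer(n)
  start <- 1L
  count <- 1L
  while (start <= n) {
    rows <- start:n
    val <- cumprod(1 + ret[rows])        # value compounded from the zone's first row
    draw_down <- val / cummax(val) - 1   # drawdown from the zone's running peak
    hit <- which(val >= gain | draw_down <= loss)
    if (!length(hit)) {                  # no crossing: the zone runs to the last row
      zone[rows] <- count
      break
    }
    end <- start + hit[1] - 1L           # FIRST crossing, not the smallest qualifying value
    zone[start:end] <- count
    start <- end + 1L
    count <- count + 1L
  }
  zone
}

zone_first_crossing(c(0.03, 0.03, -0.03, 0.01, 0.01))
# [1] 1 1 2 2 2
```

On the question's data, df[, zone := zone_first_crossing(ret)] should match the desired output through row 21 and then start zone 4 at row 22.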

I don't think a rolling calculation is the right way to go: rolling functions typically use fixed windows, whereas here the zone length depends on the data. Similarly, a cumulative operation (e.g., cumsum) won't work, for related reasons. (That's not to say I couldn't warp a zoo::rollapply approach to do this, but I think it would be much less efficient than the approach recommended here.)
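As far as I can tell, there is no single cumsum-style vector operation for this, because each zone's baseline depends on where the previous zone ended. The dependence can still be written as one left fold that carries the current zone's state from row to row. Below is a minimal sketch with base R's Reduce(accumulate = TRUE). It assumes the first-crossing reading of the rule (a zone closes on the first row whose compounded value reaches 1.05 or whose drawdown from the zone's peak reaches -2%), so on the question's data it should differ from the desired output from row 22 on. zone_one_pass is a hypothetical name. It is a fold, not vectorized code, and should not be expected to beat a plain loop on speed.

```r
# Sketch: a single left fold over the returns that carries the current zone's
# state (zone number, value compounded since the zone began, running peak).
zone_one_pass <- function(ret, gain = 1.05, loss = -0.02) {
  if (!length(ret)) return(integer(0))   # Reduce() would return `init` itself here
  step <- function(state, r) {
    # the previous row crossed a threshold, so this row opens a new zone
    if (state$closed) state <- list(zone = state$zone + 1L, val = 1, peak = -Inf)
    val <- state$val * (1 + r)
    peak <- max(state$peak, val)
    list(zone = state$zone, val = val, peak = peak,
         closed = val >= gain || val / peak - 1 <= loss)
  }
  init <- list(zone = 0L, val = 1, peak = -Inf, closed = TRUE)
  states <- Reduce(step, ret, init, accumulate = TRUE, simplify = FALSE)[-1]  # drop `init`
  vapply(states, function(s) s$zone, integer(1))
}

zone_one_pass(c(0.03, 0.03, -0.03, 0.01, 0.01))
# [1] 1 1 2 2 2
```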

Here's a simple while loop that appears to provide the zone you're asking for:

breaks <- integer(0)
rn <- 1L
while (rn <= nrow(df)) {
  theserows <- seq(rn, nrow(df))
  ratios <- df$val[theserows] / df$val[theserows][1]
  upordown <- which(ratios >= 1.05 | ratios <= 0.98)
  if (!length(upordown)) break
  breaks <- c(breaks, upordown[1] + rn)
  rn <- rn + upordown[1]
}
df[, zone := cumsum(seq_len(.N) %in% breaks)]
#      date           ret       val     draw_down  zone
#     <int>         <num>     <num>         <num> <int>
#  1:     1 -0.0042645381 0.9957355  0.0000000000     0
#  2:     2  0.0038364332 0.9995555  0.0000000000     0
#  3:     3 -0.0063562861 0.9932021 -0.0063562861     0
#  4:     4  0.0179528080 1.0110328  0.0000000000     0
#  5:     5  0.0052950777 1.0163863  0.0000000000     0
#  6:     6 -0.0062046838 1.0100800 -0.0062046838     0
#  7:     7  0.0068742905 1.0170236  0.0000000000     0
#  8:     8  0.0093832471 1.0265665  0.0000000000     0
#  9:     9  0.0077578135 1.0345305  0.0000000000     0
# 10:    10 -0.0010538839 1.0334402 -0.0010538839     0
# 11:    11  0.0171178117 1.0511304  0.0000000000     0
# 12:    12  0.0058984324 1.0573304  0.0000000000     1
# 13:    13 -0.0042124058 1.0528765 -0.0042124058     1
# 14:    14 -0.0201469989 1.0316642 -0.0242745373     1
# 15:    15  0.0132493092 1.0453331 -0.0113468490     2
# 16:    16  0.0015506639 1.0469540 -0.0098137803     2
# 17:    17  0.0018380974 1.0488784 -0.0079937216     2
# 18:    18  0.0114383621 1.0608759  0.0000000000     2
# 19:    19  0.0102122120 1.0717098  0.0000000000     2
# 20:    20  0.0079390132 1.0802181  0.0000000000     2
# 21:    21  0.0111897737 1.0923055  0.0000000000     2
# 22:    22  0.0098213630 1.1030334  0.0000000000     2
# 23:    23  0.0027456498 1.1060620  0.0000000000     3
# 24:    24 -0.0178935170 1.0862706 -0.0178935170     3
# 25:    25  0.0081982575 1.0951762 -0.0098419551     3
# 26:    26  0.0014387126 1.0967518 -0.0084174023     3
# 27:    27  0.0004420449 1.0972366 -0.0079790782     3
# 28:    28 -0.0127075238 1.0832934 -0.0205852077     3
# 29:    29 -0.0027815006 1.0802803 -0.0233094505     4
# 30:    30  0.0061794156 1.0869558 -0.0172740737     4
# 31:    31  0.0155867955 1.1038979 -0.0019565256     4
# 32:    32  0.0009721227 1.1049710 -0.0009863049     4
# 33:    33  0.0058767161 1.1114646  0.0000000000     4
# 34:    34  0.0014619496 1.1130896  0.0000000000     4
# 35:    35 -0.0117705956 1.0999878 -0.0117705956     4
# 36:    36 -0.0021499456 1.0976229 -0.0138952351     4
# 37:    37 -0.0019428995 1.0954903 -0.0158111376     4
# 38:    38  0.0014068660 1.0970316 -0.0144265157     4
# 39:    39  0.0130002537 1.1112932 -0.0016138103     4
# 40:    40  0.0096317575 1.1219969  0.0000000000     4
# 41:    41  0.0003547640 1.1223950  0.0000000000     4
# 42:    42 -0.0005336168 1.1217961 -0.0005336168     4
# 43:    43  0.0089696338 1.1318582  0.0000000000     4
# 44:    44  0.0075666320 1.1404225  0.0000000000     4
# 45:    45 -0.0048875569 1.1348486 -0.0048875569     5
# 46:    46 -0.0050749516 1.1290893 -0.0099377044     5
# 47:    47  0.0056458196 1.1354640 -0.0043479913     5
# 48:    48  0.0096853292 1.1464613  0.0000000000     5
# 49:    49  0.0008765379 1.1474662  0.0000000000     5
# 50:    50  0.0108110773 1.1598716  0.0000000000     5
#      date           ret       val     draw_down  zone
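The last line of the loop turns the break rows into zone labels: seq_len(.N) %in% breaks is TRUE on the first row of each new zone, and cumsum() counts how many zone starts have been passed. Because the count starts at 0, the zones above are numbered from 0 rather than from 1 as in the desired output; adding 1L shifts them. A toy illustration with hypothetical break rows:

```r
n <- 10L
breaks <- c(4L, 7L)                      # hypothetical first rows of the 2nd and 3rd zones
zone <- cumsum(seq_len(n) %in% breaks)   # running count of zone starts passed so far
zone
# [1] 0 0 0 1 1 1 2 2 2 2
zone + 1L                                # number zones from 1, as in the question
# [1] 1 1 1 2 2 2 3 3 3 3
```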

And a simple function that does the same:

func <- function(x, up = 1.05, down = 0.98) {
  breaks <- integer(0)
  if (!length(x)) return(breaks)
  ind <- 1L
  while (ind <= length(x)) {
    theseind <- seq(ind, length(x))
    ratios <- x[theseind] / x[theseind][1]
    upordown <- which(ratios >= up | ratios <= down)
    if (!length(upordown)) break
    breaks <- c(breaks, upordown[1] + ind)
    ind <- ind + upordown[1]
  }
  return(cumsum(seq_along(x) %in% breaks))
}
df[, zone := func(val, 1.05, 0.98) ]
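One difference from the question's rule is worth checking: func (like the loop above) measures the loss as a ratio to the zone's first value, while the question's draw_down is measured from the zone's running peak. The two disagree when the value rises and then falls; for 1.00, 1.04, 1.015 the ratio to the start is 1.015, but the drawdown from the peak is about -2.4%. Below is a minimal variant that swaps in a running-peak test, assuming the question's drawdown definition is the one wanted. func_dd is a hypothetical name, and the gain test is unchanged from func.

```r
func_dd <- function(x, up = 1.05, down = 0.98) {
  breaks <- integer(0)
  ind <- 1L
  while (ind <= length(x)) {
    theseind <- seq(ind, length(x))
    ratios <- x[theseind] / x[ind]                    # growth since the zone's first value, as in func
    peak_ratios <- x[theseind] / cummax(x[theseind])  # value relative to the zone's running peak
    upordown <- which(ratios >= up | peak_ratios <= down)
    if (!length(upordown)) break
    breaks <- c(breaks, upordown[1] + ind)            # the next zone starts after the crossing row
    ind <- ind + upordown[1]
  }
  return(cumsum(seq_along(x) %in% breaks))
}

x <- c(1.00, 1.04, 1.015, 1.02)
func_dd(x)
# [1] 0 0 0 1
```

On the same x, func(x) labels all four rows 0, because no ratio to the starting value reaches 0.98 or 1.05.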

Assuming you are exploring vectorization to speed up the calculation, here is another option that speeds it up using Rcpp:

library(Rcpp)
cppFunction("IntegerVector zoning(NumericVector idx) {
    int zone = 1, n = idx.size();
    IntegerVector res = IntegerVector(n);
    double x0 = idx[0];


    for (int i = 1; i < n; i++) {
        res[i] = zone;
        if (idx[i]/x0 < 0.98 || idx[i]/x0 > 1.05) {
            if (i+1 < n) {
                x0 = idx[i+1];
            }
            zone++;
        }
    }

    return res;
}")

df[, zone := zoning(c(1, val))[-1L]]
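For readers without a C++ toolchain, or to check what the compiled function does, here is a line-for-line R transcription of zoning. zoning_r is a hypothetical name, and as a plain R loop it gains none of the speed. Reading it this way shows two details of the C++ version: the row that crosses a threshold keeps the zone it closes, and after a crossing the next zone's baseline x0 is that zone's own first value idx[i+1], whereas the first zone is measured against the 1 that c(1, val) prepends.

```r
# R transcription of the C++ `zoning` loop (C++ 0-based i becomes R 1-based i).
zoning_r <- function(idx) {
  n <- length(idx)
  res <- integer(n)        # res[1] stays 0, like res[0] in C++; callers drop it with [-1L]
  zone <- 1L
  x0 <- idx[1]             # baseline of the current zone
  for (i in seq_len(n)[-1L]) {
    res[i] <- zone
    if (idx[i] / x0 < 0.98 || idx[i] / x0 > 1.05) {
      if (i + 1L <= n) x0 <- idx[i + 1L]  # next zone is measured from its own first value
      zone <- zone + 1L
    }
  }
  res
}

zoning_r(c(1, 1.02, 1.06, 1.07, 1.0))[-1L]
# [1] 1 1 2 2
```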

Output:

    date           ret       val zone
 1:    1 -0.0042645381 0.9957355    1
 2:    2  0.0038364332 0.9995555    1
 3:    3 -0.0063562861 0.9932021    1
 4:    4  0.0179528080 1.0110328    1
 5:    5  0.0052950777 1.0163863    1
 6:    6 -0.0062046838 1.0100800    1
 7:    7  0.0068742905 1.0170236    1
 8:    8  0.0093832471 1.0265665    1
 9:    9  0.0077578135 1.0345305    1
10:   10 -0.0010538839 1.0334402    1
11:   11  0.0171178117 1.0511304    1
12:   12  0.0058984324 1.0573304    2
13:   13 -0.0042124058 1.0528765    2
14:   14 -0.0201469989 1.0316642    2
15:   15  0.0132493092 1.0453331    3
16:   16  0.0015506639 1.0469540    3
17:   17  0.0018380974 1.0488784    3
18:   18  0.0114383621 1.0608759    3
19:   19  0.0102122120 1.0717098    3
20:   20  0.0079390132 1.0802181    3
21:   21  0.0111897737 1.0923055    3
22:   22  0.0098213630 1.1030334    3
23:   23  0.0027456498 1.1060620    4
24:   24 -0.0178935170 1.0862706    4
25:   25  0.0081982575 1.0951762    4
26:   26  0.0014387126 1.0967518    4
27:   27  0.0004420449 1.0972366    4
28:   28 -0.0127075238 1.0832934    4
29:   29 -0.0027815006 1.0802803    5
30:   30  0.0061794156 1.0869558    5
31:   31  0.0155867955 1.1038979    5
32:   32  0.0009721227 1.1049710    5
33:   33  0.0058767161 1.1114646    5
34:   34  0.0014619496 1.1130896    5
35:   35 -0.0117705956 1.0999878    5
36:   36 -0.0021499456 1.0976229    5
37:   37 -0.0019428995 1.0954903    5
38:   38  0.0014068660 1.0970316    5
39:   39  0.0130002537 1.1112932    5
40:   40  0.0096317575 1.1219969    5
41:   41  0.0003547640 1.1223950    5
42:   42 -0.0005336168 1.1217961    5
43:   43  0.0089696338 1.1318582    5
44:   44  0.0075666320 1.1404225    5
45:   45 -0.0048875569 1.1348486    6
46:   46 -0.0050749516 1.1290893    6
47:   47  0.0056458196 1.1354640    6
48:   48  0.0096853292 1.1464613    6
49:   49  0.0008765379 1.1474662    6
50:   50  0.0108110773 1.1598716    6
    date           ret       val zone

Courtesy of https://rdrr.io/snippets/
