
Forward period correlations R

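I'm not sure whether this fits the criteria for a question here, but I need help making my code more efficient. I think this can be done more efficiently; I'm really bad at writing functions, and seeing an answer would probably help me.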

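Example: I have time series data and want to calculate the correlation of an indicator Y with the forward changes of my X values (multiple X series). (The dput is at the end.)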
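To make "forward change" concrete, here is a minimal sketch on a made-up vector (the values are purely illustrative and are not taken from my data). It assumes data.table's `shift()`, which is also used in my solution further down: the change at time t is x[t] - x[t-1], and the forward change at time t is the change that happens over the next period, x[t+1] - x[t].

```r
library(data.table)

# Toy series, purely illustrative (not the data from the dput)
x <- c(10, 12, 11, 15, 14)

# One-period change: x[t] - x[t-1]; the first observation has no predecessor
chg <- x - shift(x, 1)

# Forward change: lead the change by one period, i.e. x[t+1] - x[t]
fwd_chg <- shift(chg, 1, type = "lead")

data.table(x, chg, fwd_chg)
```

Correlating `fwd_chg` with a change in Y that is already known at time t is what I mean by a forward correlation.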

My solution:

str(data.dt)
Classes ‘data.table’ and 'data.frame':  210 obs. of  3 variables:
 $ id     : chr  "X1" "X1" "X1" "X1" ...
 $ date   : Date, format: "2016-11-18" "2016-11-25" "2016-12-02" "2016-12-09" ...
 $ PX_LAST: num  2.72 2.76 2.86 2.81 2.83 ...
 - attr(*, ".internal.selfref")=<externalptr> 

library(data.table)

#separate indicator value
y.dt <- data.dt[id=="Y"]

#add indicator as own column for each X
step1.dt <- y.dt[data.dt, on="date"]
#rename
correl.dt <-  step1.dt[, .(date=date, x_id=i.id, x_value=i.PX_LAST, y_id = id,  y_value=PX_LAST)]
#discard NAs and Y from x_id
correl.dt <- na.omit(correl.dt[x_id != "Y"])
#calculate change for each X
correl.dt[, x.chg := c(rep(NA, 1), diff(x_value, 1)), by=list(x_id)]
#create forward change by leading changes
correl.dt[, fwd.xchg := shift(x.chg, type='lead', 1), by = list(x_id)]

#create multiple Y changes to test correlations
correl.dt[, y.chg1 := c(rep(NA, 1), diff(y_value, 1)), by=list(x_id)]
correl.dt[, y.chg2 := c(rep(NA, 2), diff(y_value, 2)), by=list(x_id)]
correl.dt[, y.chg3 := c(rep(NA, 3), diff(y_value, 3)), by=list(x_id)]
correl.dt[, y.chg4 := c(rep(NA, 4), diff(y_value, 4)), by=list(x_id)]
correl.dt[, y.chg5 := c(rep(NA, 5), diff(y_value, 5)), by=list(x_id)]
correl.dt[, y.chg6 := c(rep(NA, 6), diff(y_value, 6)), by=list(x_id)]

#cbind results together
cbind(correl.dt[, cor(fwd.xchg, y.chg1, method='spearman', use='pairwise'), by=.(x_id)],
      correl.dt[, cor(fwd.xchg, y.chg2, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg3, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg4, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg5, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg6, method='spearman', use='pairwise'), by=.(x_id)][,2])

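The results don't make sense, because my subset is small. I also chose short correlation periods to fit the subset. I would appreciate help with the best way to test forward correlations. I've fallen in love with data.table; I'm not very good at it yet, but I'm improving. I have about 100-200 indicators to test.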

Here is the dput:

structure(list(id = c("X1", "X1", "X1", "X1", "X1", "X1", "X1", 
"X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", 
"X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", 
"X1", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", 
"X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", 
"X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X3", "X3", 
"X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", 
"X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", 
"X3", "X3", "X3", "X3", "X3", "X3", "X4", "X4", "X4", "X4", "X4", 
"X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", 
"X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", 
"X4", "X4", "X4", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", 
"X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", 
"X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", 
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", 
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", 
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "Y", "Y", "Y", 
"Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", 
"Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", 
"Y"), date = structure(c(17123L, 17130L, 17137L, 17144L, 17151L, 
17158L, 17165L, 17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 
17214L, 17221L, 17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 
17270L, 17277L, 17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 
17326L, 17123L, 17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 
17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 
17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 
17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 17326L, 17123L, 
17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 17172L, 17179L, 
17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 17228L, 17235L, 
17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 17284L, 17291L, 
17298L, 17305L, 17312L, 17319L, 17326L, 17123L, 17130L, 17137L, 
17144L, 17151L, 17158L, 17165L, 17172L, 17179L, 17186L, 17193L, 
17200L, 17207L, 17214L, 17221L, 17228L, 17235L, 17242L, 17249L, 
17256L, 17263L, 17270L, 17277L, 17284L, 17291L, 17298L, 17305L, 
17312L, 17319L, 17326L, 17123L, 17130L, 17137L, 17144L, 17151L, 
17158L, 17165L, 17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 
17214L, 17221L, 17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 
17270L, 17277L, 17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 
17326L, 17123L, 17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 
17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 
17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 
17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 17326L, 17123L, 
17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 17172L, 17179L, 
17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 17228L, 17235L, 
17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 17284L, 17291L, 
17298L, 17305L, 17312L, 17319L, 17326L), class = "Date"), PX_LAST = c(2.719, 
2.761, 2.863, 2.815, 2.831, 2.872, 2.765, 2.681, 2.692, 2.783, 
2.779, 2.795, 2.696, 2.803, 2.73, 2.807, 2.977, 2.861, 2.75, 
2.701, 2.551, 2.474, 2.538, 2.575, 2.648, 2.635, 2.475, 2.41, 
2.412, 2.373, 1.579, 1.56, 1.619, 1.73, 1.833, 1.796, 1.721, 
1.731, 1.715, 1.751, 1.782, 1.766, 1.697, 1.711, 1.607, 1.702, 
1.811, 1.761, 1.642, 1.625, 1.596, 1.494, 1.47, 1.547, 1.542, 
1.571, 1.475, 1.445, 1.4, 1.413, 1.455, 1.417, 1.38, 1.453, 1.438, 
1.345, 1.239, 1.383, 1.364, 1.431, 1.471, 1.352, 1.256, 1.211, 
1.078, 1.185, 1.231, 1.244, 1.196, 1.139, 1.075, 1.043, 1.034, 
1.085, 1.117, 1.086, 1.093, 1.012, 1.038, 1.02, 0.272, 0.24, 
0.281, 0.365, 0.314, 0.221, 0.208, 0.298, 0.338, 0.421, 0.462, 
0.412, 0.32, 0.302, 0.186, 0.356, 0.485, 0.435, 0.403, 0.328, 
0.228, 0.187, 0.253, 0.317, 0.418, 0.391, 0.368, 0.331, 0.274, 
0.268, 2.3548, 2.3572, 2.3831, 2.4675, 2.5916, 2.5373, 2.4443, 
2.4193, 2.3964, 2.4668, 2.4843, 2.4648, 2.4073, 2.4147, 2.3117, 
2.478, 2.5745, 2.5005, 2.4123, 2.3874, 2.3822, 2.2374, 2.248, 
2.2802, 2.3487, 2.3257, 2.2346, 2.2465, 2.1591, 2.1538, 0.517, 
0.534, 0.559, 0.611, 0.64, 0.615, 0.556, 0.628, 0.628, 0.699, 
0.749, 0.71, 0.665, 0.678, 0.549, 0.694, 0.774, 0.75, 0.673, 
0.605, 0.548, 0.516, 0.564, 0.587, 0.653, 0.572, 0.518, 0.514, 
0.425, 0.43, 0.8906, 0.895, 0.8999, 0.9062, 0.89, 0.8864, 0.8802, 
0.8839, 0.8964, 0.899, 0.9145, 0.9039, 0.9054, 0.9044, 0.8934, 
0.8978, 0.9041, 0.9048, 0.8979, 0.9023, 0.892, 0.8842, 0.8942, 
0.9107, 0.9121, 0.9163, 0.8944, 0.8965, 0.8995, 0.8965)), row.names = c(NA, 
-210L), class = c("data.table", "data.frame"), .Names = c("id", 
"date", "PX_LAST"))

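Here is what I came up with. It is slightly faster because it assigns fewer variables, but the performance gain is not that large. The simpler code is probably the biggest advantage.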

library(dplyr)
lapply(as.list(1:6), 
     function(x) {correl.dt[, cor(fwd.xchg, y_value - shift(y_value, x), 
                                  method='spearman', use='pairwise'), by=.(x_id)][, 2]}) %>% 
do.call(cbind, .)
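Note that the call above returns only the six correlation columns, all named `V1`, without `x_id`. A possible variant, sketched here on a hypothetical stand-in table (`toy.dt` and its random values are assumptions, built only to mirror the column names of `correl.dt`), computes all lags inside a single grouped call so that `x_id` and readable column names are kept:

```r
library(data.table)

# Hypothetical stand-in for correl.dt with the same column names (random data)
set.seed(1)
toy.dt <- data.table(
  x_id    = rep(c("X1", "X2"), each = 30),
  x_value = cumsum(rnorm(60)),
  y_value = rep(cumsum(rnorm(30)), times = 2)
)

# Forward one-period change of X within each series
toy.dt[, fwd.xchg := shift(x_value - shift(x_value, 1), 1, type = "lead"), by = x_id]

lags <- 1:6  # horizons of the Y change to test

# One grouped call: each group returns a named list with one correlation per lag
res <- toy.dt[, {
  out <- lapply(lags, function(k)
    cor(fwd.xchg, y_value - shift(y_value, k),
        method = "spearman", use = "pairwise"))
  setNames(out, paste0("y.chg", lags))
}, by = x_id]
res
```

I have not benchmarked this variant; the main point is the labelled output, not speed.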

Here is a benchmark:

my_code <- function(){
  lapply(as.list(1:6), 
         function(x) {correl.dt[, cor(fwd.xchg, y_value - shift(y_value, x), 
                                      method='spearman', use='pairwise'), by=.(x_id)][, 2]}) %>% 
    do.call(cbind, .)

}

your_code <- function(){
  #create multiple Y changes to test correlations
  correl.dt[, y.chg1 := c(rep(NA, 1), diff(y_value, 1)), by=list(x_id)]
  correl.dt[, y.chg2 := c(rep(NA, 2), diff(y_value, 2)), by=list(x_id)]
  correl.dt[, y.chg3 := c(rep(NA, 3), diff(y_value, 3)), by=list(x_id)]
  correl.dt[, y.chg4 := c(rep(NA, 4), diff(y_value, 4)), by=list(x_id)]
  correl.dt[, y.chg5 := c(rep(NA, 5), diff(y_value, 5)), by=list(x_id)]
  correl.dt[, y.chg6 := c(rep(NA, 6), diff(y_value, 6)), by=list(x_id)]

  #cbind results together
  cbind(correl.dt[, cor(fwd.xchg, y.chg1, method='spearman', use='pairwise'), by=.(x_id)],
        correl.dt[, cor(fwd.xchg, y.chg2, method='spearman', use='pairwise'), by=.(x_id)][,2],
        correl.dt[, cor(fwd.xchg, y.chg3, method='spearman', use='pairwise'), by=.(x_id)][,2],
        correl.dt[, cor(fwd.xchg, y.chg4, method='spearman', use='pairwise'), by=.(x_id)][,2],
        correl.dt[, cor(fwd.xchg, y.chg5, method='spearman', use='pairwise'), by=.(x_id)][,2],
        correl.dt[, cor(fwd.xchg, y.chg6, method='spearman', use='pairwise'), by=.(x_id)][,2])
}

microbenchmark::microbenchmark(my_code(), your_code())
## Unit: milliseconds
##        expr       min        lq     mean    median        uq      max neval
##   my_code()  8.818589  9.160749  9.47846  9.293391  9.605331 13.12389   100
## your_code() 11.068776 11.436789 11.98425 11.600102 11.926878 16.94066   100
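With 100-200 indicators, another option that may be worth trying is to reshape to wide format and let `cor()` work on matrices, so that every X-by-lag pair comes out of one call. This is only a sketch under assumptions: `toy.dt` is a hypothetical table of random values shaped like `data.dt` (`id`, `date`, `PX_LAST`), every series is assumed to share the same dates, and I have not timed it against the versions above.

```r
library(data.table)

# Hypothetical long table shaped like data.dt (random data, 3 X series plus Y)
set.seed(42)
dates <- as.Date("2016-11-18") + 7 * (0:29)
toy.dt <- data.table(
  id      = rep(c("X1", "X2", "X3", "Y"), each = 30),
  date    = rep(dates, times = 4),
  PX_LAST = cumsum(rnorm(120))
)

# One column per series, one row per date
wide <- dcast(toy.dt, date ~ id, value.var = "PX_LAST")
setorder(wide, date)

x_cols <- setdiff(names(wide), c("date", "Y"))

# Forward one-period change of every X: x[t+1] - x[t]
fwd_x <- as.matrix(
  wide[, lapply(.SD, function(v) shift(v, 1, type = "lead") - v), .SDcols = x_cols]
)

# k-period change of Y for k = 1..6: y[t] - y[t-k]
lags  <- 1:6
y_chg <- sapply(lags, function(k) wide$Y - shift(wide$Y, k))
colnames(y_chg) <- paste0("y.chg", lags)

# Rows are X series, columns are Y-change horizons
cor_mat <- cor(fwd_x, y_chg, method = "spearman", use = "pairwise.complete.obs")
cor_mat
```

The result is a plain matrix with one row per X series, which is convenient to sort or filter when there are many indicators.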
