简体   繁体   English

R 中的滚动 AR 回归

[英]Rolling AR regressions in R

I am managing a pure monthly time seires data with 348 observations.我正在管理具有 348 个观测值的纯月度时间序列数据。

Here is the reproducible sample data:这是可重现的样本数据:

library(dplyr)
set.seed(123)

Y <- cumsum(rnorm(48))
date <- as.Date(c("2012-01-01", "2012-02-01", "2012-03-01", "2012-04-01", 
                  "2012-05-01","2012-06-01", "2012-07-01", "2012-08-01",
                  "2012-09-01","2012-10-01","2012-11-01", "2012-12-01",
                  "2013-01-01", "2013-02-01","2013-03-01", "2013-04-01", 
                  "2013-05-01","2013-06-01", "2013-07-01", "2013-08-01",
                  "2013-09-01","2013-10-01","2013-11-01", "2013-12-01",
                  "2014-01-01", "2014-02-01","2014-03-01", "2014-04-01", 
                  "2014-05-01","2014-06-01", "2014-07-01", "2014-08-01",
                  "2014-09-01","2014-10-01","2014-11-01", "2014-12-01",
                  "2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", 
                  "2015-05-01","2015-06-01",  "2015-07-01", "2015-08-01",
                  "2015-09-01","2015-10-01","2015-11-01", "2015-12-01"))

data<-data.frame(date,Y)

I am replicating a paper and calculating "shock".我正在复制一篇论文并计算“震惊”。 The quoted procedure is as follows:引用的程序如下:

"Shocks in each of these series are calculated by an AR(2) model over a rolling window of 10 months that ends in month n.The shock in month n+1 denoted dYn+1 is the difference between the actual value of the series and its predicted value using the slope coefficients estimated over the preceding 10 months. Thus, our method is forward-looking, providing out-of-sample prediction errors." “这些系列中的每一个中的冲击都是通过 AR(2) 模型在以第 n 个月结束的 10 个月滚动窗口内计算的。第 n+1 个月的冲击用 dYn+1 表示是该系列实际值之间的差异及其使用前 10 个月估计的斜率系数的预测值。因此,我们的方法是前瞻性的,提供样本外预测误差。”

The AR(2) model with a trend term according to original authors is as follows:根据原作者的说法,带有趋势项的 AR(2) 模型如下:

Yt = a0 + a1*Yt-1 + a2*Yt-2 + a3*Tt+ residualt
where Tt is the serial number of the observation, to account for a time trend in these series.

If my objective is to calculate the mean in 11st using previous 10 obs, I can simply call the following:如果我的目标是使用前 10 个 obs 计算第 11 个的平均值,我可以简单地调用以下代码:

Mean= slide_index_dbl(Y, date, mean, .before = months(10), .after = months(-1), .complete = T)

However, in this case, the objective is to run a AR model using all previous 10 obs and use this estimated model to predict 11st one, the final ouput is the actual value in 11st minus the predicted one.但是,在这种情况下,目标是使用所有之前的 10 个 obs 运行 AR 模型,并使用此估计模型预测第 11 个,最终输出是第 11 个的实际值减去预测值。 Put it simply, I need to construct a function to achieve this goal instead of using "mean" function in previous example.简单来说,我需要构造一个函数来实现这个目标,而不是在前面的例子中使用“mean”函数。

Once I finish this function(let's call it AR_2), we can call it inside the slider.一旦我完成这个函数(我们称之为 AR_2),我们就可以在滑块内部调用它。

library(slider)

data1<-data%>%
  mutate(Shock= slide_index_dbl(Y, date, AR_2, .before = months(10), .after = months(-1), .complete = T))

   Date       Y     N  Predict    Shock
2012-01-01   0.15   1    0.2     -0.005
2012-02-01   0.4    2    0.33      0.07
2012-03-01   0.39   3    0.44     -0.05
... 
2012-10-01   1.85   10   2.1      -0.25
2012-11-01   1.7    11   1.5       0.2   
2012-12-01   3.46   12   4.1      -0.65

Let me illustrate my question using above sample data I make up.让我使用我制作的上述示例数据来说明我的问题。 The final output is Shock, which is the difference between Y(actual data) and Predicted Y. With that being said, the question is how to get Predicted Y. Before we predict Y, we need to first train a AR model by feeding previous 10 months data.最终输出是 Shock,它是 Y(实际数据)和 Predicted Y 之间的差值。话虽如此,问题是如何得到 Predicted Y。在我们预测 Y 之前,我们需要先通过输入先前的数据来训练 AR 模型10 个月的数据。 Once you get this model, you can predict 11st obs with this model and that would be Predicted Y. The final tast is to calculate the difference between it with actual Y, which is called shock.一旦你得到这个模型,你就可以用这个模型预测第 11 个 obs,那将是预测的 Y。最后的尝试是计算它与实际 Y 之间的差异,这称为冲击。

The numerical example is as follows:in order to get output 0.074 for 2012-11-01, I need to train a AR(2) model using all previous 10 months data from 2012-01-01 to 2012-10-01, that is from 0.15 to 1.85 for Y and 1 to 10 for N. Once I've trained this model, I use it to predict new value (0.2) in 2012-11-01.数值示例如下:为了获得 2012-11-01 的输出 0.074,我需要使用从 2012-01-01 到 2012-10-01 的所有前 10 个月的数据训练一个 AR(2) 模型,即Y 为 0.15 到 1.85,N 为 1 到 10。一旦我训练了这个模型,我就用它来预测 2012 年 11 月 1 日的新值 (0.2)。 The final output "Shock" is the difference between actual obs in 2012-11-01 with the predicted one (0.15-0.2=0.05).最终输出“Shock”是 2012-11-01 的实际观测值与预测观测值之间的差值 (0.15-0.2=0.05)。

Similarly, in order to get output 0.08 for 2012-12-01, I need to train a AR(2) model using all previous 10 months data from 2012-02-01 to 2012-11-01, that is from 0.4 to 1.7 for Y and 2 to 11 for N. Once I've trained this model, I use it to predict new value (4.1) in 2012-12-01.同样,为了获得 2012-12-01 的输出 0.08,我需要使用从 2012-02-01 到 2012-11-01 的所有前 10 个月的数据训练一个 AR(2) 模型,即从 0.4 到 1.7对于 Y 和 2 到 11,对于 N。一旦我训练了这个模型,我就用它来预测 2012 年 12 月 1 日的新值 (4.1)。 The final output "Shock" is the difference between actual obs in 2012-12-01 with the predicted one (3.46-4.1=-0.65).最终输出“Shock”是 2012-12-01 的实际观测值与预测观测值之间的差值 (3.46-4.1=-0.65)。

I do not know how to write such a function(AR_2) and call it in slide_index_dbl.我不知道如何编写这样的函数(AR_2)并在 slide_index_dbl 中调用它。 Please remember there is a trend term a3*Tt in AR_2, I not sure how to model that.请记住 AR_2 中有一个趋势项 a3*Tt,我不知道如何建模。

The following is what I have tried.以下是我尝试过的。 I use lm to implement a AR moel instead of arima.我使用 lm 来实现 AR moel 而不是 arima。 This is because I do not know how to use arima to predict new value.这是因为我不知道如何使用 arima 来预测新值。 If anyone are familar with arima or Arima function, go use that.如果有人熟悉 arima 或 Arima 功能,请使用它。

Intercept_extract_lm<-function(x){
  N<-rep(1:10)
  model<-lm(x~ lag(x,1)+ lag(x, 2)+N)
  coef(model)["(Intercept)"]
}

Log_1_extract_lm<-function(x){
  N<-rep(1:10)
  model<-lm(x~ lag(x,1)+ lag(x, 2)+N)
  coef(model)["Lag_1"]
}

Log_2_extract_lm<-function(x){
  N<-rep(1:10)
  model<-lm(x~ lag(x,1)+ lag(x, 2)+N)
  coef(model)["Lag_2"]
}


drift_extract_lm<-function(x){
  N<-rep(1:10)
  model<-lm(x~ lag(x,1)+ lag(x, 2)+N)
  coef(model)["N"]
}

data1<-data%>%
  mutate(Lag_1=lag(Y,1),Lag_2=lag(Y,2),N=1:n(),
         a1=slide_index_dbl(Y, Date, Log_1_extract_lm, .before = months(10), .after = months(-1), .complete = T),
         a2=slide_index_dbl(Y, Log_2_extract_lm, .before = months(10), .after = months(-1), .complete = T),
         drift=slide_index_dbl(Y, Date, drift_extract_lm, .before = months(10), .after = months(-1), .complete = T),
         Intercept = slide_index_dbl(Y, Date, Intercept_extract_lm, .before = months(10), .after = months(-1), .complete = T),
         Predict=Intercept+a1*Lag_1+a2*Lag_2+drift,
         Shock=Y-Predict)

I understand that the biggest problem in my code is it can take only one input argument in slider, but I am using four in all the custom functions defined (Y and lag_1 and lag_2 and N).我知道我的代码中最大的问题是它只能在滑块中使用一个输入参数,但我在定义的所有自定义函数中使用了四个(Y 和 lag_1 以及 lag_2 和 N)。 Unlike previous example, we just calculate rolling mean of one single variable, in this case, in order to run such a regression, we need four variables in each rolling window, but slider only has one variable input.与前面的例子不同,我们只计算一个变量的滚动平均值,在这种情况下,为了运行这样的回归,我们需要在每个滚动窗口中有四个变量,但滑块只有一个变量输入。 Even we edit the "extract" function to reduce four variables to two ( lag_1 and lag_2 can be writted as lag(Y,1) and lag(Y,2), we still have Y and N two variables input)即使我们编辑“提取”函数将四个变量减少到两个(lag_1和lag_2可以写成lag(Y,1)和lag(Y,2),我们仍然有Y和N两个变量输入)

As for the trend term, from my understanding, for example, for 2012-11-01, in order to run regression, I need previous 10 minths N, that is 1,2,3....10, that is serial number of obs.至于趋势项,根据我的理解,例如,对于2012-11-01,为了运行回归,我需要前10分钟的N,即1,2,3....10,即序列号的 obs。 For 2012-12-01, it should be also include a N from 1,2,3...10, however, based on my code, it is 2,3,4...11.对于 2012-12-01,它还应该包括来自 1,2,3...10 的 N,但是,根据我的代码,它是 2,3,4...11。 But adding constant of 1 (2,3...11 vs 1,2,...10) doesn't affect regression coefficient, right?但是添加常数 1 (2,3...11 vs 1,2,...10) 不会影响回归系数,对吧? Honestly, I am not sure about trend term in this case, please feel free to change it if you understand orginal authors.老实说,我不确定这种情况下的趋势术语,如果您了解原始作者,请随时更改它。

You may do this.你可以这样做。 Your problem of lagged 2 [A(2) Model] will be catered by order = c(2,0,0) and you don't have to do it explicitly.您的滞后 2 [A(2) 模型] 问题将由order = c(2,0,0) ,您不必明确这样做。 arima() function is used in conjunction with predict() . arima()函数与predict()结合使用。

library(tidyverse)
set.seed(123)
Y <- cumsum(rnorm(48))
date <- as.Date(c("2012-01-01", "2012-02-01", "2012-03-01", "2012-04-01", 
                  "2012-05-01","2012-06-01", "2012-07-01", "2012-08-01",
                  "2012-09-01","2012-10-01","2012-11-01", "2012-12-01",
                  "2013-01-01", "2013-02-01","2013-03-01", "2013-04-01", 
                  "2013-05-01","2013-06-01", "2013-07-01", "2013-08-01",
                  "2013-09-01","2013-10-01","2013-11-01", "2013-12-01",
                  "2014-01-01", "2014-02-01","2014-03-01", "2014-04-01", 
                  "2014-05-01","2014-06-01", "2014-07-01", "2014-08-01",
                  "2014-09-01","2014-10-01","2014-11-01", "2014-12-01",
                  "2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", 
                  "2015-05-01","2015-06-01",  "2015-07-01", "2015-08-01",
                  "2015-09-01","2015-10-01","2015-11-01", "2015-12-01"))

data <- data.frame(date,Y)

library(lubridate, warn.conflicts = F)
library(slider)

data %>% 
  mutate(pred_roll_10 = slide_index_dbl(Y, .i = date, 
                                        ~ predict(arima(.x, c(2,0,0), method = 'ML'),1)$pred[1], 
                                        .before = months(10),
                                        .after = months(-1),
                                        .complete = T),
         shock = Y - pred_roll_10)
#> Warning in log(s2): NaNs produced

#> Warning in log(s2): NaNs produced

#> Warning in log(s2): NaNs produced

#> Warning in log(s2): NaNs produced
#>          date          Y pred_roll_10        shock
#> 1  2012-01-01 -0.5604756           NA           NA
#> 2  2012-02-01 -0.7906531           NA           NA
#> 3  2012-03-01  0.7680552           NA           NA
#> 4  2012-04-01  0.8385636           NA           NA
#> 5  2012-05-01  0.9678513           NA           NA
#> 6  2012-06-01  2.6829163           NA           NA
#> 7  2012-07-01  3.1438325           NA           NA
#> 8  2012-08-01  1.8787713           NA           NA
#> 9  2012-09-01  1.1919184           NA           NA
#> 10 2012-10-01  0.7462564           NA           NA
#> 11 2012-11-01  1.9703382    0.6951079  1.275230339
#> 12 2012-12-01  2.3301521    2.0524671  0.277684986
#> 13 2013-01-01  2.7309235    1.9333295  0.797594017
#> 14 2013-02-01  2.8416062    2.0486834  0.792922863
#> 15 2013-03-01  2.2857651    1.9959595  0.289805588
#> 16 2013-04-01  4.0726782    1.7514948  2.321183420
#> 17 2013-05-01  4.5705287    3.6469980  0.923530738
#> 18 2013-06-01  2.6039116    4.0585260 -1.454614411
#> 19 2013-07-01  3.3052675    1.6480334  1.657234048
#> 20 2013-08-01  2.8324760    2.9543635 -0.121887466
#> 21 2013-09-01  1.7646523    2.8739943 -1.109341911
#> 22 2013-10-01  1.5466774    2.7845341 -1.237856702
#> 23 2013-11-01  0.5206730    2.5894221 -2.068749114
#> 24 2013-12-01 -0.2082182    1.3652172 -1.573435497
#> 25 2014-01-01 -0.8332575    0.3654956 -1.198753080
#> 26 2014-02-01 -2.5199508   -0.5625776 -1.957373251
#> 27 2014-03-01 -1.6821638   -2.6388755  0.956711703
#> 28 2014-04-01 -1.5287907   -0.7131115 -0.815679154
#> 29 2014-05-01 -2.6669276   -1.2140023 -1.452925253
#> 30 2014-06-01 -1.4131127   -2.6421098  1.228997135
#> 31 2014-07-01 -0.9866485   -1.0140132  0.027364706
#> 32 2014-08-01 -1.2817199   -0.8209904 -0.460729511
#> 33 2014-09-01 -0.3865943   -1.2655968  0.879002504
#> 34 2014-10-01  0.4915392   -1.1977841  1.689323273
#> 35 2014-11-01  1.3131203   -0.4463373  1.759457622
#> 36 2014-12-01  2.0017605    0.9931931  1.008567426
#> 37 2015-01-01  2.5556782    1.7819117  0.773766509
#> 38 2015-02-01  2.4937665    2.3699293  0.123837184
#> 39 2015-03-01  2.1878038    2.1804620  0.007341787
#> 40 2015-04-01  1.8073328    1.5452418  0.262091026
#> 41 2015-05-01  1.1126258    1.2644697 -0.151843849
#> 42 2015-06-01  0.9047086    0.2974574  0.607251157
#> 43 2015-07-01 -0.3606878    0.7761950 -1.136882824
#> 44 2015-08-01  1.8082682   -1.1719233  2.980191448
#> 45 2015-09-01  3.0162302    1.9584189  1.057811310
#> 46 2015-10-01  1.8931216    2.5356767 -0.642555104
#> 47 2015-11-01  1.4902368    1.2115509  0.278685838
#> 48 2015-12-01  1.0235814    1.4319100 -0.408328595

Created on 2021-07-08 by the reprex package (v2.0.0)reprex 包( v2.0.0 ) 于 2021 年 7 月 8 日创建

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM