简体   繁体   English

从auto.arima生成的漂移模型(预测程序包)获得单步预测时出错

[英]Error in obtaining one-step forecasts from auto.arima generated drift model (forecast package)

I'm trying to extract one-step forecasts from an ARIMA model with two external regressors as described on Prof Hyndman's blog here . 我想作为海德门教授的博客描述从ARIMA模型有两个外部回归量提取单步预测这里 I first generate a model using auto.arima, and then apply this model to the full set. 我首先使用auto.arima生成一个模型,然后将该模型应用于全套模型。

The code works as it should for the first firm in my sample. 该代码对我的样本中的第一家公司应该起作用。 The second firm, however, causes an error when extracting the one-step forecasts: 但是,第二家公司在提取单步预测时会导致错误:

Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909,  :
subscript out of bounds

The following code contains the time series that causes an error: 以下代码包含导致错误的时间序列:

df <-structure(list(fYearQtr = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
2013.75, 2014, 2014.25), Sales = c(2014, 2350, 3490, 3243, 3520, 
3678, 5749, 4359, 4370, 4837, 7115, 5264, 5410, 6217, 9608, 7512, 
7464, 7895, 11880, 9084, 9734, 12207, 15683, 13499, 15700, 20343, 
26741, 24667, 28571, 28270, 46333, 39186, 35023, 35966, 54512, 
43603, 35323, 37472, 57594, 45646), last_SVI = c(17, 23, 25, 
20, 20, 28, 31, 22, 21, 30, 32, 22, 29, 34, 39, 26, 24, 34, 38, 
24, 28, 33, 34, 22, 38, 34, 38, 34, 34, 40, 52, 34, 34, 58, 54, 
31, 32, 53, 48, 30), SD_SVI = c(0.898717034272917, 1.66410058867569, 
2.35883500145783, 2.49615088301353, 1.48064435037847, 2.87562702192596, 
3.45854571482559, 2.26738299389972, 1.05003052458683, 3.67772226053586, 
3.19855736712181, 5.65685424949238, 2.66024868704471, 5.10153320342434, 
3.77236918007361, 2.79880927062444, 2.59437260831385, 3.0697030675746, 
4.66162731573098, 2.33973480855395, 3.43063124938119, 3.71069141390533, 
3.78255103173669, 9.43873633436932, 4.36918111203273, 3.44368615860597, 
4.85032380626706, 3.51188458428425, 2.16617351389673, 3.01066480434182, 
13.8264358990424, 5.36966789786234, 3.3166247903554, 14.2644438718921, 
7.43260316064229, 2.96777564982468, 4.21383557538856, 12.3594664228036, 
6.83880331412088, 2.01913919206257)), .Names = c("fYearQtr", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame")

Example data: 示例数据:

head(df)
    fYearQtr Sales xReg1    xReg2
1  2004.50  2014    17 0.898717
2  2004.75  2350    23 1.664101
3  2005.00  3490    25 2.358835
4  2005.25  3243    20 2.496151
5  2005.50  3520    20 1.480644
6  2005.75  3678    28 2.875627

Building a time series object, train/test set and extracting the one-step forecasts: 建立时间序列对象,训练/测试集并提取一步预测:

require(forecast)
TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
TS.TEST <- window(TS, start=2011.5)

# Build an arima model
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')])
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')])
# Resulting model: ARIMA(0,1,0)(1,0,1)[4] with drift

Now extract 1-step forecasts: 现在提取1步预测:

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')])
## Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909,  : 
# subscript out of bounds

The confusing part: the exact same code works when using the following data frame (different firm): 令人困惑的部分:使用以下数据框(不同的公司)时,完全相同的代码可以工作:

#########################################
# Other example: works just fine?
df_noissues <- structure(list(fQtrYear = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
  2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
  2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
  2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
  2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
  2013.75, 2014, 2014.25), Sales = c(5818, 5979, 6221, 6410, 6401, 
 6536, 7111, 7797, 7631, 7840, 7908, 8066, 7387, 7387, 6998, 7245, 
 6970, 5688, 4147, 4244, 4615, 5433, 4887, 5187, 5287, 5652, 5958, 
 6585, 6419, 5989, 6006, 5963, 5833, 5898, 5833, 5849, 5765, 5585, 
 5454, 5836), mean_SVI = c(61.1666666666667, 47.9166666666667, 
 48.5833333333333, 51.4166666666667, 56, 51.8461538461538, 50.1666666666667, 
 60.75, 53.1538461538462, 48.9230769230769, 53, 53.6923076923077, 
 55.8461538461538, 46.3333333333333, 51.25, 54.1666666666667, 
 52.4166666666667, 50.4166666666667, 54.4166666666667, 49.3333333333333, 
 49.1666666666667, 39.5833333333333, 41.8333333333333, 43.9166666666667, 
 39.8333333333333, 37.1666666666667, 45.25, 45.9166666666667, 
 45.8333333333333, 39.7692307692308, 52.8461538461538, 60.6153846153846, 
 44.0769230769231, 37.75, 47.5, 45.1666666666667, 42.1666666666667, 
 39.25, 47.25, 47.4166666666667), SD_SVI = c(9.29157324317757, 
 11.0737883255365, 8.37157890324957, 6.08213977498251, 7.80442764775809, 
 9.09987320283598, 6.16195561244131, 11.2583302491977, 10.4390784542678, 
 8.38114489455884, 9.69535971483266, 11.4118696641159, 6.84161474417351, 
 8.96795642408249, 3.22278817739603, 6.23528570947538, 4.73782330790941, 
 9.3269729410149, 16.1777531496094, 10.9903538972992, 9.64679252708412, 
 11.1147595020261, 11.1586357371836, 7.22946412365063, 7.99810583636507, 
 6.89971453076579, 7.97866473221497, 3.89541299790439, 6.24984848301189, 
 7.5294294400245, 17.0775005677361, 12.6855459844296, 6.00640683578153, 
 6.77059148752228, 6.98700091728789, 6.97832140969228, 3.90415474109624, 
 4.39265916563698, 3.64629326103298, 5.08935311719625)), .Names = c("fQtrYear", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame")

Example data: 示例数据:

head(df_noissues)
  fQtrYear Sales    xReg1     xReg2
1  2004.50  5818 61.16667  9.291573
2  2004.75  5979 47.91667 11.073788
3  2005.00  6221 48.58333  8.371579
4  2005.25  6410 51.41667  6.082140
5  2005.50  6401 56.00000  7.804428
6  2005.75  6536 51.84615  9.099873    

Running the same code to construct a test/training set & ARIMA model: 运行相同的代码以构建测试/训练集和ARIMA模型:

TS <- ts(df_noissues[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
TS.TEST <- window(TS, start=2011.5)

# Build an arima model
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')])
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')])

Extracting the 1-step forecasts: no error. 提取1步预测:没有错误。

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')])
## ARIMA(2,0,0)(0,1,0)[4]  

Other than a difference in the model generated (with / without drift), I can't really seem to grasp what might be causing this. 除了生成的模型存在差异(带有/不带有漂移)之外,我似乎无法真正理解造成这种情况的原因。 Running auto.arima with allowdrift=FALSE indeed seems to resolve the issue. 使用allowdrift = FALSE运行auto.arima确实可以解决该问题。

The problem arises due to the column names for the xreg argument. 由于xreg参数的列名而引起问题。 In the first case, there is a drift term which adds a column to xreg , and that triggers a change in the names of the columns. 在第一种情况下,有一个漂移项将一列添加到xreg ,并触发列名的更改。 You can see this when you look at the models. 当您查看模型时,可以看到这一点。

> MODEL
Series: TS.TRAIN[, "Sales"] 
ARIMA(0,1,0)(1,0,1)[4] with drift         

Coefficients:
        sar1    sma1     drift  xreg.xReg1  xreg.xReg2
      0.8135  0.6554  1475.542     38.1461     84.5589
s.e.  0.1314  0.5816  1071.337     45.7297     74.2390

compared to 相比

> MODEL
Series: TS.TRAIN[, "Sales"] 
ARIMA(2,0,0)(0,1,0)[4]                    

Coefficients:
         ar1      ar2     xReg1    xReg2
      1.3394  -0.5685  -14.4072   4.3869
s.e.  0.1766   0.1783   16.3416  24.3968

I will add it to the bug list and see if I can figure out a solution. 我将其添加到Bug列表中,看看是否可以找到解决方案。

As a work-around, you could refit the model with drift as follows: 解决方法是,可以使用以下方法重新调整模型的漂移:

TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
z <- TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')]
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=z)
MODEL2 <- Arima(TS.TRAIN[,'Sales'], order=MODEL$arma[c(1,6,2)],
                seasonal=MODEL$arma[c(3,7,4)], xreg=cbind(1:nrow(z),z))
z <- TS[,colnames(TS) %in% c('xReg1', 'xReg2')]
refit <- Arima(TS[,'Sales'], model=MODEL2, xreg=cbind(1:nrow(z),z))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM