Error in obtaining one-step forecasts from auto.arima generated drift model (forecast package)

Question

I'm trying to extract one-step forecasts from an ARIMA model with two external regressors as described on Prof Hyndman's blog here . I first generate a model using auto.arima, and then apply this model to the full set.

The code works as it should for the first firm in my sample. The second firm, however, causes an error when extracting the one-step forecasts:

Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909,  :
subscript out of bounds

The following code contains the time series that causes an error:

df <-structure(list(fYearQtr = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
2013.75, 2014, 2014.25), Sales = c(2014, 2350, 3490, 3243, 3520, 
3678, 5749, 4359, 4370, 4837, 7115, 5264, 5410, 6217, 9608, 7512, 
7464, 7895, 11880, 9084, 9734, 12207, 15683, 13499, 15700, 20343, 
26741, 24667, 28571, 28270, 46333, 39186, 35023, 35966, 54512, 
43603, 35323, 37472, 57594, 45646), last_SVI = c(17, 23, 25, 
20, 20, 28, 31, 22, 21, 30, 32, 22, 29, 34, 39, 26, 24, 34, 38, 
24, 28, 33, 34, 22, 38, 34, 38, 34, 34, 40, 52, 34, 34, 58, 54, 
31, 32, 53, 48, 30), SD_SVI = c(0.898717034272917, 1.66410058867569, 
2.35883500145783, 2.49615088301353, 1.48064435037847, 2.87562702192596, 
3.45854571482559, 2.26738299389972, 1.05003052458683, 3.67772226053586, 
3.19855736712181, 5.65685424949238, 2.66024868704471, 5.10153320342434, 
3.77236918007361, 2.79880927062444, 2.59437260831385, 3.0697030675746, 
4.66162731573098, 2.33973480855395, 3.43063124938119, 3.71069141390533, 
3.78255103173669, 9.43873633436932, 4.36918111203273, 3.44368615860597, 
4.85032380626706, 3.51188458428425, 2.16617351389673, 3.01066480434182, 
13.8264358990424, 5.36966789786234, 3.3166247903554, 14.2644438718921, 
7.43260316064229, 2.96777564982468, 4.21383557538856, 12.3594664228036, 
6.83880331412088, 2.01913919206257)), .Names = c("fYearQtr", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame")

Example data:

head(df)
    fYearQtr Sales xReg1    xReg2
1  2004.50  2014    17 0.898717
2  2004.75  2350    23 1.664101
3  2005.00  3490    25 2.358835
4  2005.25  3243    20 2.496151
5  2005.50  3520    20 1.480644
6  2005.75  3678    28 2.875627

Building a time series object, train/test set and extracting the one-step forecasts:

require(forecast)
TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
TS.TEST <- window(TS, start=2011.5)

# Build an arima model
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')])
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')])
# Resulting model: ARIMA(0,1,0)(1,0,1)[4] with drift

Now extract 1-step forecasts:

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')])
## Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909,  : 
# subscript out of bounds

The confusing part: the exact same code works when using the following data frame (different firm):

#########################################
# Other example: works just fine?
df_noissues <- structure(list(fQtrYear = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
  2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
  2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
  2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
  2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
  2013.75, 2014, 2014.25), Sales = c(5818, 5979, 6221, 6410, 6401, 
 6536, 7111, 7797, 7631, 7840, 7908, 8066, 7387, 7387, 6998, 7245, 
 6970, 5688, 4147, 4244, 4615, 5433, 4887, 5187, 5287, 5652, 5958, 
 6585, 6419, 5989, 6006, 5963, 5833, 5898, 5833, 5849, 5765, 5585, 
 5454, 5836), mean_SVI = c(61.1666666666667, 47.9166666666667, 
 48.5833333333333, 51.4166666666667, 56, 51.8461538461538, 50.1666666666667, 
 60.75, 53.1538461538462, 48.9230769230769, 53, 53.6923076923077, 
 55.8461538461538, 46.3333333333333, 51.25, 54.1666666666667, 
 52.4166666666667, 50.4166666666667, 54.4166666666667, 49.3333333333333, 
 49.1666666666667, 39.5833333333333, 41.8333333333333, 43.9166666666667, 
 39.8333333333333, 37.1666666666667, 45.25, 45.9166666666667, 
 45.8333333333333, 39.7692307692308, 52.8461538461538, 60.6153846153846, 
 44.0769230769231, 37.75, 47.5, 45.1666666666667, 42.1666666666667, 
 39.25, 47.25, 47.4166666666667), SD_SVI = c(9.29157324317757, 
 11.0737883255365, 8.37157890324957, 6.08213977498251, 7.80442764775809, 
 9.09987320283598, 6.16195561244131, 11.2583302491977, 10.4390784542678, 
 8.38114489455884, 9.69535971483266, 11.4118696641159, 6.84161474417351, 
 8.96795642408249, 3.22278817739603, 6.23528570947538, 4.73782330790941, 
 9.3269729410149, 16.1777531496094, 10.9903538972992, 9.64679252708412, 
 11.1147595020261, 11.1586357371836, 7.22946412365063, 7.99810583636507, 
 6.89971453076579, 7.97866473221497, 3.89541299790439, 6.24984848301189, 
 7.5294294400245, 17.0775005677361, 12.6855459844296, 6.00640683578153, 
 6.77059148752228, 6.98700091728789, 6.97832140969228, 3.90415474109624, 
 4.39265916563698, 3.64629326103298, 5.08935311719625)), .Names = c("fQtrYear", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame")

Example data:

head(df_noissues)
  fQtrYear Sales    xReg1     xReg2
1  2004.50  5818 61.16667  9.291573
2  2004.75  5979 47.91667 11.073788
3  2005.00  6221 48.58333  8.371579
4  2005.25  6410 51.41667  6.082140
5  2005.50  6401 56.00000  7.804428
6  2005.75  6536 51.84615  9.099873

Running the same code to construct a test/training set & ARIMA model:

TS <- ts(df_noissues[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
TS.TEST <- window(TS, start=2011.5)

# Build an arima model
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')])
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')])

Extracting the 1-step forecasts: no error.

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')])
## ARIMA(2,0,0)(0,1,0)[4]

Other than a difference in the model generated (with / without drift), I can't really seem to grasp what might be causing this. Running auto.arima with allowdrift=FALSE indeed seems to resolve the issue.

Answer 1

The problem arises due to the column names for the xreg argument. In the first case, there is a drift term which adds a column to xreg , and that triggers a change in the names of the columns. You can see this when you look at the models.

> MODEL
Series: TS.TRAIN[, "Sales"] 
ARIMA(0,1,0)(1,0,1)[4] with drift         

Coefficients:
        sar1    sma1     drift  xreg.xReg1  xreg.xReg2
      0.8135  0.6554  1475.542     38.1461     84.5589
s.e.  0.1314  0.5816  1071.337     45.7297     74.2390

compared to

> MODEL
Series: TS.TRAIN[, "Sales"] 
ARIMA(2,0,0)(0,1,0)[4]                    

Coefficients:
         ar1      ar2     xReg1    xReg2
      1.3394  -0.5685  -14.4072   4.3869
s.e.  0.1766   0.1783   16.3416  24.3968

I will add it to the bug list and see if I can figure out a solution.

As a work-around, you could refit the model with drift as follows:

TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4)
TS.TRAIN <- window(TS, end=2011.4)
z <- TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')]
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=z)
MODEL2 <- Arima(TS.TRAIN[,'Sales'], order=MODEL$arma[c(1,6,2)],
                seasonal=MODEL$arma[c(3,7,4)], xreg=cbind(1:nrow(z),z))
z <- TS[,colnames(TS) %in% c('xReg1', 'xReg2')]
refit <- Arima(TS[,'Sales'], model=MODEL2, xreg=cbind(1:nrow(z),z))

Error in obtaining one-step forecasts from auto.arima generated drift model (forecast package)

Question

1 answers

solution1
2 ACCPTED 2015-04-03 02:29:44

Error in obtaining one-step forecasts from auto.arima generated drift model (forecast package)

Question

1 answers

solution1 2 ACCPTED 2015-04-03 02:29:44

solution1
2 ACCPTED 2015-04-03 02:29:44