简体   繁体   中英

Forecast Confidence Interval from bsts package much wider than auto.arima in forecast

I recently read up about the bsts package by Steven Scott at Google for Bayesian Structural Time Series model and wanted to give it a shot against the auto.arima function from forecast package that I have been using for a variety of forecasting tasks.

I tried it on a few examples and was impressed with the efficiency of the package as well as the point forecast. But when I looked at the forecast variance I almost always found that bsts ended up giving a much wider confidence band as compared to that of auto.arima. Here is a sample code on a white noise data

library("forecast")
library("data.table")
library("bsts")
truthData = data.table(target = rnorm(250))
freq = 52
ss = AddGeneralizedLocalLinearTrend(list(), truthData$target)
ss = AddSeasonal(ss, truthData$target, nseasons = freq)
tStart = proc.time()[3]
model = bsts(truthData$target, state.specification = ss, niter = 500)
print(paste("time taken: ", proc.time()[3] - tStart))
burn = SuggestBurn(0.1, model)
pred = predict(model, horizon = 2 * freq, burn = burn, quantiles = c(0.10, 0.90))

## auto arima fit
max.d = 1; max.D = 1; max.p = 3; max.q = 3; max.P = 2; max.Q = 2; stepwise = FALSE
dataXts = ts(truthData$target, frequency = freq)
tStart = proc.time()[3]
autoArFit = auto.arima(dataXts, max.D = max.D, max.d = max.d, max.p = max.p, max.q = max.q, max.P = max.P, max.Q = max.P, stepwise = stepwise)
print(paste("time taken: ", proc.time()[3] - tStart))
par(mfrow = c(2, 1))
plot(pred, ylim = c(-5, 5))
plot(forecast(autoArFit, 2 * freq), ylim = c(-5, 5))

Here is the plot 预测差异在顶部面板auto.arima下面板 I was wondering if someone could shed some light on this behavior and how we could control for the forecast variance. As far as I recall from Dr. Hyndman's papers auto.arima's forecast variance calculation do not account for the parameter estimation variance, ie the variance in estimated ar and ma coefficients. Is that the driving reason for the discrepancy I see here or are there other subtle points that I am missing and can be controlled for by some parameters.

Thanks

Here is a script to test the inclusion probabilities for short to medium range forecasting problem comparing bsts to auto.arima

library("forecast")
library("data.table")
library("bsts")
set.seed(1234)
n = 260
freq = 52
h = 10
rep = 50
max.d = 1; max.D = 1; max.p = 2; max.q = 2; max.P = 1; max.Q = 1; stepwise = TRUE
containsProb = NULL
for (i in 1:rep) {
    print(i)
    truthData = data.table(time = 1:n, target = rnorm(n))
    yTrain = truthData$target[1:(n - h)]
    yTest = truthData$target[(n - h + 1):n]

    ## fit bsts model
    ss = AddLocalLevel(list(), truthData$target)
    ss = AddSeasonal(ss, truthData$target, nseasons = freq)
    tStart = proc.time()[3]
    model = bsts(yTrain, state.specification = ss, niter = 500)
    print(paste("time taken: ", proc.time()[3] - tStart))
    pred = predict(model, horizon = h, burn = SuggestBurn(0.1, model), quantiles = c(0.10, 0.90))
    containsProbBs = sum(yTest > pred$interval[1,] & yTest < pred$interval[2,]) / h

    ## auto.arima model fit
    dataTs = ts(yTrain, frequency = freq)
    tStart = proc.time()[3]
    autoArFit = auto.arima(dataTs, max.D = max.D, max.d = max.d, max.p = max.p, max.q = max.q, max.P = max.P, max.Q = max.P, stepwise = stepwise)
    print(paste("time taken: ", proc.time()[3] - tStart))
    fcst = forecast(autoArFit, h = h)

    ## inclusion probabilities for 80% CI
    containsProbBs = sum(yTest > pred$interval[1,] & yTest < pred$interval[2,]) / h
    containsProbAr = sum(yTest > fcst$lower[,1] & yTest < fcst$upper[,1]) / h
    containsProb = rbindlist(list(containsProb, data.table(bs = containsProbBs, ar = containsProbAr)))
}
colMeans(containsProb)
>  bs   ar 
 0.79 0.80 
c(sd(containsProb$bs), sd(containsProb$ar))
> [1] 0.13337719 0.09176629

The difference is that the BSTS model is nonstationary, while the ARIMA model selected in this case is stationary (actually just white noise). For the BSTS model, the prediction intervals continue to widen over the forecast horizon, while the ARIMA model has constant prediction intervals. For the first forecast horizon, they are relatively close, but they diverge for longer horizons.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM