Forecast confidence interval from the bsts package is much wider than auto.arima's in forecast

I recently read about the bsts package by Steven Scott at Google for Bayesian structural time series models and wanted to try it against the auto.arima function from the forecast package, which I have been using for a variety of forecasting tasks.

I tried it on a few examples and was impressed with the efficiency of the package as well as the point forecasts. But when I looked at the forecast variance, I almost always found that bsts gave a much wider confidence band than auto.arima. Here is sample code on white noise data:

library("forecast")
library("data.table")
library("bsts")
truthData = data.table(target = rnorm(250))
freq = 52
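## Note: newer bsts releases may expose this as AddSemilocalLinearTrend (check your installed version)
ss = AddGeneralizedLocalLinearTrend(list(), truthData$target)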
ss = AddSeasonal(ss, truthData$target, nseasons = freq)
tStart = proc.time()[3]
model = bsts(truthData$target, state.specification = ss, niter = 500)
print(paste("time taken: ", proc.time()[3] - tStart))
burn = SuggestBurn(0.1, model)
pred = predict(model, horizon = 2 * freq, burn = burn, quantiles = c(0.10, 0.90))

## auto arima fit
max.d = 1; max.D = 1; max.p = 3; max.q = 3; max.P = 2; max.Q = 2; stepwise = FALSE
dataXts = ts(truthData$target, frequency = freq)
tStart = proc.time()[3]
autoArFit = auto.arima(dataXts, max.D = max.D, max.d = max.d, max.p = max.p, max.q = max.q, max.P = max.P, max.Q = max.Q, stepwise = stepwise)
print(paste("time taken: ", proc.time()[3] - tStart))
par(mfrow = c(2, 1))
plot(pred, ylim = c(-5, 5))
plot(forecast(autoArFit, 2 * freq), ylim = c(-5, 5))

Here is the plot:

[Figure: forecast intervals, bsts in the top panel, auto.arima in the bottom panel]

I was wondering if someone could shed some light on this behavior and on how we could control the forecast variance. As far as I recall from Dr. Hyndman's papers, auto.arima's forecast variance calculation does not account for parameter estimation variance, i.e. the variance in the estimated AR and MA coefficients. Is that the driving reason for the discrepancy I see here, or are there other subtle points that I am missing that can be controlled through some parameters?

Thanks

Here is a script to test the inclusion probabilities (the fraction of held-out points that fall inside the 80% interval) for a short- to medium-range forecasting problem, comparing bsts to auto.arima:

library("forecast")
library("data.table")
library("bsts")
set.seed(1234)
n = 260
freq = 52
h = 10
rep = 50
max.d = 1; max.D = 1; max.p = 2; max.q = 2; max.P = 1; max.Q = 1; stepwise = TRUE
containsProb = NULL
for (i in 1:rep) {
    print(i)
    truthData = data.table(time = 1:n, target = rnorm(n))
    yTrain = truthData$target[1:(n - h)]
    yTest = truthData$target[(n - h + 1):n]

    ## fit bsts model
    ss = AddLocalLevel(list(), truthData$target)
    ss = AddSeasonal(ss, truthData$target, nseasons = freq)
    tStart = proc.time()[3]
    model = bsts(yTrain, state.specification = ss, niter = 500)
    print(paste("time taken: ", proc.time()[3] - tStart))
    pred = predict(model, horizon = h, burn = SuggestBurn(0.1, model), quantiles = c(0.10, 0.90))

    ## auto.arima model fit
    dataTs = ts(yTrain, frequency = freq)
    tStart = proc.time()[3]
    autoArFit = auto.arima(dataTs, max.D = max.D, max.d = max.d, max.p = max.p, max.q = max.q, max.P = max.P, max.Q = max.Q, stepwise = stepwise)
    print(paste("time taken: ", proc.time()[3] - tStart))
    fcst = forecast(autoArFit, h = h)

    ## inclusion probabilities for 80% CI
    containsProbBs = sum(yTest > pred$interval[1,] & yTest < pred$interval[2,]) / h
    containsProbAr = sum(yTest > fcst$lower[,1] & yTest < fcst$upper[,1]) / h
    containsProb = rbindlist(list(containsProb, data.table(bs = containsProbBs, ar = containsProbAr)))
}
colMeans(containsProb)
##   bs   ar
## 0.79 0.80
c(sd(containsProb$bs), sd(containsProb$ar))
## [1] 0.13337719 0.09176629
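
To put those standard deviations in context, here is a minimal base-R sketch (not part of the original script) of how noisy a coverage fraction computed from only h = 10 points is, even when the interval is exactly right. It assumes N(0, 1) white noise and uses the exact central 80% interval; the seed and the replicate count `n_rep` are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed for this illustration
h <- 10        # held-out points per replicate, as in the script above
n_rep <- 2000  # hypothetical replicate count, larger than the script's 50

# For N(0, 1) white noise the exact central 80% interval is constant
lower <- qnorm(0.10)
upper <- qnorm(0.90)

# Fraction of the h held-out points that fall inside the interval, per replicate
coverage <- replicate(n_rep, {
  y_test <- rnorm(h)
  mean(y_test > lower & y_test < upper)
})

mean(coverage)        # should be close to 0.80
sd(coverage)          # should be close to the binomial value below
sqrt(0.8 * 0.2 / h)   # binomial sd of a coverage fraction over h points, about 0.126
```

Under these assumptions, the per-replicate standard deviation is about 0.126, so with 50 replicates the standard error of the mean coverage is roughly 0.018. A gap like 0.79 versus 0.80 is well inside that noise.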

The difference is that the BSTS model is nonstationary, while the ARIMA model selected in this case is stationary (actually just white noise). For the BSTS model, the prediction intervals continue to widen over the forecast horizon, while the ARIMA model has constant prediction intervals. At the first forecast horizon the two are relatively close, but they diverge at longer horizons.
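
As a rough illustration of the widening, here is a base-R sketch comparing the h-step forecast variance of white noise with that of a simple local level model (a random-walk level plus observation noise). The values `sigma_obs` and `sigma_level` are made-up assumptions, not estimates from the question's data. The sketch treats the last level as known and ignores the seasonal component and any posterior uncertainty in the parameters, both of which would add further to the bsts intervals.

```r
# Illustrative values only (assumptions, not estimates from the question's data)
sigma_obs   <- 1.0   # observation noise sd
sigma_level <- 0.1   # sd of the random-walk level innovations
h <- 1:104           # forecast horizons, matching 2 * freq in the question

# White noise: the forecast variance does not depend on the horizon
var_wn <- rep(sigma_obs^2, length(h))

# Local level: the level takes h random-walk steps beyond the last known level,
# so the forecast variance grows linearly in h
var_ll <- sigma_obs^2 + h * sigma_level^2

# Half-width of a central 80% interval under normality
z80 <- qnorm(0.90)
half_width <- data.frame(horizon     = h,
                         white_noise = z80 * sqrt(var_wn),
                         local_level = z80 * sqrt(var_ll))

print(half_width[c(1, 10, 52, 104), ], digits = 3)
```

Even a small level innovation variance makes the interval noticeably wider at long horizons, while at h = 1 the two half-widths are nearly the same.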
