I'm trying to use statsmodels' ARIMA to forecast a time series, and I'm using sklearn's TimeSeriesSplit to evaluate my models. Unfortunately, when I forecast the next fold of data (whose true values are Y_test), I get a nearly constant prediction:
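```python
from statsmodels.tsa.arima.model import ARIMA

if is_arima:
    Y_train = Y_train.astype(float)
    # build a basic ARIMA(2, 0, 1) model; no exogenous variables are passed here
    # (they would go in ARIMA(..., exog=X_train) and forecast(..., exog=X_test))
    arima_model = ARIMA(Y_train, order=(2, 0, 1))
    arima_results = arima_model.fit()
    # forecast the next len(Y_test) values; the current API returns the forecasts
    # directly, while the removed statsmodels.tsa.arima_model.ARIMA returned a
    # (forecast, stderr, conf_int) tuple, which is what the original [0] unpacked
    preds = arima_results.forecast(steps=len(Y_test))
    print(preds)
```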
Which gives me:
```
115.65096239 120.89113477 121.52020239 121.59572014 121.60478583
121.60587414 121.60600479 121.60602047 121.60602235 121.60602258
121.6060226  121.60602261 121.60602261 121.60602261 121.60602261
121.60602261 121.60602261 121.6060226  121.6060226  121.6060226
121.6060226  121.6060226  121.6060226  121.6060226  121.6060226
121.6060226  121.6060226  121.6060226  121.6060226  121.6060226 ...
```
This makes me think my ARIMA isn't using the prediction at time t for its prediction at time t+1?
I understand the output isn't perfectly constant, but my dataset shows large variation, so this is mildly concerning. Any idea what's going on?
Thanks!
You're using ARIMA(2,0,1), so your model is

$$x_t = c + w_t + a_1 x_{t-1} + a_2 x_{t-2} + b_1 w_{t-1}$$

where c is a constant and w_t is the error (white noise) at time t.
So your prediction depends on two kinds of terms. The autoregressive terms are just a constant times the prior period's value plus a different constant times the value two periods ago. The moving average term is a constant times the error of the prior period's prediction. When you forecast past the end of the data, future errors are unknown and are taken as zero, so after the first step the moving average term drops out and each forecast depends only on the two before it. Your model is therefore dominated by the prior two periods, and it probably settles into an equilibrium rather quickly.
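To make the "equilibrium" concrete, here is the forecast recursion written out for a forecast made at the end of the data, time n, with every future error set to zero. Using c for the constant and subscripts for time:

$$
\begin{aligned}
\hat{x}_{n+1} &= c + a_1 x_n + a_2 x_{n-1} + b_1 w_n,\\
\hat{x}_{n+2} &= c + a_1 \hat{x}_{n+1} + a_2 x_n,\\
\hat{x}_{n+h} &= c + a_1 \hat{x}_{n+h-1} + a_2 \hat{x}_{n+h-2}, \qquad h \ge 3.
\end{aligned}
$$

If the AR part is stationary (both roots of $1 - a_1 z - a_2 z^2$ lie outside the unit circle), the deviation $\hat{x}_{n+h} - \mu$ obeys the same homogeneous recursion and decays geometrically, so the forecasts converge to the process mean

$$\mu = \frac{c}{1 - a_1 - a_2},$$

obtained by setting $\hat{x}_{n+h} = \mu$ for all large h. That limit is presumably the value of about 121.6 the forecasts settle on in the question.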
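Try printing out the parameters and then plugging them into Excel to see what is happening in the model. Note that `summary()` and `params` live on the fitted results object, not on the model:

```python
print(arima_results.summary())
print(arima_results.params)
```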
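If you'd rather not use Excel, the same check can be sketched in plain Python. The function names and the coefficients in the usage below are hypothetical, chosen for illustration rather than taken from the asker's fit; substitute the values printed by `params`. One caveat, as far as I know: statsmodels' `ARIMA` treats the trend as regression with ARMA errors, so the reported `const` would be the process mean rather than the intercept c, in which case c = `const * (1 - a1 - a2)`. Check the documentation for your version.

```python
def long_run_mean(c, a1, a2):
    """Level a stationary AR(2) recursion settles on: mu = c / (1 - a1 - a2)."""
    return c / (1.0 - a1 - a2)


def recursive_forecast(history, c, a1, a2, b1, last_error, steps):
    """Multi-step ARIMA(2,0,1) forecast built by iterating the model equation.

    history    -- observed series (at least two values); the last two seed it
    last_error -- in-sample error w_n of the final observation
    Future errors are unknown, so they are taken as zero: the MA term only
    contributes to the first step, and later steps feed on earlier forecasts.
    """
    x_prev2, x_prev1 = history[-2], history[-1]
    preds = []
    for h in range(steps):
        ma_term = b1 * last_error if h == 0 else 0.0
        x_next = c + a1 * x_prev1 + a2 * x_prev2 + ma_term
        preds.append(x_next)
        # the forecast becomes the "previous value" for the next step
        x_prev2, x_prev1 = x_prev1, x_next
    return preds


if __name__ == "__main__":
    # hypothetical coefficients, chosen so the AR part is stationary
    c, a1, a2, b1 = 30.0, 0.5, 0.25, 0.3
    preds = recursive_forecast([100.0, 110.0], c, a1, a2, b1, last_error=5.0, steps=30)
    print([round(p, 2) for p in preds[:5]])
    print("long-run mean:", long_run_mean(c, a1, a2))
```

With these made-up numbers the forecasts start at 111.5, 113.25, 114.5 and flatten out as they approach the long-run mean of 120, the same shape as the output in the question.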
You are using a recursive strategy for multi-step prediction, i.e. the forecasts generated at earlier steps are fed back in iteratively to produce the later forecasts. This leads to error accumulation, and as a result the forecast converges to a single value. ARIMA does not perform well over very long forecast horizons.
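To see what the recursive strategy does compared with feeding real observations back in, here is a sketch contrasting it with one-step-ahead (walk-forward) forecasting, where each actual test value and its forecast error are used before predicting the next point. The coefficients, the test values, and the function names are all hypothetical, and the parameters are held fixed rather than refitted. In statsmodels itself I believe fitted state-space results can be extended with new observations via an `append` method, but treat that as something to confirm in the docs for your version.

```python
def recursive_forecast(history, c, a1, a2, b1, last_error, steps):
    """Multi-step forecast: future errors are zero and forecasts feed the next step."""
    x_prev2, x_prev1 = history[-2], history[-1]
    preds = []
    for h in range(steps):
        x_next = c + a1 * x_prev1 + a2 * x_prev2 + (b1 * last_error if h == 0 else 0.0)
        preds.append(x_next)
        x_prev2, x_prev1 = x_prev1, x_next
    return preds


def one_step_ahead(history, test, c, a1, a2, b1, last_error):
    """Walk-forward forecast: after each step the *observed* value and its
    forecast error replace the model's own guess (parameters held fixed)."""
    x_prev2, x_prev1 = history[-2], history[-1]
    err = last_error
    preds = []
    for actual in test:
        pred = c + a1 * x_prev1 + a2 * x_prev2 + b1 * err
        preds.append(pred)
        err = actual - pred  # realised error, known only after observing `actual`
        x_prev2, x_prev1 = x_prev1, actual
    return preds


if __name__ == "__main__":
    c, a1, a2, b1 = 30.0, 0.5, 0.25, 0.3  # hypothetical coefficients
    history = [100.0, 110.0]
    test = [130.0, 90.0, 140.0, 95.0]  # made-up test fold with large swings
    rec = recursive_forecast(history, c, a1, a2, b1, 5.0, len(test))
    walk = one_step_ahead(history, test, c, a1, a2, b1, 5.0)
    print("recursive:", [round(p, 2) for p in rec])
    print("one-step: ", [round(p, 2) for p in walk])
```

Both start from the same first forecast, but only the walk-forward version reacts to the swings in the test fold; the recursive one drifts smoothly toward its long-run mean. Keep in mind that walk-forward forecasts use observations you would only have in real time, so they answer a different question than a genuine h-step-ahead forecast.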