Statsmodels ARIMA - 使用预测（）和预测（）的不同结果

Question

我使用（ Statsmodels ）ARIMA 来预测一系列值：

plt.plot(ind, final_results.predict(start=0 ,end=26))
plt.plot(ind, forecast.values)
plt.show()

我以为我会从这两种方法中得到相同的结果，但我得到了这个：

我想知道是使用predict()还是 predict( forecast() 。

Answer 1

从图表上看，您好像在使用 predict forecast()进行样本外预测，使用 predict 进行样本内预测。 基于 ARIMA 方程的性质，对于较长的预测周期，样本外预测往往会收敛到样本均值。

为了了解 predict( forecast()和predict()如何在不同场景下工作，我系统地比较了ARIMA_results类中的各种模型。 随意在此存储库中重现与statsmodels_arima_comparison.py的比较。 我研究了order=(p,d,q)的每种组合，仅将p, d, q限制为 0 或 1。例如，可以使用order=(1,0,0)获得简单的自回归模型。 简而言之，我使用以下（固定）时间序列研究了三个选项：

A. 迭代样本内预测形成历史。 历史由时间序列的前 80% 组成，而测试集由最后 20% 组成。 然后我预测了测试集的第一个点，将真实值添加到历史中，预测了第二个点等。这将对模型的预测质量进行评估。

for t in range(len(test)):
    model = ARIMA(history, order=order)
    model_fit = model.fit(disp=-1)
    yhat_f = model_fit.forecast()[0][0]
    yhat_p = model_fit.predict(start=len(history), end=len(history))[0]
    predictions_f.append(yhat_f)
    predictions_p.append(yhat_p)
    history.append(test[t])

B. 接下来，我通过迭代预测测试系列的下一个点，并将此预测附加到历史记录中来研究样本外预测。

for t in range(len(test)):
    model_f = ARIMA(history_f, order=order)
    model_p = ARIMA(history_p, order=order)
    model_fit_f = model_f.fit(disp=-1)
    model_fit_p = model_p.fit(disp=-1)
    yhat_f = model_fit_f.forecast()[0][0]
    yhat_p = model_fit_p.predict(start=len(history_p), end=len(history_p))[0]
    predictions_f.append(yhat_f)
    predictions_p.append(yhat_p)
    history_f.append(yhat_f)
    history_f.append(yhat_p)

C. 我使用了forecast(step=n)参数和predict(start, end)参数，以便使用这些方法进行内部多步预测。

model = ARIMA(history, order=order)
    model_fit = model.fit(disp=-1)
    predictions_f_ms = model_fit.forecast(steps=len(test))[0]
    predictions_p_ms = model_fit.predict(start=len(history), end=len(history)+len(test)-1)

事实证明：

A. 预测和预测 AR 的结果相同，但 ARMA 的结果不同：测试时间序列图

B. 预测和预测对 AR 和 ARMA 产生不同的结果：测试时间序列图

C. 预测和预测 AR 的结果相同，但 ARMA 的结果不同：测试时间序列图

此外，比较 B. 和 C. 中看似相同的方法，我发现结果存在细微但明显的差异。

我认为差异主要是由于 predict() 中的“预测是在原始内生变量的水平上完成的”，而predict() forecast()产生水平差异的预测（比较 API 参考）。

此外，鉴于我更信任 statsmodels 函数的内部功能而不是简单的迭代预测循环（这是主观的），我建议使用forecast(step)或predict(start, end) 。

Answer 2

继续 noteven2degrees 的回复，我提交了一个拉取请求以更正方法 B 从history_f.append(yhat_p)到history_p.append(yhat_p) 。

此外，正如 noteven2degrees 建议的那样，与 predict( forecast()不同， predict()需要一个参数typ='levels'来输出预测，而不是差异预测。

经过上述两次改动后，方法B产生的结果与方法C相同，而方法C所用的时间要少得多，这是合理的。 两者都收敛到一个趋势，因为我认为这与模型本身的平稳性有关。

无论采用哪种方法，对于 p,d,q 的任何配置， predict( forecast()和predict()都会产生相同的结果。

Statsmodels ARIMA - 使用预测（）和预测（）的不同结果

问题描述

2 个解决方案

解决方案1
21 2018-02-16 20:58:17

解决方案2
4 2019-02-15 05:10:49

Statsmodels ARIMA - 使用预测（）和预测（）的不同结果

问题描述

2 个解决方案

解决方案1 21 2018-02-16 20:58:17

解决方案2 4 2019-02-15 05:10:49

解决方案1
21 2018-02-16 20:58:17

解决方案2
4 2019-02-15 05:10:49