ARIMA forecast gives different results with new python statsmodels
I'm doing out-of-sample forecasting with ARIMA(0,1,0). In statsmodels 0.12, the latest stable version, I calculate:
import statsmodels.tsa.arima_model as stats
time_series = [2, 3.0, 5, 7, 9, 11, 13, 17, 19]
steps = 4
alpha = 0.05
model = stats.ARIMA(time_series, order=(0, 1, 0))
model_fit = model.fit(disp=0)
forecast, _, intervals = model_fit.forecast(steps=steps, exog=None, alpha=alpha)
which results in
forecast = [21.125, 23.25, 25.375, 27.5]
intervals = [[19.5950036, 22.6549964 ], [21.08625835, 25.41374165], [22.72496851, 28.02503149], [24.44000721, 30.55999279]]
and a FutureWarning, which suggests:
FutureWarning:
statsmodels.tsa.arima_model.ARMA and statsmodels.tsa.arima_model.ARIMA have
been deprecated in favor of statsmodels.tsa.arima.model.ARIMA (note the .
between arima and model) and
statsmodels.tsa.SARIMAX. These will be removed after the 0.12 release.
In the new version, as the FutureWarning hints, I calculate:
import statsmodels.tsa.arima.model as stats
time_series = [2, 3.0, 5, 7, 9, 11, 13, 17, 19]
steps = 4
alpha = 0.05
model = stats.ARIMA(time_series, order=(0, 1, 0))
model_fit = model.fit()
forecast = model_fit.get_forecast(steps=steps)
forecasts_and_intervals = forecast.summary_frame(alpha=alpha)
which gives different results:
forecasts_and_intervals =
y  mean  mean_se   mean_ci_lower  mean_ci_upper
0  19.0  2.263842  14.562951      23.437049
1  19.0  3.201556  12.725066      25.274934
2  19.0  3.921089  11.314806      26.685194
3  19.0  4.527684  10.125903      27.874097
I would like to obtain the same results as before. Am I using the new interface correctly?
I need both the forecast and the intervals. I have already tried the other functions the new interface offers besides forecast.
In particular, I'm wondering why the forecast is 19 for every step.
Many thanks for any help.
Here is the documentation for statsmodels 0.12.2: https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMA.html?highlight=arima#statsmodels.tsa.arima_model.ARIMA
Here is the documentation for the newer version of ARIMA: https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima.model.ARIMA.html?highlight=arima#statsmodels.tsa.arima.model.ARIMA
The difference comes down to whether the model includes a constant term. The older statsmodels.tsa.arima_model.ARIMA includes a constant term by default (the trend argument of its fit method defaults to "c"). When differencing is involved, it still includes the constant, but in the differenced domain (otherwise differencing would eliminate it anyway). So its ARIMA(0, 1, 0) model is:
y_t - y_{t-1} = c + e_t
which is a "random walk with drift".
The new statsmodels.tsa.arima.model.ARIMA, as the documentation you linked says, does not include any kind of trend term (including a constant, i.e. c) when differencing is involved, which is your case. So its ARIMA(0, 1, 0) model is:
y_t - y_{t-1} = e_t
which is a plain "random walk". As we know, its forecasts correspond to naive forecasts, i.e. repeating the last observed value (19 in your case).
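To see where the two sets of numbers come from, both models can be worked out by hand. The sketch below is not how statsmodels fits anything internally. It assumes the drift is estimated as the mean of the first differences, the innovation variance as the mean squared residual (dividing by n), and that the intervals ignore parameter uncertainty. The helper name random_walk_forecast is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

time_series = [2, 3.0, 5, 7, 9, 11, 13, 17, 19]
steps = 4
alpha = 0.05

# First differences y_t - y_{t-1}
diffs = [b - a for a, b in zip(time_series, time_series[1:])]
# Two-sided normal quantile for a (1 - alpha) interval
z = NormalDist().inv_cdf(1 - alpha / 2)


def random_walk_forecast(drift):
    """Forecast y_t - y_{t-1} = drift + e_t; returns (mean, lower, upper) per step."""
    # Innovation standard deviation from the residuals of the differenced series
    sigma = sqrt(sum((d - drift) ** 2 for d in diffs) / len(diffs))
    rows = []
    for h in range(1, steps + 1):
        mean = time_series[-1] + h * drift
        half_width = z * sigma * sqrt(h)  # forecast variance grows linearly in h
        rows.append((mean, mean - half_width, mean + half_width))
    return rows


# Old interface: constant in the differenced domain (random walk with drift)
with_drift = random_walk_forecast(sum(diffs) / len(diffs))
# New interface with default settings: no constant (plain random walk)
without_drift = random_walk_forecast(0.0)

for row in with_drift:
    print("drift:    mean=%.3f  ci=[%.4f, %.4f]" % row)
for row in without_drift:
    print("no drift: mean=%.3f  ci=[%.4f, %.4f]" % row)
```

Under these assumptions the drift is 17/8 = 2.125, so the drift forecasts are 19 + 2.125h, while the no-drift forecasts stay at 19. The intervals agree closely with the two outputs in the question.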
So how do you make the new one behave like the old one?
It has a parameter called trend that you can set to get the same behaviour. Since you are differencing (d=1), passing trend="t" should give the same model as the old one. ("t" means a linear trend, but since d = 1, it reduces to a constant in the differenced domain):
import statsmodels.tsa.arima.model as stats
time_series = [2, 3.0, 5, 7, 9, 11, 13, 17, 19]
steps = 4
alpha = 0.05
model = stats.ARIMA(time_series, order=(0, 1, 0), trend="t") # only change is here!
model_fit = model.fit()
forecast = model_fit.get_forecast(steps=steps)
forecasts_and_intervals = forecast.summary_frame(alpha=alpha)
and here is what I get for forecasts_and_intervals:
y  mean       mean_se   mean_ci_lower  mean_ci_upper
0  21.124995  0.780622  19.595004      22.654986
1  23.249990  1.103966  21.086256      25.413724
2  25.374985  1.352077  22.724962      28.025008
3  27.499980  1.561244  24.439997      30.559963
I think this raises another issue. I'm not sure exogenous variables are treated the same way in the new arima.model version. I believe that in the old version, arima_model, they are applied at the order of differencing: for (0,0,0), y = mx + b, and for (0,1,0), dy = mx + b.