

ARIMA out-of-sample prediction in statsmodels?

Original post 2014-06-26 05:23:26 2 1 python/statsmodels

I have a time series forecasting problem that I am using the statsmodels Python package to address. Judged by the AIC criterion, the optimal model turns out to be quite complex, something like ARIMA(27,1,8) [I haven't done an exhaustive search of the parameter space, but it seems to be at a minimum around there]. I am having real trouble validating and forecasting with this model, though, because it takes a very long time (hours) to train a single model instance, so doing repeated tests is very difficult.

In any case, what I need at a minimum in order to use statsmodels in operations (assuming I can get the model validated somehow first) is a mechanism for incorporating new data as it arrives in order to make the next set of forecasts. I would like to be able to fit a model on the available data, pickle it, and then unpickle it later when the next datapoint is available and incorporate that datapoint into an updated set of forecasts. At the moment I have to re-fit the model each time new data becomes available, which, as I said, takes a very long time.
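The fit-once, update-later workflow described above can be sketched without statsmodels. This is only an illustration of the idea under simplifying assumptions: the model is a pure AR recursion, the `state` dictionary and the `forecast` helper are hypothetical names, and the coefficients are made-up numbers standing in for the result of the expensive fit.

```python
import pickle


def forecast(state, steps):
    """Iterate a fixed-coefficient AR recursion forward from the stored history."""
    values = list(state["history"])
    p = len(state["coefs"])
    out = []
    for _ in range(steps):
        recent = values[-p:][::-1]  # most recent value first
        nxt = state["intercept"] + sum(
            c * v for c, v in zip(state["coefs"], recent)
        )
        values.append(nxt)
        out.append(nxt)
    return out


# Pretend these numbers came out of the expensive fit (they are made up).
state = {"coefs": [0.5], "intercept": 1.0, "history": [2.0, 2.0]}
blob = pickle.dumps(state)  # in practice, written to disk once

# Later, when the next datapoint arrives:
restored = pickle.loads(blob)
restored["history"].append(4.0)  # new observation, parameters untouched
updated_forecast = forecast(restored, steps=2)
```

The point of the sketch is that only the history changes between forecasts. A model with MA terms would also have to carry its past residuals (or an equivalent filter state) forward, which is the harder part of doing this by hand.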

I had a look at this question, which addresses essentially the same problem but for ARMA models. For the ARIMA case, however, there is the added complexity that the data are differenced. I need to be able to produce new forecasts of the original time series (cf. the typ='levels' keyword of the ARIMAResultsWrapper.predict method). It's my understanding that statsmodels cannot do this at present, so which components of the existing functionality would I need to use in order to write something that does this myself?
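For d=1, the step from forecasts of the differenced series back to the original levels is a cumulative sum started at the last observed level. A minimal sketch of that arithmetic, with hypothetical helper names and made-up numbers (it is not the statsmodels implementation):

```python
def difference(series):
    """First differences: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def integrate_forecast(last_level, diff_forecast):
    """Turn forecasts of first differences into forecasts of levels."""
    levels = []
    current = last_level
    for step in diff_forecast:
        current += step
        levels.append(current)
    return levels


observed = [10.0, 12.0, 15.0]
observed_diffs = difference(observed)  # what an ARMA model would be fit on

# Made-up forecasts of the next three differences.
diff_forecast = [1.0, -2.0, 0.5]
level_forecast = integrate_forecast(observed[-1], diff_forecast)
```

When a new observation arrives, the last observed level changes, so the cumulative sum has to be restarted from the new value rather than from the old forecast.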

Edit: I am also using transparams=True, so the prediction process needs to be able to transform the predictions back into the original time series, which is an additional difficulty in a homebrew approach.

1 answer

An ARIMA(27,1,8) model is extremely complex, in the scheme of things. For most time series, you can do reasonable prediction with five or so parameters. Of course it depends on the data and the domain, but I'm very skeptical that 27 + 8 = 35 parameters are necessary.

The AIC is known to be too permissive at times about the number of parameters. I'd try comparing results with the BIC.
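To see why the two criteria can disagree, recall that AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, so the BIC charges more per parameter once n exceeds about 7 observations. The sketch below uses made-up log-likelihoods and a made-up sample size purely for illustration. Fitted statsmodels results expose `aic` and `bic` attributes, so in practice there is no need to compute these by hand.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


n_obs = 1000  # hypothetical sample size

# Hypothetical log-likelihoods: the large model fits somewhat better.
small = {"k": 3, "loglik": -520.0}
large = {"k": 35, "loglik": -480.0}

aic_small, aic_large = aic(small["loglik"], small["k"]), aic(large["loglik"], large["k"])
bic_small = bic(small["loglik"], small["k"], n_obs)
bic_large = bic(large["loglik"], large["k"], n_obs)

prefers_large_by_aic = aic_large < aic_small
prefers_large_by_bic = bic_large < bic_small
```

With these invented numbers the AIC picks the 35-parameter model while the BIC picks the 3-parameter one, which is the kind of disagreement worth checking for on the real data.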

I'd also look into whether your data has seasonality of some kind. E.g., maybe not all 27 of those AR terms matter, and you really just need lag=1 and lag=24 (for instance). That might be the case for hourly data that has daily seasonality.
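As a rough illustration of the lag=1 plus lag=24 idea, the sketch below simulates a series driven by only those two lags and recovers the two coefficients by ordinary least squares. Everything here is an assumption made for the example: the helper name `fit_two_lag_ar`, the simulated data, and the use of plain least squares in place of a proper ARIMA fit. In the ARIMA setting of the question, the same regression would be run on the differenced series.

```python
import random


def fit_two_lag_ar(series, lag_a, lag_b):
    """Least-squares fit of y[t] = a * y[t - lag_a] + b * y[t - lag_b] (no intercept)."""
    start = max(lag_a, lag_b)
    saa = sab = sbb = say = sby = 0.0
    for t in range(start, len(series)):
        xa, xb, y = series[t - lag_a], series[t - lag_b], series[t]
        saa += xa * xa
        sab += xa * xb
        sbb += xb * xb
        say += xa * y
        sby += xb * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    coef_a = (say * sbb - sby * sab) / det
    coef_b = (sby * saa - say * sab) / det
    return coef_a, coef_b


# Simulate hourly-style data with a lag-1 and a lag-24 dependence.
random.seed(0)
true_a, true_b = 0.5, 0.3
y = [0.0] * 24
for _ in range(5000):
    y.append(true_a * y[-1] + true_b * y[-24] + random.gauss(0.0, 1.0))

est_a, est_b = fit_two_lag_ar(y, lag_a=1, lag_b=24)
```

Two coefficients capture the structure here, where a dense AR(24) would spend 24. Whether that holds for the real series is something the autocorrelation plots and the BIC comparison would have to confirm.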