How to solve LinAlgError and ValueError when training an ARIMA model in Python

Question

I am trying to implement a time series model and am getting some strange exceptions that tell me nothing. I wonder whether I am making a mistake or whether this is to be expected. Here are the details.

When training my model, I run a grid search to find the best (p, d, q) settings. Here is the complete code (I explain what it does further down):

The reproducible code below is essentially a copy from https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/ , with some slight changes:

import warnings
import numpy as np  # needed for np.array in the dataset definition below
from pandas import Series
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error

# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float64')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    print("Evaluating the settings: ", p, d, q)
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception as exception:
                    print("Exception occured...", type(exception).__name__, "\n", exception)

    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# dataset
values = np.array([-1.45, -9.04, -3.64, -10.37, -1.36, -6.83, -6.01, -3.84, -9.92, -5.21,
                   -8.97, -6.19, -4.12, -11.03, -2.27, -4.07, -5.08, -4.57, -7.87, -2.80,
                   -4.29, -4.19, -3.76, -22.54, -5.87, -6.39, -4.19, -2.63, -8.70, -3.52, 
                   -5.76, -1.41, -6.94, -12.95, -8.64, -7.21, -4.05, -3.01])

# evaluate parameters
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(values, p_values, d_values, q_values)

And here is the output (not all of it, but enough to show the problem):

Evaluating the settings:  7 0 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 1
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 1 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 2 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.

The code simply tries every given setting, trains the model, calculates the MSE (mean squared error) for each setting, and then selects the best one (the one with the minimum MSE).

But during training, the code keeps throwing LinAlgError and ValueError exceptions, which tell me nothing.

As far as I can follow, when these exceptions are thrown the code does not actually train the model for that setting; it just jumps to the next setting to be tried.

Why do I see these exceptions? Can they be ignored? What do I need to do to solve this?

Answer 1

First, to answer your specific question: I think the "SVD did not converge" error is a bug in the ARIMA model of Statsmodels. The SARIMAX model is better supported these days (and does everything the ARIMA model does, plus more), so I would recommend using that instead. To do so, replace the model creation with:

import statsmodels.api as sm

model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)

That said, I think you are still unlikely to get good results given your time series and the specifications you are trying.

In particular, your time series is very short, and you are only considering extremely long autoregressive lag lengths (p > 6). It will be difficult to estimate that many parameters with so few data points, particularly when you also have integration (d = 1 or d = 2) and when you also add moving average components. I suggest that you re-evaluate which models you are considering.
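
To make the "many parameters, few data points" point concrete, here is a rough back-of-the-envelope sketch. The helper arima_budget is hypothetical, and its count (AR terms, MA terms, a constant, and the innovation variance) is a simplifying assumption rather than the exact figure any particular library reports.

```python
def arima_budget(n_obs, order, with_constant=True):
    """Return (rough parameter count, observations left after differencing)."""
    p, d, q = order
    n_params = p + q + 1  # AR terms + MA terms + innovation variance
    if with_constant:
        n_params += 1
    n_effective = n_obs - d  # each round of differencing drops one observation
    return n_params, n_effective


n_total = 38                      # length of the series in the question
train_size = int(n_total * 0.66)  # size of the first training window

for order in [(7, 0, 0), (10, 2, 2), (1, 0, 0), (2, 1, 1)]:
    n_params, n_effective = arima_budget(train_size, order)
    print("ARIMA%s: ~%d parameters from %d usable observations"
          % (order, n_params, n_effective))
```

Under these assumptions the largest order in the grid, (10, 2, 2), asks for about 14 parameters from 23 usable observations in the first training window, which is why smaller values of p are a more reasonable place to start.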


Answer 1: 2, accepted, 2019-03-14 02:09:21
