I know there is a very similar question and answer on stackoverflow ( here ), but this seems to be distinctly different. I am using statsmodels v 0.13.2, and I am using an ARIMA model as opposed to a SARIMAX model.
I am trying to fit a list of time series data sets with an ARIMA model. The offending piece of my code is here:
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
items = np.log(og_items)
items['count'] = items['count'].apply(lambda x: 0 if math.isnan(x) or math.isinf(x) else x)
model = ARIMA(items, order=(14, 0, 7))
trained = model.fit()
items
is a dataframe containing a date index and a single column, count
.
I apply the lambda on the second line because some counts can be 0, resulting in a negative infinity after log is applied. The final product going into the ARIMA does not contain any NaNs or Infinite numbers. However, when I try this without using the log function, I do not get the error. This only occurs on certain series, but there does not seem to be rhyme or reason to which are affected. One series had about half of its values as zero after applying the lambda, while another did not have a single zero. Here is the error:
Traceback (most recent call last):
File "item_pipeline.py", line 267, in <module>
main()
File "item_pipeline.py", line 234, in main
restaurant_predictions = make_predictions(restaurant_data=restaurant_data, models=models,
File "item_pipeline.py", line 138, in make_predictions
predictions = model(*data_tuple[:2], min_date=min_date, max_date=max_date,
File "/Users/rob/Projects/5out-ml/models/item_level/items/predict_arima.py", line 127, in predict_daily_arima
predict_date_arima(prediction_dict, item_dict, prediction_date, x_days_out=x_days_out, log_vals=log_vals,
File "/Users/rob/Projects/5out-ml/models/item_level/items/predict_arima.py", line 51, in predict_date_arima
raise e
File "/Users/rob/Projects/5out-ml/models/item_level/items/predict_arima.py", line 47, in predict_date_arima
fitted = model.fit()
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/arima/model.py", line 390, in fit
res = super().fit(
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 704, in fit
mlefit = super(MLEModel, self).fit(start_params, method=method,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/base/model.py", line 563, in fit
xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/base/optimizer.py", line 241, in _fit
xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/base/optimizer.py", line 651, in _fit_lbfgs
retvals = optimize.fmin_l_bfgs_b(func, start_params, maxiter=maxiter,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 199, in fmin_l_bfgs_b
res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_lbfgsb_py.py", line 362, in _minimize_lbfgsb
f, g = func_and_grad(x)
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 286, in fun_and_grad
self._update_grad()
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 256, in _update_grad
self._update_grad_impl()
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 173, in update_grad
self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_numdiff.py", line 505, in approx_derivative
return _dense_difference(fun_wrapped, x0, f0, h,
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_numdiff.py", line 576, in _dense_difference
df = fun(x) - f0
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_numdiff.py", line 456, in fun_wrapped
f = np.atleast_1d(fun(x, *args, **kwargs))
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
fx = fun(np.copy(x), *args)
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/base/model.py", line 531, in f
return -self.loglike(params, *args) / nobs
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 939, in loglike
loglike = self.ssm.loglike(complex_step=complex_step, **kwargs)
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/statespace/kalman_filter.py", line 983, in loglike
kfilter = self._filter(**kwargs)
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/statespace/kalman_filter.py", line 903, in _filter
self._initialize_state(prefix=prefix, complex_step=complex_step)
File "/Users/rob/Projects/5out-ml/venv/lib/python3.8/site-packages/statsmodels/tsa/statespace/representation.py", line 983, in _initialize_state
self._statespaces[prefix].initialize(self.initialization,
File "statsmodels/tsa/statespace/_representation.pyx", line 1362, in statsmodels.tsa.statespace._representation.dStatespace.initialize
File "statsmodels/tsa/statespace/_initialization.pyx", line 288, in statsmodels.tsa.statespace._initialization.dInitialization.initialize
File "statsmodels/tsa/statespace/_initialization.pyx", line 406, in statsmodels.tsa.statespace._initialization.dInitialization.initialize_stationary_stationary_cov
File "statsmodels/tsa/statespace/_tools.pyx", line 1206, in statsmodels.tsa.statespace._tools._dsolve_discrete_lyapunov
numpy.linalg.LinAlgError: LU decomposition error.
The solution in the other stackoverflow post was to initialize the statespace differently. It looks like the statespace is involved, if you look at the last few lines of the error. However, it does not seem that that workflow is exposed in the newer version of statsmodels. Is it? If not, what else can I try to circumvent this error?
So far, I have tried manually initializing the model to approximate diffuse
, and manually setting the initialize
property to approximate diffuse
. Neither seem to be valid in the new statsmodels code.
Turns out there's a new way to initialize. The second line below is the operative line.
model = ARIMA(items, order=(14, 0, 7))
model.initialize_approximate_diffuse() # this line
trained = model.fit()
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.