
ARIMA modeling on a time-series DataFrame in Python

I'm trying to use an ARIMA model for forecasting, and I'm new to it. I plotted seasonal_decompose() for my dataset (hourly data); the plot is below:

[Image: seasonal_decompose() plot of the hourly data]

I want to understand these plots; a brief description would be helpful. It looks to me as if there is no trend at first and an upward trend appears after some time, but I'm not sure I'm reading it correctly. How should these graphs be read? A good description would help.
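For reference when reading those panels: in its default additive mode, seasonal_decompose() models the series as observed = trend + seasonal + residual. The trend panel is a centred moving average spanning one seasonal period, the seasonal panel is the average detrended value at each position of the cycle (repeated over the whole series), and the residual panel is whatever is left over. The plain-Python sketch below illustrates that classical procedure; it is not statsmodels' own code, the function name classical_additive_decompose is hypothetical, and the toy data and period=4 are assumptions (hourly data with a daily cycle would typically use period=24).

```python
from typing import List, Optional, Sequence


def classical_additive_decompose(y: Sequence[float], period: int):
    """Split y into trend, seasonal and residual parts (classical additive method).

    Sketch only; assumes len(y) covers at least two full seasonal cycles.
    """
    n = len(y)
    half = period // 2
    trend: List[Optional[float]] = [None] * n  # undefined at both ends
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: "2 x period" centred moving average over period + 1
            # points, with half weight on the two end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            # Odd period: plain centred moving average over `period` points.
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values at each position of the seasonal cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    averages = [s / c for s, c in zip(sums, counts)]
    # Centre the seasonal effects so they sum to zero over one cycle.
    offset = sum(averages) / period
    pattern = [a - offset for a in averages]

    seasonal = [pattern[t % period] for t in range(n)]
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Toy data (assumed): linear trend 0.5 * t plus a repeating 4-step pattern.
toy_pattern = [1.0, -1.0, 2.0, -2.0]
y = [0.5 * t + toy_pattern[t % 4] for t in range(16)]
trend, seasonal, resid = classical_additive_decompose(y, period=4)
print(trend[2:6])     # [1.0, 1.5, 2.0, 2.5]
print(seasonal[:4])   # [1.0, -1.0, 2.0, -2.0]
```

On this toy series the trend panel recovers the straight line, the seasonal panel recovers the repeating pattern, and the residuals are zero. On real data, a trend panel that is flat and then rises is what "no trend at first, then an upward trend" looks like, and any structure left in the residual panel is something the decomposition did not explain.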

I then applied the Dickey-Fuller test to check whether my data is stationary and whether I need further differencing, and got the following results:

Test Statistic                   -4.117543
p-value                           0.000906
Lags Used                       30.000000
Number of Observations Used    4289.000000
Critical Value (1%)              -3.431876
Critical Value (5%)              -2.862214
Critical Value (10%)             -2.567129

I'm referring to two links to understand this: http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

This link says that when the test statistic is greater than the critical value, the data is stationary; the other link says the opposite. I'm confused by this. I also referred to otexts.org, which says we should decide on the basis of the p-value. How do I interpret the results given by the ADF test?

Also, I tried to fit an ARIMA model to the dataset:

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df.y, order=(0,1,0))
model_fit = model.fit()

My dataframe has a datetime column as the index, and the y column holds float values. When I fit the model on this dataframe, I get this error:

IndexError: list index out of range.

The error occurs when I try to print the model summary with:

print(model_fit.summary())

Please help me with this so that I can get a better understanding of ARIMA.

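Cross-validation for ARIMA (AutoRegressive Integrated Moving Average) time series: standard K-fold cross-validation does not work for time series, because its folds let the model train on observations that come after the ones it is asked to predict. Instead, use backtesting techniques such as walk-forward validation and rolling windows.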
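A minimal sketch of one-step-ahead walk-forward backtesting, assuming a plain Python list of observations. The names walk_forward_mae, naive_forecast and drift_forecast are hypothetical; the two forecasters stand in for whatever model you actually fit at each step (naive_forecast is a random walk, i.e. ARIMA(0,1,0) without a constant, and drift_forecast adds the mean one-step change).

```python
from typing import Callable, List, Optional, Sequence


def walk_forward_mae(
    y: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    initial_train: int,
    window: Optional[int] = None,
) -> float:
    """Mean absolute one-step-ahead error from a walk-forward backtest.

    window=None gives an expanding window; an integer keeps only the last
    `window` observations (rolling window). The forecaster only ever sees
    data strictly before the point it predicts.
    """
    errors: List[float] = []
    for t in range(initial_train, len(y)):
        start = 0 if window is None else max(0, t - window)
        train = y[start:t]
        errors.append(abs(y[t] - forecaster(train)))
    return sum(errors) / len(errors)


def naive_forecast(train: Sequence[float]) -> float:
    # Random walk: the next value equals the last observed value.
    return train[-1]


def drift_forecast(train: Sequence[float]) -> float:
    # Random walk with drift: last value plus the mean one-step change.
    # Needs at least two observations in the training window.
    diffs = [b - a for a, b in zip(train[:-1], train[1:])]
    return train[-1] + sum(diffs) / len(diffs)


y = [1, 2, 3, 4, 5, 6, 7, 8]  # toy data (assumed)
print(walk_forward_mae(y, naive_forecast, initial_train=3))            # 1.0
print(walk_forward_mae(y, drift_forecast, initial_train=3))            # 0.0
print(walk_forward_mae(y, drift_forecast, initial_train=3, window=2))  # 0.0
```

Each iteration refits on the past only and scores the next point, so the error estimate mimics how the model would be used in production.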

K-fold cross-validation for autoregression: although cross-validation is usually not valid for time series (ARIMA) models, K-fold does work for autoregressions, as long as the models considered have uncorrelated errors, which you can check with the Ljung-Box test. This is also relevant for XAI (Explainable Artificial Intelligence) in time series use cases.
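The Ljung-Box statistic is Q = n(n + 2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the residuals; under the null of no autocorrelation it is approximately chi-square distributed with h degrees of freedom (fewer when applied to fitted-model residuals). statsmodels exposes this as acorr_ljungbox in statsmodels.stats.diagnostic; the plain-Python sketch below, with hypothetical helper names and assumed toy residuals, only shows what that test computes.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (lag 0 omitted)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    n = len(residuals)
    acf = sample_acf(residuals, lags)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Toy "residuals" (assumed data) that alternate in sign, i.e. strongly autocorrelated.
resid = [1.0, -1.0] * 10
q = ljung_box_q(resid, lags=1)
print(round(q, 2))  # 20.9
# The 5% critical value of a chi-square with 1 degree of freedom is about 3.841,
# so a Q this large rejects "no autocorrelation": these residuals are not white noise.
```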

A few Python statistics libraries provide these methods; here are two: Python Stats Tests and Python StatsModels.

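To get the difference between two lists of values, you can use a set difference. If you also need the values to stay int8, Python 3.6+ descriptors (PEP 487 added __set_name__, which makes them easier to write) let you enforce a typed list attribute that always holds int8 values (list -> list of ints). Note that a set difference returns the elements of one list that are absent from the other; it is not the lag differencing (y[t] - y[t-1]) that ARIMA applies to a series.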

list_a = [1, 2, 3]
list_b = [2, 3]
# elements of list_a that do not appear in list_b
print(set(list_a).difference(set(list_b)))
# answer is {1}
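If what you need is the differencing that the "I" (integrated) part of ARIMA applies, the operation is y[t] - y[t-lag] on consecutive observations. A minimal plain-Python sketch with hypothetical helper names difference and undifference; on a pandas Series, Series.diff() does the same forward step (with a leading NaN), and numpy.diff() does it on arrays.

```python
def difference(values, lag=1):
    """Lag-`lag` differencing (lag >= 1): out[i] = values[i + lag] - values[i]."""
    return [b - a for a, b in zip(values[:-lag], values[lag:])]


def undifference(first_value, diffs):
    """Invert lag-1 differencing by cumulatively adding the diffs back."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


series = [1, 2, 4, 7, 11]  # toy data (assumed)
d1 = difference(series)
print(d1)                                      # [1, 2, 3, 4]
print(difference(series, lag=2))               # [3, 5, 7]
print(undifference(series[0], d1) == series)   # True
```

The inverse step matters in practice: a model fitted on differenced data forecasts differences, and those have to be added back onto the last observed level.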

As to explaining an ARIMA model, I can only refer you to

https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average

Regarding the error you get when calling model_fit.summary(), I think it happens because of order=(0,1,0): there are no AR (p) or MA (q) parameters for the model to estimate, only a constant, i.e. the average difference.

You can see that the constant difference is just the average of the differenced values by running the code below:

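import pandas as pd

# differences between consecutive forecasted values
# (legacy statsmodels.tsa.arima_model: forecast() returns (forecast, stderr, conf_int))
pd.Series(model_fit.forecast(steps=10)[0]).diff(1)

# results (truncated)
#0              NaN
#1    107904.396563
#2    107904.396563
#3    107904.396563
#4    107904.396563

# mean of the original time series differenced once
# (with d=1 the legacy model stores the differenced series in model.endog)
model_fit.model.endog.mean()
#107904.3965625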
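Put differently, ARIMA(0,1,0) with a constant is a random walk with drift: its only parameter is the drift c, and the h-step forecast is the last observation plus h * c, so consecutive forecasts always differ by exactly c. A minimal plain-Python sketch, assuming c is estimated as the mean of the first differences (the hypothetical function name and toy data are illustrations, not statsmodels code):

```python
def arima_010_const_forecast(y, steps):
    """Point forecasts of ARIMA(0,1,0) with a constant (random walk with drift).

    The drift c is taken here as the mean of the first differences;
    the h-step-ahead forecast is y[-1] + h * c.
    """
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    c = sum(diffs) / len(diffs)
    return [y[-1] + h * c for h in range(1, steps + 1)]


y = [10.0, 12.0, 15.0, 16.0]  # toy data (assumed)
fc = arima_010_const_forecast(y, steps=3)
print(fc)                                         # [18.0, 20.0, 22.0]
print([b - a for a, b in zip(fc[:-1], fc[1:])])   # [2.0, 2.0]
```

Because the differences telescope, the mean difference equals (y[-1] - y[0]) / (len(y) - 1), so with this estimate only the first and last observations determine the drift.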

If you change it to order=(0,1,1) or order=(1,1,0), the summary prints fine, but that is of course a different model, and it makes different assumptions about how the random process evolves through time.

When you run the ADF test before fitting your ARIMA model, look at three things in its output to gain insight: the ADF test statistic, the critical values, and the p-value.

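The null hypothesis of the ADF test is that the series has a unit root (is non-stationary). When the ADF statistic is greater (less negative) than the critical value, you cannot reject that null, so you most likely have a non-stationary series, e.g. one showing some trend. When the statistic is below the critical value, you reject the null and treat the series as stationary. The p-value leads to the same decision: if it is less than 0.05, the series is stationary at the 5% level; otherwise it is likely non-stationary and may need differencing. In your output the statistic (-4.12) is below even the 1% critical value (-3.43) and the p-value is 0.0009, so both criteria suggest that the series you tested is stationary.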
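The decision rule can be written out as a small check. This is a sketch with a hypothetical function name; it assumes the critical values come as a dict keyed "1%", "5%", "10%", which is the form statsmodels' adfuller returns them in, and it plugs in the numbers from the question.

```python
ALPHA_BY_LEVEL = {"1%": 0.01, "5%": 0.05, "10%": 0.10}


def adf_rejects_unit_root(stat, p_value, critical_values, level="5%"):
    """Return (decision from critical value, decision from p-value).

    ADF null hypothesis: the series has a unit root (non-stationary).
    True means "reject the null", i.e. treat the series as stationary.
    """
    by_critical_value = stat < critical_values[level]  # more negative => reject
    by_p_value = p_value < ALPHA_BY_LEVEL[level]
    return by_critical_value, by_p_value


# Numbers from the ADF output in the question.
crit = {"1%": -3.431876, "5%": -2.862214, "10%": -2.567129}
for level in ("1%", "5%", "10%"):
    print(level, adf_rejects_unit_root(-4.117543, 0.000906, crit, level))
# 1% (True, True)
# 5% (True, True)
# 10% (True, True)
```

The two criteria should normally agree, since the p-value and the critical values are derived from the same reference distribution; the apparent contradiction between the two links comes from the sign, because ADF statistics and critical values are negative.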

As for your IndexError, I think it occurs because you didn't give the model any lag terms; the lag orders are usually chosen from what you observe in your residual plot and ACF plot. Alternatively, try a simple model such as (1,0,0) or (1,1,0). Hope that helps!
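To show what a model with one lag term actually estimates, here is a minimal sketch of ARIMA(1,1,0) without a constant: difference once, then regress each difference on the previous one, d[t] = phi * d[t-1] + e[t]. It uses conditional least squares rather than the likelihood-based fitting statsmodels performs, and the function names and toy data are hypothetical.

```python
def fit_arima_110(y):
    """Estimate phi in ARIMA(1,1,0) without a constant by conditional least squares.

    phi is the OLS slope (through the origin) of d[t] on d[t-1], where d is the
    once-differenced series. Assumes len(y) >= 3 and not all differences zero.
    """
    d = [b - a for a, b in zip(y[:-1], y[1:])]
    lagged, current = d[:-1], d[1:]
    return sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)


def forecast_arima_110(y, phi, steps):
    """Iterate the AR(1) on the differences, then add them back onto the level."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff
        level += diff
        out.append(level)
    return out


y = [0.0, 8.0, 12.0, 14.0, 15.0]  # toy data (assumed): each difference halves
phi = fit_arima_110(y)
print(phi)                             # 0.5
print(forecast_arima_110(y, phi, 2))   # [15.5, 15.75]
```

Unlike (0,1,0), this model has a coefficient (phi) to estimate and report, and its forecasts level off instead of following a straight drift line when |phi| < 1.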

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
