
R's auto.arima() equivalent in Python

I would like to implement an equivalent of R's auto.arima() function in Python.

In R, the auto.arima function takes the time-series values as input, computes the ARIMA order parameters (the p, d and q values) and fits a model, so the user does not need to supply p, d and q.

I want to use an equivalent of auto.arima in Python (without calling R's auto.arima) to predict future values of a time series. For the time series below, I want to run this Python auto.arima on a window of 40 points and predict the next 6 values, then move the window forward by 1 point and repeat the same procedure.
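The rolling procedure described above (fit on 40 points, forecast the next 6, shift by 1) only needs the window boundaries. The hypothetical generator below yields them as positions, so any fitting function can be applied to each window; it is a sketch of the bookkeeping only and does not fit a model.

```python
def rolling_windows(n_obs, window=40, horizon=6, step=1):
    """Hypothetical helper: yield (train, forecast_positions) per window.

    train is a slice selecting `window` consecutive observations;
    forecast_positions are the `horizon` positions right after it, which
    may run past the end of the data (pure out-of-sample forecasts).
    """
    for start in range(0, n_obs - window + 1, step):
        end = start + window
        yield slice(start, end), range(end, end + horizon)


# The example series below has 45 points, so this gives 6 windows of 40.
for train, ahead in rolling_windows(45):
    print(train.start, train.stop, list(ahead))
```

Inside the loop, values[train] would be passed to the order-selection and fitting step, and the 6-step forecast would be stored under the positions in ahead.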

Here is some example data:

value
0
2.584751
2.884758
2.646735
2.882105
3.267503
3.94552
4.70788
5.384803
54.77972
62.87139
78.68957
112.7166
155.0074
170.8084
196.1941
237.4928
254.9718
175.0717
217.3807
244.7357
274.4517
304.6838
373.3202
345.6252
461.2653
443.5982
472.3653
469.3326
506.8819
532.1639
542.2837
514.9269
528.0194
540.539
542.7031
556.8262
569.7132
576.2339
577.7212
577.0873
569.6199
573.2445
573.7825
589.3506

I have tried to write functions that compute the order of differencing with the augmented Dickey-Fuller test (adfuller), and then pass the differenced series (which, according to the adfuller result, is stationary after differencing the original series) to an ARMA order-selection function to compute the p and q orders.

I then pass these values to the ARIMA function in statsmodels, but the functions do not seem to work:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import acf, pacf

def diff_terms(timeseries):
    i=1
    j=0
    while i != 0:
        dftest = adfuller(timeseries, autolag='AIC')
        if dftest[0] <= dftest[4]["5%"]:
            i = 0
        else:
            timeseries = np.diff(timeseries)
            i = 1
            j = j + 1
    return j

def p_q_values_estimator(timeseries):
    p=0
    q=0
    lag_acf = acf(timeseries, nlags=20)
    lag_pacf = pacf(timeseries, nlags=20, method='ols')
    y=1.96/np.sqrt(len(timeseries))

    if lag_acf[0] < y:
        for a in lag_acf:
            if a < y:
                q = q + 1
                break 
    elif lag_acf[0] > y:
        for c in lag_acf:
            if c > y:
                q = q + 1
                break

    if lag_pacf[0] < y:
        for b in lag_pacf:
            if b < y:
                p = p + 1
                break
    elif lag_pacf[0] > y:
        for d in lag_pacf:
            if d > y:
                p = p + 1
                break

    p_q=[p,q]
    return(p_q)

def p_q_values_estimator2(timeseries):
    res = sm.tsa.arma_order_select_ic(timeseries, ic=['aic'], max_ar=5, max_ma=4,trend='nc')
    return res.aic_min_order

data1=[]
data=pd.read_csv('ABC.csv')
d_value=diff_terms(data.value)
data1[:]=data[:]
data = data[0:40]

i=0
while i < d_value:
    data_diff = np.diff(data)
    i = i+1

p_q_values=p_q_values_estimator(data)
p_value=p_q_values[0]
q_value=p_q_values[1]

p_q_values2=p_q_values_estimator2(data_diff)
p_value2=p_q_values2[0]
q_value2=p_q_values2[1]


exogx = np.array(range(0,40))
fit2 = sm.tsa.ARIMA(np.array(data), (p_value, d_value, q_value), exog = exogx).fit()
print(fit2.fittedvalues)
pred2 = fit2.predict(start = 40, end = 45, exog = np.array(range(40,46)))
print(pred2)
plt.plot(fit2.fittedvalues)
plt.plot(np.array(data))
plt.plot(range(40,46), np.array(pred2))  # start=40, end=45 gives 6 predictions
plt.show()
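In p_q_values_estimator above, lag_acf[0] and lag_pacf[0] are the lag-0 coefficients, which are always 1. Since 1 is larger than y = 1.96/sqrt(40) ≈ 0.31, the elif branch always runs and each loop breaks on its first element, so the function returns [1, 1] whatever the data. A minimal sketch of a cut-off rule that skips lag 0 and counts consecutive significant lags follows; the helper name is hypothetical, and this is only a rough identification heuristic, not what auto.arima does.

```python
import numpy as np


def leading_significant_lags(coefs, n_obs, z=1.96):
    """Hypothetical helper: count consecutive significant coefficients
    starting at lag 1.

    coefs[0] is the lag-0 value (always 1 for ACF/PACF) and is skipped.
    A coefficient counts as significant when |coef| > z / sqrt(n_obs).
    """
    bound = z / np.sqrt(n_obs)
    k = 0
    for c in coefs[1:]:
        if abs(c) > bound:
            k += 1
        else:
            break  # first insignificant lag: the function "cuts off" here
    return k


# With statsmodels, on an already-differenced 1-D series x:
#   q = leading_significant_lags(acf(x, nlags=20), len(x))
#   p = leading_significant_lags(pacf(x, nlags=20, method="ols"), len(x))
print(leading_significant_lags([1.0, 0.6, 0.4, 0.1, 0.5], n_obs=100))
```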

Error when using arma_order_select_ic:

p_q_values2=p_q_values_estimator2(data_diff)
line 56, in p_q_values_estimator2
res = sm.tsa.arma_order_select_ic(timeseries, ic=['aic'], max_ar=5, max_ma=4,trend='nc')
File "C:\Python27\lib\site-packages\statsmodels\tsa\stattools.py", line 1052, in arma_order_select_ic
min_res.update({i + '_min_order' : (mins[0][0], mins[1][0])})
IndexError: index 0 is out of bounds for axis 0 with size 0
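This IndexError is most likely caused by data_diff being empty. np.diff(data) receives the one-column DataFrame data, converts it to an (n, 1) array and differences along the last axis (across columns), which leaves shape (n, 0). The loop also recomputes np.diff(data) from the undifferenced data on every pass, and data_diff is never defined when d_value is 0. Differencing the value column itself, with n set to the number of differences, avoids all three problems. The snippet below uses the first five points of the example data only to show the shapes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [0.0, 2.584751, 2.884758, 2.646735, 2.882105]})

# np.diff works along the last axis by default. The one-column DataFrame
# becomes a (5, 1) array, so the result has shape (5, 0): it is empty.
wrong = np.diff(df)

# Differencing the column gives the intended series; n=d would apply
# d successive differences in a single call.
right = np.diff(df["value"].to_numpy(), n=1)
right_pandas = df["value"].diff().dropna()  # pandas equivalent for n=1

print(wrong.shape, right.shape)
```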

Error when using the ACF/PACF-based function to compute the p and q orders:

fit2 = sm.tsa.ARIMA(np.array(data), (p_value, d_value, q_value), exog = exogx).fit()
File "C:\Python27\lib\site-packages\statsmodels\tsa\arima_model.py", line 1104, in fit
callback, **kwargs)
File "C:\Python27\lib\site-packages\statsmodels\tsa\arima_model.py", line 942, in fit
armafit.mle_retvals = mlefit.mle_retvals
AttributeError: 'LikelihoodModelResults' object has no attribute 'mle_retvals'

vals is my own object (its index labels the 6 forecast periods), but you can create your own index with pd.date_range.

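import pandas as pd
from rpy2.robjects import FloatVector
from rpy2.robjects.packages import importr

stats = importr('stats')
forecast = importr('forecast')  # R's forecast package; auto.arima is exposed as auto_arima
ts = stats.ts

# traindf is my training DataFrame; frequency=12 marks the series as monthly
rdata=ts(FloatVector(traindf.requests_per_active.values),frequency=12)
#forecasts
fit=forecast.auto_arima(rdata)
forecast_output=forecast.forecast(fit,h=6,level=(95.0))
#convert forecasts to dataframe (elements 3, 4, 5 are mean, lower, upper)
forecast_results=pd.Series(forecast_output[3], index=vals.index)
lowerpi=pd.Series(forecast_output[4], index=vals.index)
upperpi=pd.Series(forecast_output[5], index=vals.index)
results = pd.DataFrame({'forecast' : forecast_results, 'lowerpi' : lowerpi, 'upperpi' : upperpi})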
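In the code above, vals is only used for its index, which needs one label per forecast step (h=6). Assuming monthly data (frequency=12) and a first forecast month of January 2021 (a made-up date for illustration), pd.date_range can build that index; the empty Series below is a hypothetical stand-in for vals.

```python
import pandas as pd

# Six month-start timestamps, one per forecast step (h=6).
forecast_index = pd.date_range(start="2021-01-01", periods=6, freq="MS")

# Hypothetical stand-in for vals: only its index is used when the
# forecast Series are built.
vals = pd.Series(index=forecast_index, dtype=float)
print(vals.index)
```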

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM