
How to invert differencing in a Python statsmodels ARIMA forecast?

I'm trying to wrap my head around ARIMA forecasting using Python and statsmodels. Specifically, for the ARIMA algorithm to work, the data needs to be made stationary via differencing (or a similar method). The question is: how does one invert the differencing after the residual forecast has been made, to get back to a forecast that includes the trend and seasonality that were differenced out?
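For reference, the arithmetic behind inverting a difference can be sketched as follows. This is a minimal illustration, not statsmodels code: the helper name `invert_difference` and the toy numbers are assumptions made for this example. It relies only on the identity y[t] = y[t - lag] + d[t], where d is the lag-`lag` difference.

```python
import numpy as np

def invert_difference(history, diff_forecast, lag=1):
    """Turn forecasts of lag-`lag` differences back into levels.

    history: observed levels, at least `lag` values long
    diff_forecast: forecasts of y[t] - y[t - lag] for the steps after history
    (hypothetical helper, written for this illustration only)
    """
    # Seed with the last `lag` observed levels.
    levels = list(np.asarray(history, dtype=float)[-lag:])
    for d in diff_forecast:
        # y[t] = y[t - lag] + d[t]
        levels.append(levels[-lag] + d)
    # Drop the seed values and return only the forecast levels.
    return np.array(levels[lag:])

# Toy data: pretend the last three points are unknown and that the
# "forecast" of the differences happens to be exact.
series = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 21.0])
history = series[:3]

first_diffs = series[3:] - series[2:5]      # lag-1 differences
rebuilt = invert_difference(history, first_diffs, lag=1)

lag2_diffs = series[3:] - series[1:4]       # lag-2 ("seasonal") differences
rebuilt_lag2 = invert_difference(history, lag2_diffs, lag=2)
```

For a series differenced more than once, the same step would be applied once per differencing pass, innermost first.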

(I saw a similar question here but, alas, no answers have been posted.)

Here's what I've done so far (based on the example in the last chapter of Mastering Python Data Analysis, by Magnus Vilhelm Persson and Luiz Felipe Martins). The data comes from DataMarket.

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels import tsa 
from statsmodels.tsa import stattools as stt 
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima_model import ARIMA 

def is_stationary(df, maxlag=15, autolag=None, regression='ct'): 
    """Test if df is stationary using Augmented 
    Dickey Fuller""" 

    adf_test = stt.adfuller(df,maxlag=maxlag, autolag=autolag, regression=regression) 
    adf = adf_test[0]
    cv_5 = adf_test[4]["5%"]

    result = adf < cv_5    
    return result

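def d_param(df, max_lag=12):
    """Return the smallest differencing order d that makes df stationary."""
    d = 0
    diffed = df
    for i in range(1, max_lag):
        # df.diff(i) would be a lag-i difference; ARIMA's d counts
        # successive first differences, so difference repeatedly.
        diffed = diffed.diff().dropna()
        if is_stationary(diffed):
            d = i
            break
    return d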

def ARMA_params(df):
    p, q = tsa.stattools.arma_order_select_ic(df.dropna(),ic='aic').aic_min_order
    return p, q

# read data
carsales = pd.read_csv('data/monthly-car-sales-in-quebec-1960.csv', 
                   parse_dates=['Month'],  
                   index_col='Month',  
                   date_parser=lambda d:pd.datetime.strptime(d, '%Y-%m'))
carsales = carsales.iloc[:,0] 

# get components
carsales_decomp = seasonal_decompose(carsales, freq=12)
residuals = carsales - carsales_decomp.seasonal - carsales_decomp.trend 
residuals = residuals.dropna()

# fit model
d = d_param(carsales, max_lag=12)
p, q = ARMA_params(residuals)
model = ARIMA(residuals, order=(p, d, q)) 
model_fit = model.fit() 

# plot prediction
model_fit.plot_predict(start='1961-12-01', end='1970-01-01', alpha=0.10) 
plt.legend(loc='upper left') 
plt.xlabel('Year') 
plt.ylabel('Sales')
plt.title('Residuals 1960-1970')
print(model_fit.aic, model_fit.bic)

The resulting plot is satisfying, but it doesn't include the trend and seasonality information. How do I invert the differencing to recapture the trend and seasonality?

(Figure: residual plot)
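Note that the model above is fitted to decomposition residuals, so getting back to the original scale would mean adding the seasonal and trend components back on, not only undoing a difference. A rough sketch of one way to do that is shown here. It rests on several assumptions that are not in the original post: the helper name `recompose_forecast` is hypothetical, the seasonal pattern is repeated unchanged, the trend is extended with a straight-line fit, and the forecast is assumed to start immediately after the last sample point.

```python
import numpy as np

def recompose_forecast(resid_forecast, seasonal, trend, period=12):
    """Add seasonal and trend estimates back onto a residual forecast.

    seasonal, trend: components over the full observed sample (same length);
    trend may contain NaNs at the edges, as a centered moving average does.
    Hypothetical helper: assumes the forecast starts right after the sample.
    """
    resid_forecast = np.asarray(resid_forecast, dtype=float)
    seasonal = np.asarray(seasonal, dtype=float)
    trend = np.asarray(trend, dtype=float)
    steps = len(resid_forecast)
    n = len(trend)

    # Seasonal part: repeat the last full cycle forward.
    seasonal_future = np.resize(seasonal[-period:], steps)

    # Trend part: straight-line fit through the non-NaN trend values,
    # evaluated at the future time positions (an assumption, not a rule).
    t = np.arange(n)
    valid = ~np.isnan(trend)
    slope, intercept = np.polyfit(t[valid], trend[valid], deg=1)
    trend_future = intercept + slope * np.arange(n, n + steps)

    return resid_forecast + seasonal_future + trend_future

# Toy check with a known linear trend and a 4-step seasonal pattern.
pattern = np.array([1.0, -1.0, 2.0, -2.0])
toy_seasonal = np.tile(pattern, 3)
toy_trend = 2.0 * np.arange(12) + 5.0
toy_trend[:2] = np.nan
toy_trend[-2:] = np.nan
toy_forecast = recompose_forecast(np.zeros(6), toy_seasonal, toy_trend, period=4)
```

In the code from the question, `residuals.dropna()` ends earlier than the raw series because the trend estimate has NaNs at both ends, so the forecast start would need to be aligned accordingly before using anything like this.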

Relying on differencing may be the wrong choice when a time trend (or several) would be a better strategy. Period 33 is an outlier, and ignoring it has consequences.
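As a rough illustration of the time-trend alternative, one could regress the series on a linear trend plus a pulse dummy for the outlier. This is only a sketch under stated assumptions: the helper name `fit_trend_with_pulse` is hypothetical, the data are synthetic, and "period 33" is taken to be 1-based (array index 32).

```python
import numpy as np

def fit_trend_with_pulse(y, outlier_index):
    """Least-squares fit of y = a + b*t + c*pulse, where pulse is 1 only
    at the outlier position. Hypothetical helper for illustration."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    pulse = np.zeros(len(y))
    pulse[outlier_index] = 1.0
    X = np.column_stack([np.ones(len(y)), t, pulse])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # intercept, slope, outlier effect

# Synthetic series: a clean linear trend with one spike at period 33.
t = np.arange(60, dtype=float)
y = 3.0 + 0.5 * t
y[32] += 40.0  # assumed 1-based "period 33" -> index 32

intercept, slope, outlier_effect = fit_trend_with_pulse(y, outlier_index=32)
```

With the dummy in place, the spike is absorbed by its own coefficient and does not distort the estimated slope.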

The PACF doesn't show a strong seasonal component.

It is a weak seasonal AR, with strong correlation in March, April, May and June.

