
Do we need to do differencing of exogenous variables before passing to exog argument of SARIMAX() from statsmodels in Python?

I am trying to build a forecasting model using SARIMAX in Python (regression with SARIMA errors) and need some guidance on how exogenous variables are handled in the exog argument.

The default parameters are:

SARIMAX(endog, exog=None, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, measurement_error=False, 
time_varying_regression=False, mle_regression=True, simple_differencing=False, enforce_stationarity=True,
enforce_invertibility=True, hamilton_representation=False, concentrate_scale=False, trend_offset=1,
use_exact_diffuse=False, dates=None, freq=None, missing='none', validate_specification=True, **kwargs)

This is how I fitted my model:

*I did not transform the variables before passing endog and exog to the SARIMAX function.*

SARIMAX(endog, exog=exog['TMIN_IAC'], order= (0,1,1), seasonal_order= (0,0,0,0), trend='c')

And this is the resulting summary:

                                SARIMAX Results                                
==============================================================================
Dep. Variable:                    all   No. Observations:                  151
Model:               SARIMAX(0, 1, 1)   Log Likelihood                -624.229
Date:                Mon, 05 Apr 2021   AIC                           1256.457
Time:                        14:36:48   BIC                           1268.500
Sample:                    01-31-2001   HQIC                          1261.350
                         - 07-31-2013                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.2139      0.071      2.996      0.003       0.074       0.354
TMIN_IAC      -6.1222      0.474    -12.920      0.000      -7.051      -5.193
ma.L1         -0.9504      0.029    -33.060      0.000      -1.007      -0.894
sigma2       237.3801     33.036      7.185      0.000     172.631     302.130
===================================================================================
Ljung-Box (L1) (Q):                   0.25   Jarque-Bera (JB):                 2.21
Prob(Q):                              0.62   Prob(JB):                         0.33
Heteroskedasticity (H):               1.26   Skew:                            -0.08
Prob(H) (two-sided):                  0.42   Kurtosis:                         2.43
===================================================================================

I searched the documentation, but the closest thing to my question that it mentions is this:

If simple_differencing = True is used, then the endog and exog data are differenced prior to putting the model in state-space form. This has the same effect as if the user differenced the data prior to constructing the model, which has implications for using the results.

My concern is that, according to Alan Pankratz in his book Forecasting With Dynamic Regression Models (1991), if differencing is applied to the errors in a multiple regression, both the dependent and the explanatory variables should be differenced, and I am not certain that statsmodels does this automatically.
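To make the concern concrete, here is a sketch for the ARIMA(0,1,1) case with one regressor. The notation is my own assumption, not Pankratz's: $B$ is the backshift operator, $n_t$ the regression error, and the MA polynomial is written with a plus sign, which appears to match the sign of the `ma.L1` estimate reported above.

```latex
\begin{aligned}
y_t &= \beta x_t + n_t, \qquad (1 - B)\, n_t = (1 + \theta B)\, \varepsilon_t \\
(1 - B)\, y_t &= \beta\, (1 - B)\, x_t + (1 - B)\, n_t \\
\Delta y_t &= \beta\, \Delta x_t + (1 + \theta B)\, \varepsilon_t
\end{aligned}
```

Applying $(1 - B)$ to the error term is therefore the same as applying it to both $y_t$ and $x_t$, so $\beta$ is the coefficient relating the differenced response to the differenced regressor.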

It seems that SARIMAX from statsmodels also differences both the response and the exog variables automatically.

According to Rob Hyndman, author of the Arima function in the forecast package in R:

Arima will difference both the response variable and the xreg variables as specified in the order and seasonal arguments. You should never need to do the differencing yourself.

So I ran the same model in R and obtained essentially the same results:

Arima(endog, order = c(0,1,1),seasonal = c(0,0,0), xreg = exog, include.drift = TRUE,
lambda = NULL, method = 'ML')

Model summary:

Regression with ARIMA(0,1,1) errors 

Coefficients:
          ma1   drift  TMIN_IAC
      -0.9504  0.2139   -6.1219
s.e.   0.0381  0.0724    0.4763

sigma^2 estimated as 242.2:  log likelihood=-624.23
AIC=1256.47   AICc=1256.74   BIC=1268.51
