简体   繁体   中英

How can I reproduce the statsmodels ARIMA filter?

I am attempting to reproduce the filters used in an ARIMA model using stastmodels recursive_filter and convolution_filter . (My end objective is to use these filters for prewhitening an exogenous series.)

I begin by working with just an AR model and recursive filter. Here is the simplified experimental setup:

import numpy as np
import statsmodels as sm

np.random.seed(42)

# sample data
series = sm.tsa.arima_process.arma_generate_sample(ar=(1,-0.2,-0.5), ma=(1,), nsample=100)

model = sm.tsa.arima.model.ARIMA(series, order=(2,0,0)).fit()
print(model.summary())

Which gracefully produces the following, which seems fair enough:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                 ARIMA(2, 0, 0)   Log Likelihood                -131.991
Date:                Wed, 07 Apr 2021   AIC                            271.982
Time:                        12:58:39   BIC                            282.403
Sample:                             0   HQIC                           276.200
                                - 100                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3136      0.266     -1.179      0.238      -0.835       0.208
ar.L1          0.2135      0.084      2.550      0.011       0.049       0.378
ar.L2          0.4467      0.101      4.427      0.000       0.249       0.645
sigma2         0.8154      0.126      6.482      0.000       0.569       1.062
===================================================================================
Ljung-Box (L1) (Q):                   0.10   Jarque-Bera (JB):                 0.53
Prob(Q):                              0.75   Prob(JB):                         0.77
Heteroskedasticity (H):               0.98   Skew:                            -0.16
Prob(H) (two-sided):                  0.96   Kurtosis:                         2.85
===================================================================================

I fit an AR(2) and obtain coefficients for lag 1 and 2 based on the SARIMAX results. My intuition for reproducing this model using statsmodels.tsa.filters.filtertools.recursive_filter is like so:

filtered = sm.tsa.filters.filtertools.recursive_filter(series, ar_coeff=(-0.2135, -0.4467))

(And maybe adding in the constant from the regression results as well). However, a direct comparison of the results shows the recursive filter doesn't replicate the AR model:

import matploylib.pyplot as plt

# ARIMA residuals
plt.plot(model.resid)

# Calculated residuals using recursive filter outcome
plt.plot(filtered)

Am I approaching this incorrectly? Should I be using a different filter function? The next step for me is to perform the same task but on an MA model so that I can add (?) the results together to obtain a full ARMA filter for prewhitening.

Note: this question may be valuable to somebody searching for "how can I prewritten timeseries data?" particularly in Python using statsmodels.

I guess you should use convolution_filter for the AR part and recursive_filter for the MA part. Combining these sequentially will work for ARMA models. Alternatively, you can use arma_innovations for an exact approach that works with both AR and MA parts simultaneously. Here are some examples:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.innovations import arma_innovations

AR(2)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, -0.2, -0.5), ma=(1,), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(2, 0, 0), trend='n').fit()
print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=res.params[:-1])[0],
    'convolution filter': sm.tsa.filters.convolution_filter(
        series, np.r_[1, -res.params[:-1]], nsides=1)}))

gives:

    ARIMA resid  arma_innovations  convolution filter
0      0.496714          0.496714                 NaN
1     -0.254235         -0.254235                 NaN
2      0.666326          0.666326            0.666326
3      1.493315          1.493315            1.493315
4     -0.256708         -0.256708           -0.256708
..          ...               ...                 ...
95    -1.438670         -1.438670           -1.438670
96     0.323470          0.323470            0.323470
97     0.218243          0.218243            0.218243
98     0.012264          0.012264            0.012264
99    -0.245584         -0.245584           -0.245584

MA(1)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1,), ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(0, 0, 1), trend='n').fit()
print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ma_params=res.params[:-1])[0],
    'convolution filter': sm.tsa.filters.recursive_filter(series, -res.params[:-1])}))

gives:

    ARIMA resid  arma_innovations    recursive filter
0      0.496714          0.496714            0.496714
1     -0.132893         -0.132893           -0.136521
2      0.646110          0.646110            0.646861
3      1.525620          1.525620            1.525466
4     -0.229316         -0.229316           -0.229286
..          ...               ...                 ...
95    -1.464786         -1.464786           -1.464786
96     0.291233          0.291233            0.291233
97     0.263055          0.263055            0.263055
98     0.005637          0.005637            0.005637
99    -0.234672         -0.234672           -0.234672

ARMA(1, 1)

np.random.seed(42)
series = sm.tsa.arma_generate_sample(ar=(1, 0.5), ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), trend='n').fit()
a = res.resid

# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.params[1:2])
filtered = sm.tsa.filters.convolution_filter(tmp, np.r_[1, -res.params[:1]], nsides=1)

print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=res.params[:1], ma_params=res.params[1:2])[0],
    'combined filters': filtered}))

gives:

    ARIMA resid  arma_innovations    combined filters
0      0.496714          0.496714                 NaN
1     -0.134253         -0.134253           -0.136915
2      0.668094          0.668094            0.668246
3      1.507288          1.507288            1.507279
4     -0.193560         -0.193560           -0.193559
..          ...               ...                 ...
95    -1.448784         -1.448784           -1.448784
96     0.268421          0.268421            0.268421
97     0.212966          0.212966            0.212966
98     0.046281          0.046281            0.046281
99    -0.244725         -0.244725           -0.244725

SARIMA(1, 0, 1)x(1, 0, 0, 3)

The seasonal model is a little more complicated, because it requires multiplying lag polynomials. For additional details, see this example notebook from the Statsmodels documentation.

np.random.seed(42)
ar_poly = [1, -0.5]
sar_poly = [1, 0, 0, -0.1]
ar = np.polymul(ar_poly, sar_poly)
series = sm.tsa.arma_generate_sample(ar=ar, ma=(1, 0.2), nsample=100)

res = sm.tsa.arima.ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 3), trend='n').fit()
a = res.resid

# Apply the recursive then convolution filter
tmp = sm.tsa.filters.recursive_filter(series, -res.polynomial_reduced_ma[1:])
filtered = sm.tsa.filters.convolution_filter(tmp, res.polynomial_reduced_ar, nsides=1)

print(pd.DataFrame({
    'ARIMA resid': res.resid,
    'arma_innovations': arma_innovations.arma_innovations(
        series, ar_params=-res.polynomial_reduced_ar[1:],
        ma_params=res.polynomial_reduced_ma[1:])[0],
    'combined filters': filtered}))

gives:

    ARIMA resid  arma_innovations combined filters
0      0.496714          0.496714              NaN
1     -0.100303         -0.100303              NaN
2      0.625066          0.625066              NaN
3      1.557418          1.557418              NaN
4     -0.209256         -0.209256        -0.205201
..          ...               ...              ...
95    -1.476702         -1.476702        -1.476702
96     0.269118          0.269118         0.269118
97     0.230697          0.230697         0.230697
98    -0.004561         -0.004561        -0.004561
99    -0.233466         -0.233466        -0.233466

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM