
Statsmodels PACF plot confidence interval does not match PACF function

I have a time series that appears to have a significant lag in its partial autocorrelation (PACF) plot, i.e. the PACF value lies outside the blue confidence band. I wanted to verify this programmatically, but the numbers don't seem to agree.

I drew the PACF plot with the statsmodels time series API, which showed that the first lag was significant. I then used the PACF estimation function to get the PACF values along with the confidence interval at each lag, but the confidence intervals from the two don't match. What's odder still is that, according to the source code, the plot function uses the underlying estimation function, so the two should agree.

Example:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

x = np.arange(1000) 
sm.graphics.tsa.plot_pacf(x)
plt.show()

[image: PACF plot of x]

The plot shows that the first lag is clearly significant, at roughly 0.98, while the confidence interval (blue rectangle) is about (-0.06, 0.06) throughout the plot.

Alternatively, when trying to get these exact plot values (only getting first 10 lags for brevity):

sm.tsa.stattools.pacf(x, nlags=10, alpha=0.05) 

The resulting PACF values are (which match the above plot):

array([ 1.        ,  0.997998  , -0.00200201, -0.00200402, -0.00200605,
        -0.0020081 , -0.00201015, -0.00201222, -0.0020143 , -0.00201639,
        -0.00201849])

The confidence intervals (shown in blue in the graph above) seem off for the first lag:

 array([[ 1.        ,  1.        ],
        [ 0.93601849,  1.0599775 ],
        [-0.06398151,  0.0599775 ],
        [-0.06398353,  0.05997548],
        [-0.06398556,  0.05997345],
        [-0.0639876 ,  0.05997141],
        [-0.06398965,  0.05996935],
        [-0.06399172,  0.05996729],
        [-0.0639938 ,  0.05996521],
        [-0.06399589,  0.05996312],
        [-0.06399799,  0.05996101]]))

What's going on?

Api Reference:

According to the code:

  • stattools.pacf computes the confidence interval around the estimated PACF, i.e. it is centered at the estimated value.
  • graphics.tsa.plot_pacf takes that confidence interval and subtracts the estimated PACF, so the confidence interval is centered at zero.
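As a rough illustration of the two points above, here is a minimal sketch that takes the first few values printed in the question and shifts each interval by its PACF estimate. The variable names are my own, and the numbers are copied from the question's output rather than recomputed.

```python
# PACF values and 95% intervals for lags 0-3, copied from the question's output.
pacf_vals = [1.0, 0.997998, -0.00200201, -0.00200402]
confint = [
    (1.0, 1.0),
    (0.93601849, 1.0599775),
    (-0.06398151, 0.0599775),
    (-0.06398353, 0.05997548),
]

# Shift each interval by its estimate so that it is centered at zero,
# which is what the plot is described as drawing.
band = [(lo - p, hi - p) for p, (lo, hi) in zip(pacf_vals, confint)]

# A lag stands out in the plot when its PACF value falls outside that band.
outside = [not (lo <= p <= hi) for p, (lo, hi) in zip(pacf_vals, band)]

for lag in range(1, len(pacf_vals)):
    lo, hi = band[lag]
    print(f"lag {lag}: pacf={pacf_vals[lag]:+.4f}  band=({lo:+.4f}, {hi:+.4f})  outside={outside[lag]}")
```

For every lag from 1 onward the shifted band comes out at about (-0.062, 0.062), which agrees with the roughly (-0.06, 0.06) rectangle in the plot, and only lag 1 falls outside it.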

I don't know or remember why it was done this way.

In the example, the PACF for every lag greater than or equal to 2 is close to zero, so there is no visible difference between the plot and the results from stattools.pacf.

The PACF for lag 0 is always 1 (see e.g. here), and hence its confidence interval is [1, 1].

This is ensured by the last line of the code snippet where the CI is calculated:

varacf = 1. / len(x)  # for all lags >=1
interval = stats.norm.ppf(1. - alpha / 2.) * np.sqrt(varacf)
confint = np.array(lzip(ret - interval, ret + interval))
confint[0] = ret[0]  # fix confidence interval for lag 0 to varpacf=0

(See also issue 1969 where this was fixed).
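To see where the numbers in the question come from, the same arithmetic can be redone by hand. This is only a sketch using the standard library: n = 1000 is len(x) from the question, and pacf_lag1 is the value printed there, not something recomputed here.

```python
from math import sqrt
from statistics import NormalDist

n = 1000               # len(x) in the question
alpha = 0.05
pacf_lag1 = 0.997998   # lag-1 PACF as printed in the question

# Same formula as the snippet above: normal quantile times sqrt(1 / n).
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sqrt(1 / n)

# Interval centered at the estimate, as stattools.pacf returns it.
lower = pacf_lag1 - half_width
upper = pacf_lag1 + half_width
print(f"half-width={half_width:.6f}  interval=({lower:.6f}, {upper:.6f})")
```

The half-width is about 0.062, so the interval for lag 1 is roughly (0.936, 1.060), matching the second row of the confidence interval array in the question.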

As lag 0 is of no interest, you usually make the PACF plot start from lag 1 (as in R's pacf function). This can be achieved with zero=False:

fig, axes = plt.subplots(1, 2, figsize=(10, 4))  # two panels side by side
sm.graphics.tsa.plot_pacf(x, ax=axes[0], zero=True, title='zero=True (default)')
sm.graphics.tsa.plot_pacf(x, ax=axes[1], zero=False, title='zero=False')
plt.show()

[image: PACF plots with zero=True (default) and zero=False]

If I understood the initial question correctly, it asks why the CI numbers returned by the acf/pacf functions do not match the CI shown on the graph (made by plot_acf/plot_pacf). The answer is simple: the CI on the graph is centered around 0, and it uses approximately the same numbers that you get from the acf/pacf functions.

I still do not follow the answer. From looking at my own data, I understand that the confidence band on the graph is centered around zero, while the values are plotted as-is. Isn't that just mashing two different scales into one? Shouldn't you choose one of the two: either compare raw values against the raw CI (block 1), or treat the value as 0 with the CI centered around zero (block 2)?

The image below illustrates my point:

First block: statsmodels.tsa.stattools.acf(df, nlags=10, alpha=0.05, fft=True).

Second block: the value is subtracted from both LCL and UCL, and the result is compared with 0.

Third block: matches what the graph from sm.graphics.tsa.plot_acf(df, zero=False, lags=10, alpha=0.05) would show: adjusted LCL and UCL, but the raw value.

As you can see, with the "raw" approach there are no significant results (eval, eval_w_0), but I get significant results from the graph (eval_adj).

[image: table comparing the three blocks]
