
Statsmodels PACF plot confidence interval does not match PACF function

I have a time series that appears to have a significant lag in its partial autocorrelation (PACF) plot, i.e. the PACF value lies outside the blue confidence band. I wanted to verify this programmatically, but the numbers don't seem to agree.

I drew the PACF plot with the statsmodels time series API, and it showed that the first lag was significant. I then used the PACF estimation function to get the PACF values along with the confidence interval at each lag, but the confidence intervals from the two don't match. What's even odder is that the plot function in the source code uses the underlying estimation function, so the two should agree.

Example:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

x = np.arange(1000) 
sm.graphics.tsa.plot_pacf(x)
plt.show()

[Image: PACF plot of x]

The plot shows that the first lag is clearly significant at about 0.98, and the confidence interval (blue rectangle) is about (-0.06, 0.06) throughout the plot.

Alternatively, when trying to get these exact plot values (only the first 10 lags, for brevity):

sm.tsa.stattools.pacf(x, nlags=10, alpha=0.05) 

The resulting PACF values, which match the plot above, are:

array([ 1.        ,  0.997998  , -0.00200201, -0.00200402, -0.00200605,
        -0.0020081 , -0.00201015, -0.00201222, -0.0020143 , -0.00201639,
        -0.00201849])

And the confidence interval (shown in blue in the graph above) seems off for the first lag:

 array([[ 1.        ,  1.        ],
        [ 0.93601849,  1.0599775 ],
        [-0.06398151,  0.0599775 ],
        [-0.06398353,  0.05997548],
        [-0.06398556,  0.05997345],
        [-0.0639876 ,  0.05997141],
        [-0.06398965,  0.05996935],
        [-0.06399172,  0.05996729],
        [-0.0639938 ,  0.05996521],
        [-0.06399589,  0.05996312],
        [-0.06399799,  0.05996101]])

What's going on?

API reference:

According to the code:

  • stattools.pacf computes the confidence interval around the estimated PACF, i.e. it is centered at the estimated value.
  • graphics.tsa.plot_pacf takes that confidence interval and subtracts the estimated PACF, so the confidence interval is centered at zero.
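
The two points above can be checked with a few lines of NumPy. This is only a sketch: rather than calling statsmodels, it copies the values for lags 0 to 3 from the output shown in the question, and the variable names (pacf_vals, confint, band) are chosen here for illustration.

```python
import numpy as np

# PACF values and intervals for lags 0-3, copied from the question's output of
# sm.tsa.stattools.pacf(x, nlags=10, alpha=0.05).
pacf_vals = np.array([1.0, 0.997998, -0.00200201, -0.00200402])
confint = np.array([
    [1.0, 1.0],
    [0.93601849, 1.0599775],
    [-0.06398151, 0.0599775],
    [-0.06398353, 0.05997548],
])

# Recentre the intervals at zero by subtracting the estimate,
# which is what the plot is described as doing.
band = confint - pacf_vals[:, None]

# A lag counts as significant when its PACF value falls outside the
# zero-centred band. Lag 0 is skipped because its PACF is 1 by definition.
significant = (pacf_vals < band[:, 0]) | (pacf_vals > band[:, 1])
significant[0] = False

print(band.round(4))                # zero-centred band per lag
print(np.flatnonzero(significant))  # indices of the lags flagged as significant
```

After recentring, every lag from 1 onward gets roughly the same band of about (-0.062, 0.062), which agrees with the blue region read off the plot in the question.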

I don't know or remember why it was done this way.

In the example, the PACF values for all lags greater than or equal to 2 are close to zero, so there is no visible difference between the plot and the results from stattools.pacf.

The PACF at lag 0 is always 1, and hence its confidence interval is [1, 1].

This is ensured by the last line of the code snippet where the CI is calculated:

varacf = 1. / len(x)  # for all lags >=1
interval = stats.norm.ppf(1. - alpha / 2.) * np.sqrt(varacf)
confint = np.array(lzip(ret - interval, ret + interval))
confint[0] = ret[0]  # fix confidence interval for lag 0 to varpacf=0

(See also issue 1969, where this was fixed.)
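
The interval formula quoted above can be checked against the numbers in the question. The sketch below assumes n = 1000 and alpha = 0.05 as in the question, and it uses the standard library's statistics.NormalDist in place of stats.norm.ppf so that it runs without SciPy. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 1000              # len(x) in the question
alpha = 0.05
pacf_lag1 = 0.997998  # PACF at lag 1, copied from the question's output

# Same formula as the quoted snippet: z-quantile times sqrt(1 / n).
half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(1.0 / n)

# Interval centred at the estimate, as returned by stattools.pacf.
lower, upper = pacf_lag1 - half_width, pacf_lag1 + half_width

print(round(half_width, 6))
print(round(lower, 6), round(upper, 6))
```

The half-width comes out at about 0.062, and the resulting bounds agree with the lag 1 row of the interval array in the question to the printed precision.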

As lag 0 is of no interest, you usually start the PACF plot from lag 1 (as R's pacf function does). This can be achieved with zero=False:

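fig, axes = plt.subplots(1, 2)  # two panels side by side; x, plt and sm as in the question
sm.graphics.tsa.plot_pacf(x, ax=axes[0], zero=True, title='zero=True (default)')
sm.graphics.tsa.plot_pacf(x, ax=axes[1], zero=False, title='zero=False')
plt.show()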

[Image: the two PACF plots, titled 'zero=True (default)' and 'zero=False']

If I understood the initial question correctly: why do the CI numbers returned by the ACF/PACF functions not match the CI shown on the graph (made by plot_acf)? The answer is simple: the CI on the graph is centered around 0, and it uses roughly the same numbers that you get from the acf/pacf functions.

I still do not follow the answer. From looking at my own data, I understand that the graph's interval is centered around zero, but the values are plotted as-is. Isn't that just mashing two different scales into one? Shouldn't you choose one: either raw values against the raw CI (block 1), or treat the value as 0 with the CI centered around zero (block 2)?

The image below illustrates my point:

First block: statsmodels.tsa.stattools.acf(df, nlags=10, alpha=0.05, fft=True).

Second block: LCL-value and UCL-value have the value subtracted, and are compared with 0.

Third block: matches what the graph sm.graphics.tsa.plot_acf(df, zero=False, lags=10, alpha=0.05) would show: adjusted LCL and UCL, but the raw value.

As you can see, the "raw" way gives no significant results (eval, eval_w_0), but I get significant results from the graph (eval_adj).

[Image: table with the three blocks described above]
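
One way to look at the three blocks is to write each comparison out explicitly. The sketch below uses made-up estimates and half-widths (not the poster's data) and assumes, as the answers above describe, that the returned interval is the estimate plus or minus a half-width. Under that assumption, comparing a value with its own interval, or comparing 0 with the zero-centred band, can never flag anything, while comparing the raw value with the zero-centred band gives the same result as asking whether 0 lies outside the raw interval.

```python
import numpy as np

# Hypothetical estimates and interval half-widths, for illustration only.
values = np.array([0.45, 0.10, -0.30, 0.02])
half_width = np.array([0.20, 0.22, 0.23, 0.25])

# Interval centred at the estimate (the form returned with alpha set).
raw_lcl, raw_ucl = values - half_width, values + half_width
# The same interval with the estimate subtracted (the band drawn on the plot).
adj_lcl, adj_ucl = raw_lcl - values, raw_ucl - values

def outside(v, lo, hi):
    """True where v lies strictly outside [lo, hi]."""
    return (v < lo) | (v > hi)

value_vs_raw = outside(values, raw_lcl, raw_ucl)  # block 1: value is the centre
zero_vs_adj = outside(0.0, adj_lcl, adj_ucl)      # block 2: 0 is the centre
value_vs_adj = outside(values, adj_lcl, adj_ucl)  # block 3: what the plot shows
zero_vs_raw = outside(0.0, raw_lcl, raw_ucl)      # is 0 outside the raw interval?

print(value_vs_raw, zero_vs_adj)
print(value_vs_adj, zero_vs_raw)
```

Read this way, the plot is not mixing two scales: checking whether the bar leaves the zero-centred band is the same test as checking whether the raw interval excludes zero.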
