P-value from a chi-square test using SciPy

Question

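I am computing a test statistic that follows a chi-square distribution with 1 degree of freedom. I am also computing the corresponding P-value using two different techniques from scipy.stats.

I have the observed and expected values as numpy arrays.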

import numpy as np

observation = np.array([  9.21899399e-04,   4.04363991e-01,   3.51713820e-02,
         3.00816946e-03,   1.80976731e-03,   6.46172153e-02,
         8.61549065e-05,   9.41395390e-03,   1.00946008e-03,
         1.25621846e-02,   1.06806251e-02,   6.66856795e-03,
         2.67380732e-01,   0.00000000e+00,   1.60859798e-02,
         3.63681803e-01,   1.06230978e-05])

expectation = np.array([ 0.07043956,  0.07043956,  0.07043956,  0.07043956,  0.07043956,
        0.07043956,  0.07043956,  0.07043956,  0.07043956,  0.07043956,
        0.07043956,  0.07043956,  0.07043956,  0.07043956,  0.07043956,
        0.07043956,  0.07043956])

For the first approach, I referred to this stackoverflow post. Here is what I did in the first approach:

from scipy import stats

chi_sq = np.sum(np.divide(np.square(observation - expectation), expectation)) 
p_value = 1 - stats.chi2.cdf(chi_sq, 1)

print(chi_sq, p_value)

>> (4.1029225303927959, 0.042809154353783851)

In the second approach, I used the chisquare function from scipy.stats. More specifically, I am following this link. This is how I implemented the second approach:

from scipy import stats
print( stats.chisquare(f_obs=observation, f_exp=expectation, ddof=0) )

>> Power_divergenceResult(statistic=4.1029225303927959, pvalue=0.99871467077385223)

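I get the same chi-square statistic from both approaches (statistic=4.1029225303927959), but different p-values. The first approach gives p_value=0.042809154353783851. The second approach gives pvalue=0.99871467077385223.

Why don't I get the same p-value from both approaches? Thanks.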

Answer 1

For stats.chisquare, ddof is defined as

ddof : int, optional
"Delta degrees of freedom": adjustment to the degrees of freedom for the p-value. The p-value is computed using a chi-squared distribution with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. The default value of ddof is 0.
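To make the relationship in that definition concrete, here is a minimal sketch. The counts are hypothetical, made up for illustration, and are not the data from the question. It checks that the p-value returned by stats.chisquare matches the chi-squared survival function evaluated with k - 1 - ddof degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical counts, made up for illustration (not the question's data).
counts = np.array([18, 22, 25, 15, 20])
k = len(counts)

results = []
for ddof in (0, 1, 2):
    # With f_exp omitted, chisquare assumes all categories are equally likely.
    stat, p = stats.chisquare(counts, ddof=ddof)
    # The same p-value computed by hand from the survival function (1 - cdf).
    p_manual = stats.chi2.sf(stat, k - 1 - ddof)
    results.append((ddof, k - 1 - ddof, p, p_manual))
    print(ddof, k - 1 - ddof, p, p_manual)
```

Each printed row should show the two p-values agreeing, with the degrees of freedom dropping by one each time ddof increases by one.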

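What you are doing is essentially Pearson's chi-squared test, which has k-1 degrees of freedom, where k is the number of observations. From what I can see, your expectation is essentially the mean of the observations, which means you estimated 1 parameter, so ddof=0 is correct. But for stats.chi2.cdf, df should be 16, not 1.

So: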

chi_sq = np.sum(np.divide(np.square(observation - expectation), expectation)) 
[1 - stats.chi2.cdf(chi_sq, len(observation)-1),
stats.chisquare(f_obs=observation, ddof=0)[1]]

[0.9987146707738522, 0.9987146706997099]

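There is a small discrepancy, but the magnitude is more or less right. The discrepancy most likely arises because the expectation array in the question is rounded to eight decimal places, whereas stats.chisquare with f_exp omitted uses the exact mean of the observations.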
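As a further check, the sketch below reuses the observation array from the question and lets stats.chisquare compute the expected values itself (the mean of the observations). It uses stats.chi2.sf, the survival function, in place of 1 - cdf, which is my substitution and not part of the original answer; the two are mathematically equivalent, but sf keeps more precision when the p-value is very small. It also shows that passing ddof=k-2 reproduces the 1-degree-of-freedom p-value from the question's first approach.

```python
import numpy as np
from scipy import stats

# Observed values copied from the question.
observation = np.array([
    9.21899399e-04, 4.04363991e-01, 3.51713820e-02, 3.00816946e-03,
    1.80976731e-03, 6.46172153e-02, 8.61549065e-05, 9.41395390e-03,
    1.00946008e-03, 1.25621846e-02, 1.06806251e-02, 6.66856795e-03,
    2.67380732e-01, 0.00000000e+00, 1.60859798e-02, 3.63681803e-01,
    1.06230978e-05,
])
k = len(observation)  # 17 values, so k - 1 = 16 degrees of freedom

# f_exp omitted: the expected value is the mean of the observations.
stat, p_default = stats.chisquare(observation)

# Manual p-values for df = 16 (what chisquare uses) and df = 1 (the question).
p_df16 = stats.chi2.sf(stat, k - 1)
p_df1 = stats.chi2.sf(stat, 1)

# ddof = k - 2 leaves k - 1 - ddof = 1 degree of freedom inside chisquare.
_, p_ddof = stats.chisquare(observation, ddof=k - 2)

print(stat, p_default, p_df16)  # the two p-values agree (about 0.9987)
print(p_df1, p_ddof)            # the two p-values agree (about 0.043)
```

Whether 1 or 16 degrees of freedom is appropriate depends on how the statistic was constructed; the code only shows that the two p-values in the question differ because of the degrees of freedom, not because of the statistic.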
