Pearson's Chi Square Test Python

I have two arrays on which I would like to run a Pearson's chi-square test (goodness of fit). I want to test whether or not there is a significant difference between the expected and observed results.

observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

I want to compare 11294 with 10749, 11830 with 10940, 10820 with 10271, etc.

Here's what I have:

>>> from scipy.stats import chisquare
>>> chisquare(f_obs=[11294, 11830, 10820, 12875],f_exp=[10749, 10940, 10271, 11937])
(203.08897607453906, 9.0718379533890424e-44)

where 203 is the chi-square test statistic and 9.07e-44 is the p-value. I'm confused by the results. The p-value is 9.07e-44 < 0.05, so we reject the null hypothesis and conclude that there is a significant difference between the observed and expected results. This isn't correct because the numbers are so close. How do I fix this?

In a chi-square test of independence, the null hypothesis (H0) says that the two variables (X and Y) are independent, i.e. changing the values of X wouldn't affect the values of Y.

For example, X = [1,2,3,4] and Y = [2,4,6,8].

If you calculate the p-value for this case with any suitable method, it should come out very small, implying that the data are very unlikely under the null hypothesis, i.e. there is a very low chance that X and Y are independent of each other.

This means the null hypothesis is rejected here: the two variables depend on each other, in the form Y = 2X.

The same reading applies to your case: a p-value of 9.0718379533890424e-44 is so small that the data are very unlikely under the null hypothesis. For a goodness-of-fit test, the null hypothesis is that the observed counts follow the expected counts, so rejecting it means the observed and expected values differ significantly.
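To see where such a small p-value comes from, here is a minimal sketch that recomputes Pearson's statistic by hand from the arrays in the question, using only the standard library. The helper name `chi2_sf_df3` is made up for this illustration, and its closed form is only valid for exactly 3 degrees of freedom; for the general case, `scipy.stats.chi2.sf` is the usual choice.

```python
import math

observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

# Contribution of each category to Pearson's statistic: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

# Relative deviation of each observed count from its expected count
relative_dev = [(o - e) / e for o, e in zip(observed, expected)]

# Degrees of freedom for a goodness-of-fit test with k categories: k - 1
df = len(observed) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3
    degrees of freedom (hypothetical helper, closed form for df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(statistic)

for o, e, c, r in zip(observed, expected, contributions, relative_dev):
    print(f"O={o} E={e} relative deviation={r:.3f} contribution={c:.2f}")
print("statistic:", statistic, "df:", df, "p-value:", p_value)
```

Each observed count is roughly 5% to 8% above its expected count. With counts above 10,000 per category, deviations of that size each add tens of units to the statistic, which is why the total is far out in the tail of a chi-square distribution with 3 degrees of freedom.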

P.S. You are correct about this.
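One detail in the posted arrays is worth checking: the observed and expected counts do not add up to the same total. A goodness-of-fit test normally assumes equal totals, and the documentation for recent SciPy releases states that `chisquare` expects the two sums to agree, so the call from the question may raise an error there. The sketch below rescales the expected counts to the observed total. Whether that rescaling is appropriate depends on what the expected values represent, which is an assumption here and not something stated in the question.

```python
observed = [11294, 11830, 10820, 12875]
expected = [10749, 10940, 10271, 11937]

total_obs = sum(observed)  # 46819
total_exp = sum(expected)  # 43897

# Assumption: the expected values describe proportions, so they can be
# rescaled to have the same total as the observed counts.
scale = total_obs / total_exp
expected_rescaled = [e * scale for e in expected]

# Pearson's statistic against the rescaled expected counts
statistic_rescaled = sum(
    (o - e) ** 2 / e for o, e in zip(observed, expected_rescaled)
)

print("totals:", total_obs, total_exp)
print("rescaled expected:", [round(e, 1) for e in expected_rescaled])
print("statistic after rescaling:", statistic_rescaled)
```

After rescaling, the statistic is much smaller than 203, because most of the original discrepancy came from the difference in totals and not from the shape of the distribution across the four categories.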
