
Interpretation of p-value in normality tests in Python

Original post 2017-02-02 16:24:41 8 2 python/ plotly/ p-value/ kolmogorov-smirnov

I am performing normality tests on my data. In general I would expect the data to be approximately normal (normal enough), as supported by a histogram of the raw values and a QQ plot.

I have performed Kolmogorov-Smirnov and Shapiro-Wilk tests, and this is where I get confused. My p-values are nearly 0:

Kolmogorov-Smirnov: statistic = 0.78, p-value = 0.0
Shapiro-Wilk: statistic = 0.99, p-value = 1.2e-05

which would have me believe that I should reject the null hypothesis.

I was going to assume that this is because my mean and standard deviation differ from the 0 and 1, respectively, assumed by the KS test, as explained here. But then I stumbled across the tutorial on normality tests in plotly, where for both tests the low p-values apparently support the null hypothesis!

plotly tutorial on normality tests

Has anything changed in the way the tests are performed? Or is it an error on the tutorial's page?

2 answers

It seems to be an error in the tutorial. As they state (classical definition), the null hypothesis is that there is no significant difference between the reference distribution and the tested one. This hypothesis should be rejected when the p-value is smaller than your threshold (when the test statistic is greater than the critical value). This is also stated in the same tutorial, in the link where they give more information about how to accept or reject the null hypothesis.

Therefore I believe it is an error. In both examples, the null hypothesis of no difference should be rejected, as the p-values seem to be smaller than 0.05 and the test statistics are greater than their respective critical values.
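To make the decision rule above concrete, here is a minimal sketch. The helper name `decide` and the 0.05 threshold are assumptions chosen for illustration; the p-values are the ones reported in the question.

```python
def decide(p_value, alpha=0.05):
    """Usual decision for a normality test at significance level alpha.

    H0: the sample comes from the reference (normal) distribution.
    `decide` and the default alpha are illustrative, not from the tutorial.
    """
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# p-values as reported in the question
reported = {"Kolmogorov-Smirnov": 0.0, "Shapiro-Wilk": 1.2e-05}
for name, p in reported.items():
    print(f"{name}: p = {p:g} -> {decide(p)}")
```

Under this rule a small p-value counts as evidence against normality, not as support for it.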

I just downloaded the data set from the tutorial and experimented with it in R. I agree with both of you: their conclusions are wrong for both the Shapiro and KS tests.

Moreover, when running the KS test, it is not enough to pass "norm" to specify the distribution; the parameter values are needed as well. Indeed, ks.test(x, "pnorm", mean(x), sd(x)) will give you a p-value of 0.0475. This makes more sense than their claimed "0.0" p-value, because a non-parametric test will be less strict than a parametric test on the p-value.
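The effect of omitting the parameters can be sketched in Python using only the standard library. This is an illustration on synthetic data, not the tutorial's data set: the helper `ks_statistic`, the seed, and the chosen mean and standard deviation (50 and 5) are all assumptions. It computes only the KS statistic D, not a p-value. With SciPy, the analogous call to the R line above would presumably be `scipy.stats.kstest(x, 'norm', args=(mean, sd))`.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, dist):
    """One-sample Kolmogorov-Smirnov statistic D against a NormalDist."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x,
        # so compare the reference CDF with both sides of the step.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


random.seed(0)
# Synthetic, roughly normal data that is NOT centred at 0 with sd 1
data = [random.gauss(50.0, 5.0) for _ in range(500)]

# Comparing against the standard normal N(0, 1)
d_standard = ks_statistic(data, NormalDist(0.0, 1.0))
# Comparing against a normal with the sample's own mean and sd
d_fitted = ks_statistic(data, NormalDist(mean(data), stdev(data)))

print(f"D vs N(0, 1):        {d_standard:.3f}")
print(f"D vs N(mean, stdev): {d_fitted:.3f}")
```

A very large D against N(0, 1) reflects the mismatch in location and scale rather than non-normality, which is consistent with the statistic of 0.78 in the question. Note that when the parameters are estimated from the same sample, the standard KS p-value is only approximate.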

Adding a histogram and QQ plot for the dataset as well.