
P-value Troubles in R

I have a question regarding p-values. I've been comparing different linear models to determine whether one model is better than another, using the following function in R:

 anova(model1,model2)

Unfortunately, it occasionally does not calculate an F statistic or a p-value. Here is an example of an anova summary that did not give a p-value:

 Analysis of Variance Table

 Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
 Model 2: Influence ~ SortedSums[, B]
   Res.Df    RSS Df Sum of Sq F Pr(>F)
 1    127 3090.9
 2    128 2655.2 -1    435.74

For comparison, here is an anova summary that did yield a p-value:

 Analysis of Variance Table

 Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
 Model 2: Influence ~ SortedSums[, B]
   Res.Df    RSS Df Sum of Sq      F Pr(>F)
 1    127 3090.9
 2    128 3157.6 -1   -66.652 2.7386 0.1004

Do you know why this occurs?

Not all questions require code examples. You don't deserve to be snarked at for being new, and I'm sorry people did that. Here is the answer:

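The difference between the two models is not significant. In the first table, the model with more terms (Model 1) actually has the larger RSS, so the F statistic would come out negative, and anova leaves F and Pr(>F) blank rather than report it.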
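To see the blank row for yourself, here is a minimal sketch on simulated data. The variable names (x1, x2, x3, big, small) are made up for illustration and are not the asker's data; the two models are deliberately not nested, and the one with fewer terms fits better.

```r
set.seed(1)
n <- 130
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 * d$x3 + rnorm(n)           # y depends only on x3

big   <- lm(y ~ x1 + x2, data = d)   # more terms, but unrelated to y
small <- lm(y ~ x3, data = d)        # fewer terms, not a subset of big's terms

tab <- anova(big, small)
print(tab)                           # row 2 shows blanks under F and Pr(>F)

# Under the hood the blanks are NA values
is.na(tab[2, c("F", "Pr(>F)")])
```

The printout has the same shape as the first table in the question: Df is -1, Sum of Sq is positive, and the last two columns are empty.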

Here is what you can do about it:

  • Check that the terms of one model object are a superset of the terms of the other. Otherwise, the default anova test is invalid to begin with (you could instead compare such non-nested models using AIC, but that belongs in a separate question). I'm actually really curious to see a nested pair of models that manages to be that non-significant, but again, that isn't necessary for answering this question.
  • If you checked, the models are nested, and this is an analysis you are doing manually, write p=1.0 in your report and call it a day.
  • If the models are nested and the above feels like cheating, here's how to do it the hard way. What you are really asking anova is whether the one variable by which the models differ makes a significant contribution to the fit. Take the "larger" model (call it BAR, and the smaller one FOO) and do summary(BAR). The p-value corresponding to the variable present in BAR but missing in FOO is your p-value! And it's probably equal to 1. And the square of the t-statistic is the F-value.
  • If the models are nested, this is an analysis you are doing programmatically, and the absence of a p-value breaks stuff elsewhere in your script, just do anova(FOO,BAR)[,5:6] to get NAs instead of blanks... but then again, if you were doing it programmatically you would have already tried that.
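The checks in the list above can be sketched roughly as follows. This assumes simulated data and simple additive formulas; FOO and BAR are the placeholder names from the answer, and the term-label comparison is only a crude nestedness check that would not cover models built from matrix columns or transformed terms.

```r
set.seed(42)
n <- 130
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)

FOO <- lm(y ~ x1, data = d)          # smaller model
BAR <- lm(y ~ x1 + x2, data = d)     # larger model, adds x2

# Crude nestedness check: every term of FOO also appears in BAR
nested <- all(attr(terms(FOO), "term.labels") %in%
              attr(terms(BAR), "term.labels"))

tab   <- anova(FOO, BAR)
coefs <- coef(summary(BAR))

# The "hard way": read the extra variable's row from summary(BAR)
t_x2 <- coefs["x2", "t value"]
p_x2 <- coefs["x2", "Pr(>|t|)"]

all.equal(t_x2^2, tab[2, "F"])       # squared t-statistic matches the F-value
all.equal(p_x2, tab[2, "Pr(>F)"])    # and the p-values agree

# Programmatic access: columns 5 and 6 hold F and Pr(>F), NA where not computed
stats <- tab[, 5:6]
is.na(stats[1, ])                    # the first row never has a test
```

The agreement between the t-test and the F-test holds here because the two models differ by exactly one coefficient.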

Good luck!

Recently, I also came across this issue when comparing a segmented linear model with one breakpoint (fitted with the segmented package) to a linear model without breakpoints. The simple linear model is nested in the segmented one, because the segment before the breakpoint could span the entire data set.

The segmented fit, however (which I invoked with a lax convergence tolerance for performance reasons), reported a fit in which the residual sum of squares of the more complex segmented model was slightly larger than that of the simple linear model. Of course, the best fit of the more complex model should not have a larger residual variance than the model nested in it, and the anova function reported a p-value of NA.

In this case, clearly, the more complex model was not significantly better: p > alpha, e.g. p = 1.
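If you want a script to apply that convention automatically, one possible sketch is below. The helper name p_or_one is hypothetical, and since I am not reproducing a segmented fit here, a plain lm with unrelated predictors stands in for the poorly converged complex model.

```r
# Hypothetical helper: return the anova p-value, or 1 when anova reports NA
# because the more complex model has the larger residual sum of squares.
p_or_one <- function(simple, complex) {
  p <- anova(simple, complex)[2, "Pr(>F)"]
  worse <- deviance(complex) >= deviance(simple)
  if (is.na(p) && worse) 1 else p
}

set.seed(7)
n <- 130
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 * d$x3 + rnorm(n)

simple    <- lm(y ~ x3, data = d)
worse_fit <- lm(y ~ x1 + x2, data = d)  # stand-in for a poorly converged fit
proper    <- lm(y ~ x3 + x1, data = d)  # genuinely nests the simple model

p_or_one(simple, worse_fit)  # anova gives NA here, so the helper returns 1
p_or_one(simple, proper)     # an ordinary p-value, passed through unchanged
```

The helper only substitutes 1 when the complex model really does fit worse, so an NA with some other cause (for example, two models with the same residual degrees of freedom) is still passed through as NA.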
