P-value Troubles in R
I have a question regarding p-values. I've been comparing different linear models in R to determine whether one model is better than another, using the following function:
anova(model1,model2)
Unfortunately, it occasionally does not calculate an F or a p-value. Here is an example of an anova summary that did not give a p-value:
Analysis of Variance Table
Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
Model 2: Influence ~ SortedSums[, B]
  Res.Df    RSS Df Sum of Sq F Pr(>F)
1    127 3090.9
2    128 2655.2 -1    435.74
For comparison, here is an anova summary that did yield a p-value:
Analysis of Variance Table
Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
Model 2: Influence ~ SortedSums[, B]
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    127 3090.9
2    128 3157.6 -1   -66.652 2.7386 0.1004
Do you know why this occurs?
Not all questions require code examples. You don't deserve to be snarked at for being new, and I'm sorry people did that. Here is the answer:
The difference between the two models is not significant.
Here is what you can do about it:
Suppose the two models are nested and differ by a single variable, with FOO the smaller model and BAR the larger one. What anova then tests is whether that one variable by which they differ makes a significant contribution to fit. Take the "larger" model and do summary(BAR). The p-value corresponding to the variable present in BAR but missing in FOO is your p-value! And it's probably equal to 1. And the square of the t-statistic is the F-value. If you need this programmatically, use anova(FOO,BAR)[,5:6] to get NAs instead of blanks... but then again, if you were doing it programmatically you would have already tried that. Good luck!
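The relationship between summary(BAR) and anova(FOO,BAR) described above can be checked on simulated data. This is only a sketch: the names FOO and BAR come from the answer, while the data frame d, the variables a, b and y, and the simulated coefficients are assumptions made up for illustration.

```r
# Simulated data (hypothetical): y depends on a; b is pure noise.
set.seed(1)
n <- 130
d <- data.frame(a = rnorm(n), b = rnorm(n))
d$y <- 1 + 2 * d$a + rnorm(n)

FOO <- lm(y ~ a, data = d)      # smaller model
BAR <- lm(y ~ a + b, data = d)  # larger model, adds the single variable b

tab   <- anova(FOO, BAR)
coefs <- summary(BAR)$coefficients

# F statistic and p-value from the model comparison (second row of the table)
f_anova <- tab[2, "F"]
p_anova <- tab[2, "Pr(>F)"]

# t statistic and p-value of the extra variable b in the larger model
t_b <- coefs["b", "t value"]
p_b <- coefs["b", "Pr(>|t|)"]

# For nested lm fits differing by one term, these pairs should agree
print(c(F_from_anova = f_anova, t_squared = t_b^2))
print(c(p_from_anova = p_anova, p_from_summary = p_b))
```

Note that this equivalence only holds for nested models that differ by one term. It does not by itself explain a blank F, which needs the smaller model to fit better than the larger one.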
Recently, I also came across this issue when comparing a segmented linear model with one breakpoint (fitted with the package segmented) to a linear model without breakpoints. The simple linear model is nested in the segmented one, because the segment before the breakpoint could span the entire data set.
The segmented fit, however (which I invoked with a lax convergence tolerance for performance reasons), reported a fit where the residual sum of squares of the more complex segmented model was slightly larger than that of the simple linear model. Of course, the best fit of the more complex model should not have a larger residual variance than the simpler model nested in it, and the anova function reported a p-value of NA.
In this case, clearly, the more complex model was not significantly better: p > alpha, e.g. p = 1.
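The same symptom can be reproduced with plain lm fits. The sketch below does not use segmented. Instead it compares two non-nested models on simulated data, which is closer to the tables in the question. The variables x1, x2, x3 and y are hypothetical. As far as I can tell from the stats source, anova sets a negative F to NA rather than reporting p = 0, and that is what shows up as a blank in the printed table.

```r
# Simulated data (hypothetical): only x3 actually drives y.
set.seed(42)
n <- 130
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 * d$x3 + rnorm(n)

model1 <- lm(y ~ x1 + x2, data = d)  # more parameters, but a poor fit
model2 <- lm(y ~ x3, data = d)       # fewer parameters, good fit; not nested in model1

tab <- anova(model1, model2)
print(tab)

# model2 has the smaller RSS despite having fewer parameters, so Df is -1
# while Sum of Sq is positive; the implied F would be negative.
df_diff <- tab[2, "Df"]
ss_diff <- tab[2, "Sum of Sq"]

# Extracting the values directly shows NA where the printout shows blanks
f_value <- tab[2, "F"]
p_value <- tab[2, "Pr(>F)"]
print(c(Df = df_diff, SumSq = ss_diff, F = f_value, p = p_value))
```

In the question's second table, Df and Sum of Sq are both negative, so the F ratio is positive and a p-value is reported.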