Floating-point error in R's fligner.test function?
I have noticed that the statistical test fligner.test from R's stats package gives different results after a simple linear transformation of the data, even though this shouldn't be the case.
Here is an example (the difference for the original dataset is much more dramatic):
g <- factor(rep(1:2, each=6))
x1 <- c(2,2,6,6,1,4,5,3,5,6,5,5)
x2 <- (x1-1)/5 #> cor(x1,x2) [1] 1
fligner.test(x1,g) # chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner.test(x2,g) # chi-squared = 4.8148, df = 1, p-value = 0.02822
Looking at the function's code, I noticed that the median centering might be causing the issue:
x1 <- x1 - tapply(x1,g,median)[g]
x2 <- x2 - tapply(x2,g,median)[g]
unique(abs(x1)) # 1 3 2 0
unique(abs(x2)) # 0.2 0.6 0.4 0.2 0.0 <- repeated 0.2
Is this a known issue, and how should this inconsistency be resolved?
I think your analysis is correct here. In your example the problem ultimately occurs because (0.8 - 0.6) == 0.2 is FALSE unless the values are rounded to 15 decimal places. You should file a bug report, since this is avoidable.
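To make the failing comparison concrete, here is a minimal sketch in base R. The names d and c2 are mine, chosen only for illustration; the data are the vectors from the question. It shows the inequality itself and how it splits what should be tied absolute deviations, which in turn changes the ranks the test is built on.

```r
# The comparison mentioned above
d <- 0.8 - 0.6
d == 0.2                      # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(d, 0.2))     # TRUE: equal within numerical tolerance
sprintf("%.17f", c(d, 0.2))   # print both values to 17 decimal places

# The same effect on the question's data
g  <- factor(rep(1:2, each = 6))
x1 <- c(2, 2, 6, 6, 1, 4, 5, 3, 5, 6, 5, 5)
x2 <- (x1 - 1) / 5

# Median-centre within each group, as fligner.test does internally
c2 <- x2 - tapply(x2, g, median)[g]

unique(abs(c2))               # contains two slightly different "0.2" values
unique(abs(round(c2, 15)))    # rounding merges them again

# Ranks of the absolute deviations with and without rounding
rank(abs(c2))
rank(abs(round(c2, 15)))
```

Because the test statistic depends on the data only through these ranks, a single broken tie is enough to change the chi-squared value.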
If you are desperate in the meantime, you can adapt stats:::fligner.test.default by applying a tiny bit of rounding at the median-centering stage to remove floating-point inequalities:
fligner <- function (x, g, ...)
{
    if (is.list(x)) {
        if (length(x) < 2L)
            stop("'x' must be a list with at least 2 elements")
        DNAME <- deparse1(substitute(x))
        x <- lapply(x, function(u) u <- u[complete.cases(u)])
        k <- length(x)
        l <- lengths(x)
        if (any(l == 0))
            stop("all groups must contain data")
        g <- factor(rep(1:k, l))
        x <- unlist(x)
    }
    else {
        if (length(x) != length(g))
            stop("'x' and 'g' must have the same length")
        DNAME <- paste(deparse1(substitute(x)), "and",
                       deparse1(substitute(g)))
        OK <- complete.cases(x, g)
        x <- x[OK]
        g <- g[OK]
        g <- factor(g)
        k <- nlevels(g)
        if (k < 2)
            stop("all observations are in the same group")
    }
    n <- length(x)
    if (n < 2)
        stop("not enough observations")
    # Only change from the stats version: round the centered values
    x <- round(x - tapply(x, g, median)[g], 15)
    a <- qnorm((1 + rank(abs(x))/(n + 1))/2)
    a <- a - mean(a)
    v <- sum(a^2)/(n - 1)
    a <- split(a, g)
    STATISTIC <- sum(lengths(a) * vapply(a, mean, 0)^2)/v
    PARAMETER <- k - 1
    PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    names(STATISTIC) <- "Fligner-Killeen:med chi-squared"
    names(PARAMETER) <- "df"
    METHOD <- "Fligner-Killeen test of homogeneity of variances"
    RVAL <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD, data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}
This now returns the correct result for both of your vectors:
fligner(x1,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x1 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner(x2,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x2 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858
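If you would rather check the invariance programmatically than compare printed output, something along these lines should work. It is only a sketch: fk_stat is a hypothetical, stripped-down helper of my own (not part of stats) that keeps just the statistic calculation and the same 15-digit rounding, with no input validation or NA handling.

```r
# Hypothetical helper: Fligner-Killeen statistic with rounded centered values.
# Not part of the stats package; no input checks or NA handling.
fk_stat <- function(x, g, digits = 15) {
  g <- factor(g)
  n <- length(x)
  # Center on group medians, then round away floating-point noise
  x <- round(x - tapply(x, g, median)[g], digits)
  # Normal scores of the ranked absolute deviations
  a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
  a <- a - mean(a)
  v <- sum(a^2) / (n - 1)
  a <- split(a, g)
  sum(lengths(a) * vapply(a, mean, 0)^2) / v
}

g  <- factor(rep(1:2, each = 6))
x1 <- c(2, 2, 6, 6, 1, 4, 5, 3, 5, 6, 5, 5)
x2 <- (x1 - 1) / 5

s1 <- fk_stat(x1, g)
s2 <- fk_stat(x2, g)

# The statistic should not change under a positive linear transformation
isTRUE(all.equal(s1, s2))
```

The choice of 15 digits is a judgment call: it is small enough to absorb double-precision noise for data of roughly unit scale, but it would need rethinking for values that are very large or very small in magnitude.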