
R: Calculating p-value using simulations

I wrote this code to compute a test statistic comparing two samples of random observations, x and y:

mean.test <- function(x, y, B = 10000,
                      alternative = c("two.sided", "less", "greater"))
{
  p.value <- 0
  alternative <- match.arg(alternative)
  s <- replicate(B, (mean(sample(c(x, y), B, replace = TRUE)) -
                     mean(sample(c(x, y), B, replace = TRUE))))
  t <- mean(x) - mean(y)
  p.value <- 2 * (1 - pnorm(abs(quantile(T, 0.01)), mean = 0, sd = 1,
                            lower.tail = TRUE, log.p = FALSE))  # try to calculate p value
  data.name <- deparse(substitute(c(x, y)))
  names(t) <- "difference in means"
  zero <- 0
  names(zero) <- "difference in means"
  return(structure(list(statistic = t, p.value = p.value,
                        method = "mean test", data.name = data.name,
                        observed = c(x, y), alternative = alternative,
                        null.value = zero),
                   class = "htest"))
}

The code uses Monte Carlo simulation to generate the distribution of the test statistic mean(x) - mean(y) and then calculates the p-value, but apparently I defined this p-value incorrectly, because for:

> set.seed(0)
> mean.test(rnorm(1000,3,2),rnorm(2000,4,3)) 

the output should look like:

    mean test
data: c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0

But I got this instead:

      mean test
data:  c(rnorm(1000, 3, 2), rnorm(2000, 4, 3))
difference in means = -1.0967, p-value = 0.8087
alternative hypothesis: true difference in means is not equal to 0

Can someone explain the bug to me?

As far as I can tell, your code has several mistakes:

  • quantile(T, 0.01): here T is TRUE (R is case-sensitive, so T is not your t), so you're calculating a quantile of the single value 1 rather than of any simulated statistics.
  • The object s, which holds your simulated differences, is never used.
  • mean(sample(c(x,y), B, replace=TRUE)): what are you trying to do here? The c() function combines x and y into a single vector. Sampling from it makes no sense, since you don't know what population they come from.
  • When you calculate the test statistic t, it should depend on the variances (and the sample sizes).
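Putting the points above together, here is a minimal sketch of one way to get a simulation-based p-value. It is not the original poster's method and not something the answer above prescribes: it swaps in a permutation test, which shuffles the group labels of the pooled data under the null hypothesis that x and y come from the same distribution, splits the data back into groups of the original sizes length(x) and length(y) (instead of drawing B values), and recomputes a Welch-type statistic that depends on the variances and sample sizes. The p-value is the share of simulated statistics at least as extreme as the observed one, with the common +1 correction. The names mean.perm.test and welch.stat are hypothetical.

```r
# Hypothetical helper: Welch-type standardized difference in means, which
# depends on both sample variances and both sample sizes.
welch.stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Hypothetical corrected test: Monte Carlo permutation p-value for the
# Welch-type statistic, returned as an "htest" object so print() formats it.
mean.perm.test <- function(x, y, B = 10000,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  data.name <- paste(deparse(substitute(x)), "and", deparse(substitute(y)))
  pooled <- c(x, y)
  nx <- length(x)
  t.obs <- welch.stat(x, y)

  # Under H0 (x and y from the same distribution) the group labels are
  # exchangeable: shuffle them by drawing nx indices without replacement,
  # split the pooled data into groups of the original sizes, and recompute.
  t.sim <- replicate(B, {
    idx <- sample.int(length(pooled), nx)
    welch.stat(pooled[idx], pooled[-idx])
  })

  # Share of simulated statistics at least as extreme as the observed one;
  # the +1 in numerator and denominator keeps the estimate above zero.
  extreme <- switch(alternative,
                    two.sided = abs(t.sim) >= abs(t.obs),
                    less      = t.sim <= t.obs,
                    greater   = t.sim >= t.obs)
  p.value <- (sum(extreme) + 1) / (B + 1)

  statistic <- c(t = t.obs)
  estimate <- c("difference in means" = mean(x) - mean(y))
  null.value <- c("difference in means" = 0)
  structure(list(statistic = statistic, p.value = p.value,
                 estimate = estimate, null.value = null.value,
                 alternative = alternative,
                 method = "Monte Carlo permutation test (Welch-type statistic)",
                 data.name = data.name),
            class = "htest")
}
```

Called as mean.perm.test(rnorm(1000, 3, 2), rnorm(2000, 4, 3)) after set.seed(0), a test like this can never report a p-value below 1/(B + 1), about 1e-4 with the default B = 10000. So it will not reproduce the "< 2.2e-16" in the expected output; that figure looks like what an analytic approximation such as t.test(x, y) prints for a difference this large.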
