
Mathematical formula underlying the t.test function in R

Does anyone know which mathematical formula R uses to perform the t-test?

Searching the Internet, I find several different formulas. I tried doing my t-test "by hand" with the following three formulas (shown in the two images below):

[Images: the three formulas]

However, when I compare the p-values I get with the one that R's t.test function gives me, the results are very different:
- With each of the three formulas, I get a non-significant p-value.
- With the R function, I get a significant p-value.

Here is my code:

#Loading the data
library("lingpsych")
data("df_gibsonwu")

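#Preparing the data
bysubj <- aggregate(rt~subj + type, mean, data=df_gibsonwu)
# Split by condition; aggregate() sorts by subj within each type,
# so row i of both subsets is the same subject
bysubjOR <- subset(bysubj, type == "obj-ext")
bysubjSR <- subset(bysubj, type == "subj-ext")
ORSR <- bysubjOR$rt - bysubjSR$rt
meanOR <- mean(bysubjOR$rt)
meanSR <- mean(bysubjSR$rt)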

# Paired t-test by hand:
n <- 37

SE0 <- sqrt(((sd(bysubjOR$rt)^2) /n)+((sd(bysubjSR$rt)^2)/n))
SE1 <- sqrt(((n * sum(ORSR))^2 - (sum(ORSR)))^2 / n-1)
SE2 <- sd(bysubj$rt) / sqrt(n)

tvalue0 <- (meanOR - meanSR) / SE0
tvalue1 <- (sum(ORSR)) / SE1 # Which, I think, is the correct formula for a paired t-test?
tvalue2 <- (meanOR - meanSR) / SE2

critT <- qt(0.975,n-1)
pvalue0 <- 2*(pt(tvalue0, n-1, lower.tail=FALSE))
pvalue1 <- 2*(pt(tvalue1, n-1, lower.tail=FALSE))
pvalue2 <- 2*(pt(tvalue2, n-1, lower.tail=FALSE))

# Paired t-test using the R function:
t.test(bysubj$rt ~ bysubj$type, paired=TRUE)

The results are: pvalue0 = 1.959, pvalue1 = 1.000, pvalue2 = 1.994.

p-value with the R function: 0.01248

Thank you in advance for your help! :)

You need to make sure you rearrange and pair up your data properly.

bysubj <- aggregate(rt~subj + type, mean, data=df_gibsonwu)
dd <- data.frame(obj = bysubj[bysubj$type=="obj-ext", "rt"],
                 subj = bysubj[bysubj$type=="subj-ext", "rt"])
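The pairing above relies on aggregate() returning the subjects in the same order within each condition. A minimal alternative sketch that does not depend on row order is to merge the two conditions on the subject ID. The data frame `bysubj_toy` and its numbers are made up for illustration; it only mimics the long format (subj, type, rt) of `bysubj`.

```r
# Hypothetical by-subject means in the same long format as `bysubj`
# (columns subj, type, rt); the values are invented for illustration
# and the rows are deliberately shuffled.
bysubj_toy <- data.frame(
  subj = c(1, 2, 3, 3, 1, 2),
  type = c("obj-ext", "obj-ext", "obj-ext", "subj-ext", "subj-ext", "subj-ext"),
  rt   = c(500, 620, 580, 540, 470, 600)
)

# One data frame per condition, keeping the subject ID
obj_df  <- bysubj_toy[bysubj_toy$type == "obj-ext",  c("subj", "rt")]
subj_df <- bysubj_toy[bysubj_toy$type == "subj-ext", c("subj", "rt")]

# Merge on subj: each row now holds one subject's two condition means,
# in columns rt.obj and rt.subj, regardless of the original row order
paired <- merge(obj_df, subj_df, by = "subj", suffixes = c(".obj", ".subj"))
print(paired)
```

Subjects missing from either condition are dropped by merge() (its default is an inner join), which is what a paired test needs anyway.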

Calculate the per-subject differences and the number of pairs:

xdiff <- dd[,1] - dd[,2]
n <- nrow(dd)

Compute the t statistic (the mean difference divided by its standard error):

tstat <- mean(xdiff)/sqrt(var(xdiff)/n)
## -2.63007
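Written out as formulas (my own notation: d_i is subject i's obj-ext mean minus their subj-ext mean), this is the standard paired t statistic. The second expression is an algebraically equivalent "computational" form. It appears to be what SE1 in the question was aiming for, but it needs the sum of squared differences, not the square of the sum, and (n - 1) must be in parentheses, since `/ n-1` in R computes (x / n) - 1.

```latex
d_i = x_{\mathrm{obj},i} - x_{\mathrm{subj},i}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2

t = \frac{\bar{d}}{s_d / \sqrt{n}}
  = \frac{\sum_{i=1}^{n} d_i}
         {\sqrt{\frac{n \sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}{n-1}}},
\qquad \mathrm{df} = n - 1
```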

As @rawr suggested in the comments, I got this calculation by looking at the source code of stats:::t.test.default.
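To look at that source yourself, you can print the default method. A minimal sketch: `utils::getS3method()` retrieves an S3 method by generic and class name, and typing `stats:::t.test.default` at the console prints the same function.

```r
# Retrieve the default (x, y) method of t.test, which contains the
# actual computation, and print its source code
src <- utils::getS3method("t.test", "default")
print(src)
```

As I read the printed code, when `paired = TRUE` it replaces x by x - y and then runs the one-sample branch, with `stderr <- sqrt(vx/nx)` and `df <- nx - 1`, which matches the formula used above.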

Calculating the p-value:

Using 2*pt(abs(), ..., lower.tail = FALSE) gives the two-tailed p-value for either a negative or a positive t statistic.

df <- n - 1
2*pt(abs(tstat), df, lower.tail  = FALSE)
## 0.01248
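The missing abs() also explains the question's p-values above 1: the t statistics there are negative, and the upper-tail area beyond a negative t is larger than 0.5, so doubling it gives a value between 1 and 2. A minimal sketch, reusing the t value and degrees of freedom reported in this answer:

```r
# t statistic and degrees of freedom reported above
t_obs <- -2.63007
dof <- 36

# Without abs(): upper-tail area beyond a negative t is > 0.5, so 2 * area > 1
p_wrong <- 2 * pt(t_obs, dof, lower.tail = FALSE)

# With abs(): the proper two-tailed p-value
p_right <- 2 * pt(abs(t_obs), dof, lower.tail = FALSE)

print(c(p_wrong = p_wrong, p_right = p_right))
```

Because the t distribution is symmetric, the two values always add up to 2 for a negative t, so a "p-value" such as 1.959 is really 2 minus the two-tailed p-value for that t.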

With t.test():

t.test(dd$subj, dd$obj, paired = TRUE)
## t = 2.6301, df = 36, p-value = 0.01248
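Because df_gibsonwu requires the lingpsych package, here is a self-contained sketch of the same check on R's built-in `sleep` data (10 subjects, each measured in group 1 and group 2), used purely as a stand-in. It compares the by-hand t and p with t.test(), shows that the computational form gives the same t, and, for contrast, computes an independent-samples t in the style of SE0 from the question. It calls t.test(x, y, paired = TRUE) rather than the formula interface because, as I understand it, recent R releases (4.4.0 and later) no longer accept `paired = TRUE` in the formula method.

```r
data(sleep)

# Extract the two conditions; sleep is ordered by ID within each group,
# so g1[i] and g2[i] belong to the same subject
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]
stopifnot(all(sleep$ID[sleep$group == "1"] == sleep$ID[sleep$group == "2"]))

# Paired t by hand: mean difference over its standard error
d <- g1 - g2
n <- length(d)
t_manual <- mean(d) / (sd(d) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# Equivalent computational form (sum of d over sqrt of a variance term)
t_comp <- sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1))

# R's own paired test, via the default (x, y) interface
res <- t.test(g1, g2, paired = TRUE)

# For contrast: ignoring the pairing, like SE0 in the question
t_indep <- (mean(g1) - mean(g2)) / sqrt(var(g1) / n + var(g2) / n)

print(c(manual = t_manual, computational = t_comp,
        t.test = unname(res$statistic), independent = t_indep))
print(c(p_manual = p_manual, p_t.test = res$p.value))
```

The independent-samples statistic is noticeably smaller in magnitude here, because it ignores the between-subject correlation that the paired test removes by working with the differences.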
