
R Function to Get Confidence Interval of Difference Between Means

I am trying to find a function that allows me to easily get the confidence interval of the difference between two means.

I am pretty sure t.test has this functionality, but I haven't been able to make it work. Below is a screenshot of what I have tried so far:

[Screenshot of the attempted code; image not preserved]

This is the dataset I am using:

   Indoor Outdoor
1    0.07    0.29
2    0.08    0.68
3    0.09    0.47
4    0.12    0.54
5    0.12    0.97
6    0.12    0.35
7    0.13    0.49
8    0.14    0.84
9    0.15    0.86
10   0.15    0.28
11   0.17    0.32
12   0.17    0.32
13   0.18    1.55
14   0.18    0.66
15   0.18    0.29
16   0.18    0.21
17   0.19    1.02
18   0.20    1.59
19   0.22    0.90
20   0.22    0.52
21   0.23    0.12
22   0.23    0.54
23   0.25    0.88
24   0.26    0.49
25   0.28    1.24
26   0.28    0.48
27   0.29    0.27
28   0.34    0.37
29   0.39    1.26
30   0.40    0.70
31   0.45    0.76
32   0.54    0.99
33   0.62    0.36

and I have been trying to use the t.test function that I installed with

install.packages("ggpubr")

I am pretty new to R, so sorry if there is a simple answer to this question. I have searched around quite a bit and haven't been able to find what I am looking for.

Note: the output I am looking for is the interval from -1.224 to 0.376.

Edit:

The CI for the difference between means that I am looking for is the one I would get if a random 34th data point were added to the table by picking a random value from the Indoor column and a random value from the Outdoor column and duplicating those values. Running t.test outputs the correct CI for the difference of means for the given sample size of 33.

How can I go about doing this while pretending the sample size is 34?
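For reference, t.test() is part of base R's stats package, so it is available without install.packages("ggpubr"). A minimal sketch: enter the table above as a data frame (the name df is an assumption, chosen to match the answers below) and pass the two columns to t.test(). Its conf.int is the 95% CI for mean(Indoor) - mean(Outdoor), which is much narrower than the -1.224 to 0.376 interval the question is after.

```r
# Rebuild the table from the question as a data frame named df
# (hypothetical name, matching what the answers assume).
df <- data.frame(
  Indoor  = c(0.07, 0.08, 0.09, 0.12, 0.12, 0.12, 0.13, 0.14, 0.15, 0.15, 0.17,
              0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 0.20, 0.22, 0.22, 0.23, 0.23,
              0.25, 0.26, 0.28, 0.28, 0.29, 0.34, 0.39, 0.40, 0.45, 0.54, 0.62),
  Outdoor = c(0.29, 0.68, 0.47, 0.54, 0.97, 0.35, 0.49, 0.84, 0.86, 0.28, 0.32,
              0.32, 1.55, 0.66, 0.29, 0.21, 1.02, 1.59, 0.90, 0.52, 0.12, 0.54,
              0.88, 0.49, 1.24, 0.48, 0.27, 0.37, 1.26, 0.70, 0.76, 0.99, 0.36)
)

# t.test() comes from base R's stats package; no extra package is needed.
# With two numeric vectors it runs a Welch two-sample t-test, and
# conf.int is the 95% CI for mean(Indoor) - mean(Outdoor).
res <- t.test(df$Indoor, df$Outdoor)
ci_means <- res$conf.int
print(ci_means)
```

The Welch version is used because var.equal defaults to FALSE; passing var.equal = TRUE gives the pooled-variance interval instead.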

There's probably something more convenient in the standard library, but it's pretty easy to calculate. Given your df variable, we can just do:

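# mean of the difference (Indoor minus Outdoor)
d_mu <- mean(df$Indoor) - mean(df$Outdoor)
# SD of the difference between one Indoor and one Outdoor value
# (variances of independent variables add)
d_sd <- sqrt(var(df$Indoor) + var(df$Outdoor))

# 95% interval: mean difference +/- t quantile (2n degrees of freedom) * SD
d_mu + d_sd * qt(c(0.025, 0.975), nrow(df)*2)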

giving me: -1.2246 0.3767
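Written out, the interval computed by that code is the first formula below, with n = 33 rows and s_I, s_O the sample SDs of the two columns. For comparison, the second is the Welch interval that t.test() reports for the difference of means, where ν is the Welch-Satterthwaite degrees of freedom. The key difference is the missing 1/n under the square root: the first describes the spread of a single Indoor value minus a single Outdoor value, which appears to be what the question's "random 34th data point" framing is after, while the second describes the uncertainty in the difference of the two means.

$$
(\bar{x}_I - \bar{x}_O) \pm t_{0.975,\,2n}\,\sqrt{s_I^2 + s_O^2}
$$

$$
(\bar{x}_I - \bar{x}_O) \pm t_{0.975,\,\nu}\,\sqrt{\frac{s_I^2}{n} + \frac{s_O^2}{n}}
$$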

Mostly for @AkselA: I often find it helpful to check my work by sampling from simpler distributions. In this case I'd do something like:

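# simulate each column as a t distribution (n - 1 df),
# shifted and scaled by that column's sample mean and SD
a <- mean(df$Indoor) + sd(df$Indoor) * rt(1000000, nrow(df)-1)
b <- mean(df$Outdoor) + sd(df$Outdoor) * rt(1000000, nrow(df)-1)
# empirical 2.5% and 97.5% quantiles of the simulated differences
quantile(a - b, c(0.025, 0.975))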

which gives me answers much closer to the CI I gave in the comment.
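A seeded, self-contained version of that check, placed next to the analytic interval from the first answer so the two can be compared directly, might look like this. The seed and the 1e5 draw count are arbitrary choices, and the data frame is the table from the question. Because a scaled t with n - 1 degrees of freedom has variance s^2 (n - 1)/(n - 3) rather than s^2, the simulated interval should come out slightly wider than the analytic one, though still close.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n_sim <- 1e5  # simulated draws per column (arbitrary)

df <- data.frame(
  Indoor  = c(0.07, 0.08, 0.09, 0.12, 0.12, 0.12, 0.13, 0.14, 0.15, 0.15, 0.17,
              0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 0.20, 0.22, 0.22, 0.23, 0.23,
              0.25, 0.26, 0.28, 0.28, 0.29, 0.34, 0.39, 0.40, 0.45, 0.54, 0.62),
  Outdoor = c(0.29, 0.68, 0.47, 0.54, 0.97, 0.35, 0.49, 0.84, 0.86, 0.28, 0.32,
              0.32, 1.55, 0.66, 0.29, 0.21, 1.02, 1.59, 0.90, 0.52, 0.12, 0.54,
              0.88, 0.49, 1.24, 0.48, 0.27, 0.37, 1.26, 0.70, 0.76, 0.99, 0.36)
)

# Analytic interval: same formula as the first answer
d_mu <- mean(df$Indoor) - mean(df$Outdoor)
d_sd <- sqrt(var(df$Indoor) + var(df$Outdoor))
analytic <- d_mu + d_sd * qt(c(0.025, 0.975), nrow(df) * 2)

# Simulation: each column as a shifted, scaled t with n - 1 df
a <- mean(df$Indoor)  + sd(df$Indoor)  * rt(n_sim, nrow(df) - 1)
b <- mean(df$Outdoor) + sd(df$Outdoor) * rt(n_sim, nrow(df) - 1)
simulated <- quantile(a - b, c(0.025, 0.975), names = FALSE)

# One row per method, lower and upper bounds side by side
print(rbind(analytic, simulated))
```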

Even though I always find manually calculating the results, as shown by @Sam Mason, the most insightful approach, some people want a shortcut. And sometimes it's OK to be lazy :)

So among the different ways to calculate CIs, this is IMHO the most comfortable:

DescTools::MeanDiffCI(df$Indoor, df$Outdoor)

Here's a reprex:

IV <- ggplot2::diamonds$price  # the diamonds data set ships with ggplot2
DV <- rnorm(length(IV), mean = mean(IV), sd = sd(IV))
DescTools::MeanDiffCI(IV, DV)

gives

 meandiff    lwr.ci    upr.ci 
-18.94825 -66.51845  28.62195 

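By default, MeanDiffCI() uses method = "classic", i.e. the interval from t.test(), so no bootstrapping is done. The argument R (default 999) sets the number of bootstrap replicates and only takes effect when a bootstrap method is selected via method. If you want a bootstrapped interval with 1000 or more replicates, set both:

DescTools::MeanDiffCI(IV, DV, method = "perc", R = 1000)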
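If DescTools isn't available, a plain percentile bootstrap of the mean difference can be sketched in base R. This is the simplest bootstrap variant and is not claimed to reproduce any particular MeanDiffCI method; B = 2000 and the seed are arbitrary, and the vectors are the Indoor/Outdoor columns from the question. Like the classic t interval, it targets the difference of means, so it comes out far narrower than the -1.224 to 0.376 range the question asks for.

```r
set.seed(42)  # arbitrary seed for reproducibility

indoor  <- c(0.07, 0.08, 0.09, 0.12, 0.12, 0.12, 0.13, 0.14, 0.15, 0.15, 0.17,
             0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 0.20, 0.22, 0.22, 0.23, 0.23,
             0.25, 0.26, 0.28, 0.28, 0.29, 0.34, 0.39, 0.40, 0.45, 0.54, 0.62)
outdoor <- c(0.29, 0.68, 0.47, 0.54, 0.97, 0.35, 0.49, 0.84, 0.86, 0.28, 0.32,
             0.32, 1.55, 0.66, 0.29, 0.21, 1.02, 1.59, 0.90, 0.52, 0.12, 0.54,
             0.88, 0.49, 1.24, 0.48, 0.27, 0.37, 1.26, 0.70, 0.76, 0.99, 0.36)

# Percentile bootstrap: resample each group independently with replacement,
# recompute the mean difference B times, then take the 2.5% / 97.5% quantiles.
B <- 2000
boot_diffs <- replicate(B, {
  mean(sample(indoor, replace = TRUE)) - mean(sample(outdoor, replace = TRUE))
})

observed <- mean(indoor) - mean(outdoor)
boot_ci <- quantile(boot_diffs, c(0.025, 0.975), names = FALSE)
print(c(meandiff = observed, lwr.ci = boot_ci[1], upr.ci = boot_ci[2]))
```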
