Using bootstrapping in R to get a confidence interval for the phi statistic

Question

I want to get the confidence interval associated with the phi statistic using bootstrapping (10,000 iterations) in R.

I am using the "psych" package to calculate the phi statistic, and I'm stuck on how to get the CI associated with it.

My data and the code to get the phi statistic are as follows:

library(psych)

Type_of_Cigar = c(rep("0", 16), rep("1", 16))

Cancer = c(rep(c("0", "0", "0", "0"),4),
             rep(c("1", "1", "1", "0"),4))

Table1 <- xtabs(~ Type_of_Cigar + Cancer)

Table1

phi(Table1, digits=5)
#0.7746
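
As a cross-check on the value above, phi for a 2x2 table can be computed by hand as (ad - bc) / sqrt of the product of the four marginal totals. The sketch below uses base R only; the names tab, a, b, c_, d and phi_hand are illustrative and not part of the original post.

```r
# Rebuild the 2x2 table from the question using base R only.
Type_of_Cigar <- c(rep("0", 16), rep("1", 16))
Cancer <- c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
tab <- table(Type_of_Cigar, Cancer)

# Cell counts: rows are Type_of_Cigar (0, 1), columns are Cancer (0, 1).
a  <- as.numeric(tab[1, 1]); b <- as.numeric(tab[1, 2])
c_ <- as.numeric(tab[2, 1]); d <- as.numeric(tab[2, 2])

# phi = (ad - bc) / sqrt(product of the row and column totals)
phi_hand <- (a * d - b * c_) / sqrt((a + b) * (c_ + d) * (a + c_) * (b + d))
round(phi_hand, 5)
```

With these data the counts are a = 16, b = 0, c_ = 4, d = 12, so the result should agree with the 0.7746 reported by phi().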

Answer 1

You can do this with the boot package. First, save the data as a data frame.

Type_of_Cigar = c(rep("0", 16), rep("1", 16))

Cancer = c(rep(c("0", "0", "0", "0"),4),
           rep(c("1", "1", "1", "0"),4))


dat <- data.frame(Type_of_Cigar = Type_of_Cigar, 
                  Cancer = Cancer)

Then you need to write a function whose first two arguments are the data and the bootstrapped observation numbers, which I'm calling inds. The function should take the data, subset them based on inds, and calculate some value, in this case phi. You will need to use the subsetted data to make any intermediate results (such as tab below).

boot.fun <- function(data, inds){
  tab <- xtabs(~ Type_of_Cigar + Cancer, 
               data=data[inds, ])
  psych::phi(tab)
}
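
Two details of a statistic function like this may be worth guarding against. As far as I know, psych::phi() rounds its result to digits = 2 by default, so the bootstrap replicates are stored to two decimals unless digits is raised. Also, because the columns are character vectors, a resample that happens to miss one category entirely would give a table that is no longer 2x2. The following is a hedged alternative sketch, not the answer's code: boot.fun.safe is a hypothetical name, it keeps the same (data, inds) signature, fixes the factor levels, and computes phi in base R without rounding.

```r
dat <- data.frame(
  Type_of_Cigar = c(rep("0", 16), rep("1", 16)),
  Cancer        = c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
)

# Hypothetical variant of the statistic function (same signature as boot.fun).
boot.fun.safe <- function(data, inds) {
  d <- data[inds, ]
  # Fixing the levels keeps the table 2x2 even if a category is absent.
  tab <- table(
    factor(d$Type_of_Cigar, levels = c("0", "1")),
    factor(d$Cancer,        levels = c("0", "1"))
  )
  tab <- matrix(as.numeric(tab), nrow = 2, ncol = 2)
  num <- tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]
  den <- sqrt(prod(rowSums(tab), colSums(tab)))
  num / den  # NaN when a whole row or column of the table is empty
}

# On the original (unresampled) data this reproduces the sample phi.
boot.fun.safe(dat, seq_len(nrow(dat)))
```

A degenerate resample returns NaN here rather than raising an error; with 32 observations split as in this data set such resamples should be very rare.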

Then, you can call boot() on the original data with the function you wrote above.

library(boot)
out <- boot(dat, statistic=boot.fun, R=10000)

Then, you can use the boot.ci() function to calculate confidence intervals:

boot.ci(out)
# BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
# Based on 10000 bootstrap replicates
# 
# CALL :
#   boot.ci(boot.out = out)
# 
# Intervals :
#   Level      Normal              Basic
# 95%   ( 0.5812,  0.9509 )   ( 0.6000,  0.9500 )
# 
# Level     Percentile            BCa
# 95%   ( 0.59,  0.94 )   ( 0.52,  0.92 )
# Calculations and Intervals on Original Scale
# Warning message:
#   In boot.ci(out) : bootstrap variances needed for studentized intervals
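
The warning appears because boot.ci() tries every interval type by default, including the studentized one, which needs bootstrap variances. A minimal sketch of asking only for the percentile and BCa intervals through the type argument follows. It assumes the boot package is installed; phi.stat is a hypothetical base-R statistic used so the example does not depend on psych, and R = 2000 is chosen only to keep the run short.

```r
library(boot)

dat <- data.frame(
  Type_of_Cigar = c(rep("0", 16), rep("1", 16)),
  Cancer        = c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
)

# Hypothetical statistic: unrounded phi from the resampled rows.
phi.stat <- function(data, inds) {
  d <- data[inds, ]
  tab <- table(factor(d$Type_of_Cigar, levels = c("0", "1")),
               factor(d$Cancer,        levels = c("0", "1")))
  tab <- matrix(as.numeric(tab), nrow = 2, ncol = 2)
  (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
    sqrt(prod(rowSums(tab), colSums(tab)))
}

set.seed(2020)  # arbitrary seed, only for reproducibility
out <- boot(dat, statistic = phi.stat, R = 2000)

# Request only the percentile and BCa intervals at the 95% level.
ci <- boot.ci(out, conf = 0.95, type = c("perc", "bca"))
ci
```

The returned object stores each requested interval as a component (ci$percent and ci$bca), with the lower and upper limits in the last two columns.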

Based on the comment below, I should say that both the percentile and BCa (bias-corrected and accelerated) intervals rely on the values in the bootstrap sampling distribution. For a 95% confidence interval, the percentile interval orders the bootstrap statistics and takes the 2.5th and 97.5th percentile values as the confidence bounds. The BCa interval identifies different percentiles that account for bias and non-normality in the bootstrap distribution. These are not necessarily the 2.5th and 97.5th percentiles, but the interval will have approximately 95% coverage. Both percentile and BCa intervals are also transformation-respecting. That is to say, for some parameter p with confidence bounds p1 and p2, you can obtain a confidence interval for a monotone transformation f(p) by applying the same function to the bounds: f(p1) and f(p2).
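
To make the percentile interval and the transformation-respecting property concrete, here is a hand-rolled sketch in base R. It is not how the boot package computes its intervals (boot.ci() interpolates between order statistics, so its numbers can differ slightly). The names phi_2x2, phi_star and f are illustrative, R = 2000 is an arbitrary choice, and f(p) = p^2 is used only because every bootstrap phi is non-negative for these data, which makes squaring an increasing transformation.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

x <- c(rep("0", 16), rep("1", 16))                       # Type_of_Cigar
y <- c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))      # Cancer
n <- length(x)

# Unrounded phi for two binary character vectors coded "0"/"1".
phi_2x2 <- function(x, y) {
  a  <- sum(x == "0" & y == "0"); b <- sum(x == "0" & y == "1")
  c_ <- sum(x == "1" & y == "0"); d <- sum(x == "1" & y == "1")
  (a * d - b * c_) / sqrt((a + b) * (c_ + d) * (a + c_) * (b + d))
}

# Resample row indices with replacement and recompute phi each time.
R <- 2000
phi_star <- replicate(R, {
  inds <- sample.int(n, n, replace = TRUE)
  phi_2x2(x[inds], y[inds])
})

# Percentile interval: 2.5th and 97.5th percentiles of the replicates.
# type = 1 picks actual order statistics (no interpolation);
# na.rm = TRUE drops the rare degenerate resamples that give NaN.
ci_phi <- quantile(phi_star, c(0.025, 0.975), type = 1, na.rm = TRUE)

# Transformation-respecting: transforming the bounds gives the same
# interval as bootstrapping the transformed statistic directly.
f <- function(p) p^2
ci_f <- quantile(f(phi_star), c(0.025, 0.975), type = 1, na.rm = TRUE)

ci_phi
all.equal(f(ci_phi), ci_f)
```

Using order statistics (type = 1) is what makes the two routes agree exactly; with an interpolating quantile type they would agree only approximately for a nonlinear f.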

Answer 1 (2, accepted, 2020-08-31 22:22:39)
