
Get Confidence Intervals for Phi statistics using Bootstrapping in R

I want to get the confidence interval for the phi statistic using bootstrapping (10,000 iterations) in R.

I am using the "psych" package to calculate the phi statistic, but I'm stuck on how to get the CI associated with it.

My data and code to get phi statistics are as follows:

library(psych)

Type_of_Cigar = c(rep("0", 16), rep("1", 16))

Cancer = c(rep(c("0", "0", "0", "0"),4),
             rep(c("1", "1", "1", "0"),4))

Table1 <- xtabs(~ Type_of_Cigar + Cancer)

Table1

phi(Table1, digits=5)
#0.7746

You can do this with the boot package. First, save the data as a data frame.

Type_of_Cigar = c(rep("0", 16), rep("1", 16))

Cancer = c(rep(c("0", "0", "0", "0"),4),
           rep(c("1", "1", "1", "0"),4))


dat <- data.frame(Type_of_Cigar = Type_of_Cigar, 
                  Cancer = Cancer)

Then, you need to write a function whose first two arguments are the data and a vector of bootstrapped observation indices, which I'm calling inds. The function should subset the data using inds and calculate some value, in this case phi. Any intermediate results (such as tab below) must be built from the subsetted data, not from the original data.

boot.fun <- function(data, inds){
  tab <- xtabs(~ Type_of_Cigar + Cancer, 
               data=data[inds, ])
  psych::phi(tab)
}
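One thing to watch is that psych::phi() rounds its result according to its digits argument (the question passes digits=5), so the replicates collected by boot() are rounded as well unless more digits are requested. As a minimal sketch, assuming you would rather avoid rounding altogether, phi can be computed directly from the 2x2 table in base R. The name phi.fun and the explicit factor levels are illustrative choices, not part of the answer above; fixing the levels keeps the table 2x2 even if a resample happens to miss a category.

```r
# Data from the question, rebuilt so the block is self-contained
dat <- data.frame(
  Type_of_Cigar = c(rep("0", 16), rep("1", 16)),
  Cancer        = c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
)

# Hypothetical alternative to boot.fun: unrounded phi from a 2x2 table
phi.fun <- function(data, inds) {
  d <- data[inds, ]
  tab <- table(factor(d$Type_of_Cigar, levels = c("0", "1")),
               factor(d$Cancer,        levels = c("0", "1")))
  n00 <- as.numeric(tab[1, 1]); n01 <- as.numeric(tab[1, 2])
  n10 <- as.numeric(tab[2, 1]); n11 <- as.numeric(tab[2, 2])
  # phi = (n00*n11 - n01*n10) / sqrt(product of the four marginal totals)
  (n00 * n11 - n01 * n10) / sqrt(prod(rowSums(tab), colSums(tab)))
}

# Passing every row index once should reproduce phi for the original table
phi.fun(dat, seq_len(nrow(dat)))
# approximately 0.7746, as in the question
```

A resample in which one margin is empty would give 0/0 here (NaN); with these data that is extremely unlikely, but it is worth checking the replicates for missing values before computing intervals.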

Then, you can call boot() with the original data and the function you wrote above.

library(boot)
out <- boot(dat, statistic=boot.fun, R=10000)

Then, you can use the boot.ci() function to calculate confidence intervals:

boot.ci(out)
# BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
# Based on 10000 bootstrap replicates
# 
# CALL :
#   boot.ci(boot.out = out)
# 
# Intervals :
#   Level      Normal              Basic
# 95%   ( 0.5812,  0.9509 )   ( 0.6000,  0.9500 )
# 
# Level     Percentile            BCa
# 95%   ( 0.59,  0.94 )   ( 0.52,  0.92 )
# Calculations and Intervals on Original Scale
# Warning message:
#   In boot.ci(out) : bootstrap variances needed for studentized intervals
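The warning at the end of that output appears because boot.ci() attempts every interval type by default (type = "all"), and studentized intervals need a variance estimate for each replicate, which this statistic function does not return. A sketch that requests only the four types shown above, and sets a seed so the intervals are reproducible, might look like this. The seed value and the base-R phi.fun (used here so the block runs without psych) are assumptions for illustration.

```r
library(boot)

# Data from the question, rebuilt so the block is self-contained
dat <- data.frame(
  Type_of_Cigar = c(rep("0", 16), rep("1", 16)),
  Cancer        = c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
)

# Hypothetical statistic function: unrounded phi from the resampled 2x2 table
phi.fun <- function(data, inds) {
  d <- data[inds, ]
  tab <- table(factor(d$Type_of_Cigar, levels = c("0", "1")),
               factor(d$Cancer,        levels = c("0", "1")))
  n00 <- as.numeric(tab[1, 1]); n01 <- as.numeric(tab[1, 2])
  n10 <- as.numeric(tab[2, 1]); n11 <- as.numeric(tab[2, 2])
  (n00 * n11 - n01 * n10) / sqrt(prod(rowSums(tab), colSums(tab)))
}

set.seed(123)  # arbitrary seed, only so the run can be repeated
out <- boot(dat, statistic = phi.fun, R = 10000)

# Request only the interval types that need no per-replicate variances
ci <- boot.ci(out, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
ci
```

The individual intervals are stored in the returned object (for example ci$percent and ci$bca), which is handy if you want the bounds as numbers rather than printed text.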

Based on the comment below, I should add that both the percentile and BCa (bias-corrected and accelerated) intervals rely on the values in the bootstrap sampling distribution. For a 95% confidence interval, the percentile interval orders the bootstrap statistics and takes the 2.5th and 97.5th percentile values as the bounds. The BCa interval identifies different percentiles that account for bias and non-normality in the bootstrap distribution. These are not necessarily the 2.5th and 97.5th percentiles, but the interval will still have approximately 95% coverage. Both percentile and BCa intervals are also transformation-respecting. That is, for some parameter p with confidence bounds p1 and p2, you can obtain a confidence interval for a monotone increasing transformation f(p) by applying the same function to the bounds, giving f(p1) and f(p2).
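To make the percentile interval and its transformation-respecting property concrete, here is a rough base-R sketch that resamples rows by hand instead of using boot(). The helper phi.of, the seed, the 2,000 replicates, and the choice of Fisher's z (atanh) as the transformation f are all illustrative assumptions. quantile() is called with type = 1 so that the bounds are actual order statistics of the replicates, which makes the two intervals agree exactly rather than only approximately.

```r
# Data from the question, rebuilt so the block is self-contained
dat <- data.frame(
  Type_of_Cigar = c(rep("0", 16), rep("1", 16)),
  Cancer        = c(rep("0", 16), rep(c("1", "1", "1", "0"), 4))
)

# Hypothetical helper: unrounded phi for a data frame with these two columns
phi.of <- function(d) {
  tab <- table(factor(d$Type_of_Cigar, levels = c("0", "1")),
               factor(d$Cancer,        levels = c("0", "1")))
  n00 <- as.numeric(tab[1, 1]); n01 <- as.numeric(tab[1, 2])
  n10 <- as.numeric(tab[2, 1]); n11 <- as.numeric(tab[2, 2])
  (n00 * n11 - n01 * n10) / sqrt(prod(rowSums(tab), colSums(tab)))
}

set.seed(42)  # arbitrary seed, only so the run can be repeated
# Resample rows with replacement and recompute phi each time
reps <- replicate(2000, phi.of(dat[sample(nrow(dat), replace = TRUE), ]))

# Percentile interval: 2.5th and 97.5th percentiles of the replicates
ci.phi <- quantile(reps, c(0.025, 0.975), type = 1, na.rm = TRUE)

# Transformation-respecting: transforming the bounds of the phi interval ...
ci.z.from.bounds <- atanh(ci.phi)
# ... matches the percentile interval of the transformed replicates
ci.z.direct <- quantile(atanh(reps), c(0.025, 0.975), type = 1, na.rm = TRUE)

all.equal(ci.z.from.bounds, ci.z.direct)
```

With the default quantile type, which interpolates between order statistics, the two intervals would agree only approximately; the property itself concerns the percentiles, not the interpolation rule.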
