Extracting credible intervals for robust correlations in R

Question

I currently know how to use pbcor from the WRS2 package to extract robust correlations. This function calculates the 95% bootstrap confidence intervals around the estimated robust correlation. For consistency with the rest of my analyses and manuscript, I need to extract credible intervals instead of confidence intervals.

How can I extract the 95% credible intervals instead of the 95% confidence intervals? Is there a way to do this using pbcor?

My dataset contains 210 observations, but here is a subset of the data:

Individual  varA    varB
1   2.9380842   0.09896456
2   2.9380842   -1.38772037
3   -0.6879859  -2.41310243
4   -0.6879859  0.55722346
5   -2.3129564  -1.34140699
6   -2.3129564  -1.75604301
7   -0.4937431  0.78381085
8   -0.4937431  0.38320385
9   -0.8558126  0.82125672
10  -0.8558126  0.06346062
11  -0.9211026  -1.67170174

Corresponding code:

WRS2::pbcor(data$varA, data$varB, ci=TRUE, nboot=1000, beta=0.1) 
>robust correlation coefficient: 0.275
>test statistic: 0.8582
>p-value:0.41307
>bootstrap CI: [-0.3564; 0.7792]

Answer 1

Hi @Blundering Ecologist

Here is a complete example of estimating credible intervals with Bayesian modeling, for comparison against the WRS2-based robust confidence intervals. If you use the same set.seed, you should be able to recreate the data. Your results will differ in the Bayesian part, as they should. My comments are included in the code below.

> ## generate data
> set.seed(123)     # for reproducibility
> x <- seq(1:25)+rnorm(25)     # seq(1:25) is simply 1:25
> y <- seq(26:50)-7*rnorm(25)  # note: seq(26:50) also evaluates to 1:25, not 26:50
> y[10] <- y[10] * 2.5  # introduce outlier in 10th record
> y[20] <- y[20] * 1.5 # introduce outlier in 20th record
> 
> simdat <- cbind(data.frame(x), data.frame(y)) # create data frame
> 
> 
> ## standardize data
> library(robustHD)      # very useful functions standardize() & robStandardize()
Loading required package: ggplot2
Loading required package: perry
Loading required package: parallel
Loading required package: robustbase
> simdat$x_std <- standardize(simdat$x)     # mean and sd
> simdat$x_std_rob <- robStandardize(simdat$x)  # median and MAD
> 
> ## repeat for y
> simdat$y_std <- standardize(simdat$y)     # uses mean and sd
> simdat$y_std_rob <- robStandardize(simdat$y)  # uses median and MAD
> 
> head(simdat) # to see variable names of the standardized data
          x         y      x_std  x_std_rob        y_std   y_std_rob
1 0.4395244 12.806853 -1.7617645 -1.4269699  0.003689598  0.00000000
2 1.7698225 -3.864509 -1.5746770 -1.2805106 -1.705238038 -1.39579772
3 4.5587083  1.926388 -1.1824599 -0.9734679 -1.111631801 -0.91095903
4 4.0705084 11.966959 -1.2511183 -1.0272163 -0.082405292 -0.07031957
5 5.1292877 -3.776704 -1.1022161 -0.9106499 -1.696237444 -1.38844632
6 7.7150650  3.014750 -0.7385634 -0.6259685 -1.000067292 -0.81983669
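
For readers without robustHD, the two standardizations can be sketched in base R. This assumes that standardize() centers by the mean and scales by the standard deviation, and that robStandardize() uses the median and MAD, as the comments in the transcript state. The vector v is hypothetical and only serves as an illustration.

```r
# Base-R sketch of classical vs. robust standardization
# (assumed to mirror what the transcript's comments describe)
set.seed(123)
v <- c(rnorm(24), 15)                    # hypothetical vector with one gross outlier

v_std     <- (v - mean(v)) / sd(v)       # classical: mean and standard deviation
v_std_rob <- (v - median(v)) / mad(v)    # robust: median and MAD (mad() rescales by 1.4826)

round(c(mean = mean(v_std), sd = sd(v_std)), 3)
round(c(median = median(v_std_rob), mad = mad(v_std_rob)), 3)
```

The outlier inflates the mean and standard deviation, so it compresses every classical z-score, while the median and MAD are barely affected by it.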
> 
> ## get uncorrected correlation
> cor(simdat$x, simdat$y)
[1] 0.7507123
> 
> ## get boot-strapped correlation that corrects for the 2 outliers
> library(WRS2)
> corrxy <- WRS2::pbcor(simdat$y, simdat$x, ci=TRUE, nboot=2000, beta=0.1)
> corrxy
Call:
WRS2::pbcor(x = simdat$y, y = simdat$x, beta = 0.1, ci = TRUE, 
    nboot = 2000)

Robust correlation coefficient: 0.7657
Test statistic: 5.7084
p-value: 1e-05 

Bootstrap CI: [0.5113; 0.9116]  # Boot-strapped CI
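
To make clear what kind of interval this is, here is a minimal sketch of a percentile bootstrap confidence interval in base R. It uses the ordinary Pearson correlation on hypothetical data, not the percentage bend correlation, and it is not meant to reproduce the internals of pbcor. It only illustrates the resampling idea behind a bootstrap CI.

```r
# Percentile bootstrap CI for a Pearson correlation (illustrative sketch only)
set.seed(123)
n <- 25
x <- 1:n + rnorm(n)                      # hypothetical data, similar in spirit to simdat
y <- 1:n - 7 * rnorm(n)

nboot <- 2000
boot_r <- replicate(nboot, {
  idx <- sample.int(n, replace = TRUE)   # resample observation pairs with replacement
  cor(x[idx], y[idx])                    # correlation in the resampled data
})

# 95% percentile interval: the 2.5% and 97.5% quantiles of the bootstrap estimates
boot_ci <- quantile(boot_r, probs = c(0.025, 0.975))
boot_ci
```

This interval describes the sampling variability of the estimator under resampling. It is a frequentist confidence interval, not a posterior credible interval.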

> ## set up bivariate Bayesian regression without intercept
> ## so we get the pure zero-order correlation
> library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.13.5). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: ‘brms’

The following object is masked from ‘package:robustbase’:

    epilepsy

The following object is masked from ‘package:stats’:

    ar

> library(shinystan) 
> # gives a lovely visualization of the brms model fit object
Loading required package: shiny

This is shinystan version 2.5.0

> # in the formula below "y ~ 0 + x_std", 0 ensures there is no intercept
> mod1 <- brm( y_std ~ 0 + x_std, data=simdat, cores=2, chains=2)
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.8e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 2.1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.031892 seconds (Warm-up)
Chain 2:                0.025839 seconds (Sampling)
Chain 2:                0.057731 seconds (Total)
Chain 2: 
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.032274 seconds (Warm-up)
Chain 1:                0.028699 seconds (Sampling)
Chain 1:                0.060973 seconds (Total)
Chain 1: 
> summary(mod1)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y_std ~ 0 + x_std 
   Data: simdat (Number of observations: 25) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
x_std     0.76      0.14     0.48     1.05 1.00     1187     1030

# Bootstrap CI (pbcor): 0.51 to 0.91 (corrects for outliers)
# Bayesian credible interval: 0.48 to 1.05 (does not correct for outliers)
# Since the bootstrap CI is within the Bayesian credible interval
# I would use that.
# Raw corr: 0.75 vs Bayesian corr: 0.76 vs bootstrap corr: 0.77

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.69      0.11     0.52     0.95 1.00     1345     1132

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
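
The model formula y_std ~ 0 + x_std relies on a least-squares identity: when both variables are standardized with their mean and standard deviation, the no-intercept slope equals the Pearson correlation. The base-R sketch below checks this on hypothetical data. Note that the Bayesian slope is not constrained to lie in [-1, 1], which is why the upper credible limit in the summary can exceed 1.

```r
# No-intercept slope on standardized data equals the Pearson correlation
set.seed(123)
x <- 1:25 + rnorm(25)                    # hypothetical data
y <- 1:25 - 7 * rnorm(25)

x_std <- as.numeric(scale(x))            # (x - mean(x)) / sd(x)
y_std <- as.numeric(scale(y))

slope <- unname(coef(lm(y_std ~ 0 + x_std)))  # least-squares slope without intercept
r     <- cor(x, y)

all.equal(slope, r)                      # TRUE
```

The identity holds only for the mean/sd standardization. With the median/MAD version the slope is a rescaled regression coefficient and need not equal any correlation coefficient.
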
> # extract posterior samples of population-level effects 

> samples1 <- posterior_samples(mod1, "^b") # posterior draws of the slope, i.e. the estimated correlation
> head(samples1)
    b_x_std
1 0.9093316
2 0.7281373
3 0.7207291
4 0.6822180
5 0.9747108
6 0.9653564

> samples2 <- posterior_samples(mod1, "sigma") # posterior draws of sigma, the residual standard deviation (not a variance)
> head(samples2)
      sigma
1 0.7320897
2 0.7212673
3 0.6204091
4 0.7844105
5 0.9443782
6 0.7311916

> launch_shinystan(mod1) # launches in your web browser

> write.csv(samples1,"/home/Documents/Projects/Rcode/rob_corr_brms.csv", row.names = FALSE) # to do more using Excel
> write.csv(samples2,"/home/Documents/Projects/Rcode/rob_corr_var_brms.csv", row.names = FALSE) # to do more using Excel  

> # To learn more about brms see this link below

http://paul-buerkner.github.io/brms/articles/index.html
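
The 95% credible interval the question asks for can be read directly off the posterior draws. The sketch below computes an equal-tailed interval from a vector of draws. To keep it runnable without fitting a model, draws is a hypothetical stand-in for samples1$b_x_std, generated from a normal distribution with the mean and error reported in the summary.

```r
# Equal-tailed 95% credible interval from posterior draws (sketch)
set.seed(123)
draws <- rnorm(2000, mean = 0.76, sd = 0.14)  # stand-in for samples1$b_x_std

# The 2.5% and 97.5% quantiles of the draws bound the central 95% of the posterior
cri <- quantile(draws, probs = c(0.025, 0.975))
cri

# With the fitted model itself, brms offers equivalents (not run here):
# posterior_interval(mod1, prob = 0.95)
# fixef(mod1)   # columns Estimate, Est.Error, Q2.5, Q97.5
```

These are the same limits that summary(mod1) prints in the l-95% CI and u-95% CI columns. Despite the "CI" label, they are credible intervals.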

Here is the second model, run with the robustly standardized x and y:

> mod_rob <- brm( y_std_rob ~ 0 + x_std_rob, data=simdat, cores=2, chains=2) 
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.4e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 2).
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: 
Chain 2: Gradient evaluation took 2.7e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.025874 seconds (Warm-up)
Chain 1:                0.028535 seconds (Sampling)
Chain 1:                0.054409 seconds (Total)
Chain 1: 
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.025316 seconds (Warm-up)
Chain 2:                0.026648 seconds (Sampling)
Chain 2:                0.051964 seconds (Total)
Chain 2: 
> summary(mod_rob)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y_std_rob ~ 0 + x_std_rob 
   Data: simdat (Number of observations: 25) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
x_std_rob     0.77      0.14     0.50     1.06 1.00     1639     1201

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.57      0.08     0.43     0.76 1.00     1314      977

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

> samples_rob <- posterior_samples(mod_rob, "^b")
> head(samples_rob)
  b_x_std_rob
1   0.8917219
2   0.6036900
3   0.9898435
4   0.6937937
5   0.7883487
6   0.8781157
> samples_rob_var <- posterior_samples(mod_rob, "sigma")
> head(samples_rob_var)
      sigma
1 0.5646454
2 0.4547035
3 0.6541133
4 0.4691680
5 0.6478816
6 0.4777489

Answer 1: accepted 2020-10-28 03:05:33
