简体   繁体   English

为 R 中的稳健相关提取可信区间

[英]Extract credible intervals for robust correlations in R

I currently know how to use pbcor from the WRS2 package to extract robust correlations.我现在知道如何使用pbcorWRS2包解压到确凿的相关性。 This function calculates the 95% bootstrap confidence intervals around the estimated robust correlation.此函数计算估计的稳健相关性周围的 95% 自举置信区间。 For consistency with the rest of my analyses and manuscript, I need to extract credible intervals instead of confidence intervals.为了与我其余的分析和手稿保持一致,我需要提取可信区间而不是置信区间。

How can I extract the 95% credible intervals , instead of the 95% confidence intervals?如何提取 95%可信区间而不是95% 置信区间? Is there a way to do this using pbcor ?有没有办法使用pbcor做到这pbcor

My dataset contains 210 observations, but here is a subset of the data:我的数据集包含 210 个观察值,但这是数据的一个子集:

Individual  varA    varB
1   2.9380842   0.09896456
2   2.9380842   -1.38772037
3   -0.6879859  -2.41310243
4   -0.6879859  0.55722346
5   -2.3129564  -1.34140699
6   -2.3129564  -1.75604301
7   -0.4937431  0.78381085
8   -0.4937431  0.38320385
9   -0.8558126  0.82125672
10  -0.8558126  0.06346062
11  -0.9211026  -1.67170174

Corresponding code:对应代码:

WRS2::pbcor(data$varA, data$varB, ci=TRUE, nboot=1000, beta=0.1) 
>robust correlation coefficient: 0.275
>test statistic: 0.8582
>p-value:0.41307
>bootstrap CI: [-0.3564; 0.7792]

Hi @Blundering Ecologist嗨@Blundering生态学家

Here is a complete example of estimating Credible Intervals using Bayesian Modeling to compare against the WRS2 based Robust Confidence Intervals: If you use the set.seed you should be able to recreate the data.Your results will be different when you go to the Bayesian part,as it should.这是使用贝叶斯建模来估计可信区间以与基于 WRS2 的稳健置信区间进行比较的完整示例:如果您使用 set.seed,您应该能够重新创建数据。当您转到贝叶斯部分时,结果会有所不同,正如它应该。 My comments are included in the code below.我的评论包含在下面的代码中。

> ## generate data
> set.seed(123)     # for reproducibility
> x <- seq(1:25)+rnorm(25)     
> y <- seq(26:50)-7*rnorm(25)  
> y[10] <- y[10] * 2.5  # introduce outlier in 10th record
> y[20] <- y[20] * 1.5 # introduce outlier in 20th record
> 
> simdat <- cbind(data.frame(x), data.frame(y)) # create data frame
> 
> 
> ## standardize data
> library(robustHD)      # very useful functions standardize() & robStandardize()
Loading required package: ggplot2
Loading required package: perry
Loading required package: parallel
Loading required package: robustbase
> simdat$x_std <- standardize(simdat$x)     # mean and sd
> simdat$x_std_rob <- robStandardize(simdat$x)  # median and MAD
> 
> ## repeat for y
> simdat$y_std <- standardize(simdat$y)     # uses mean and sd
> simdat$y_std_rob <- robStandardize(simdat$y)  # uses median and MAD
> 
> head(simdat) # to see variable names of the standardized data
          x         y      x_std  x_std_rob        y_std   y_std_rob
1 0.4395244 12.806853 -1.7617645 -1.4269699  0.003689598  0.00000000
2 1.7698225 -3.864509 -1.5746770 -1.2805106 -1.705238038 -1.39579772
3 4.5587083  1.926388 -1.1824599 -0.9734679 -1.111631801 -0.91095903
4 4.0705084 11.966959 -1.2511183 -1.0272163 -0.082405292 -0.07031957
5 5.1292877 -3.776704 -1.1022161 -0.9106499 -1.696237444 -1.38844632
6 7.7150650  3.014750 -0.7385634 -0.6259685 -1.000067292 -0.81983669
> 
> ## get uncorrected correlation
> cor(simdat$x, simdat$y)
[1] 0.7507123
> 
> ## get boot-strapped correlation that corrects for the 2 outliers
> library(WRS2)
> corrxy <- WRS2::pbcor(simdat$y, simdat$x, ci=TRUE, nboot=2000, beta=0.1)
> corrxy
Call:
WRS2::pbcor(x = simdat$y, y = simdat$x, beta = 0.1, ci = TRUE, 
    nboot = 2000)

Robust correlation coefficient: 0.7657
Test statistic: 5.7084
p-value: 1e-05 

Bootstrap CI: [0.5113; 0.9116]  # Boot-strapped CI

> ## set up bivariate Bayesian regression without intercept
> ## so we get the pure zero-order correlation
> library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.13.5). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: ‘brms’

The following object is masked from ‘package:robustbase’:

    epilepsy

The following object is masked from ‘package:stats’:

    ar

> library(shinystan) 
> # gives a lovely visualization of the brms model fit object
Loading required package: shiny

This is shinystan version 2.5.0

> # in the formula below "y ~ 0 + x_std", 0 ensures there is no intercept
> mod1 <- brm( y_std ~ 0 + x_std, data=simdat, cores=2, chains=2)
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.8e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 2.1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.031892 seconds (Warm-up)
Chain 2:                0.025839 seconds (Sampling)
Chain 2:                0.057731 seconds (Total)
Chain 2: 
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.032274 seconds (Warm-up)
Chain 1:                0.028699 seconds (Sampling)
Chain 1:                0.060973 seconds (Total)
Chain 1: 
> summary(mod1)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y_std ~ 0 + x_std 
   Data: simdat (Number of observations: 25) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
x_std     0.76      0.14     0.48     1.05 1.00     1187     1030

# Boot-strap CI: 0.51 to 0.91 compared to  (corrects for outliers)
# Bayesian Credible Interval: 0.48 to 1.05 (does not correct for outliers)
# Since the Boot-strap CI is within the Bayesian Credible Interval
# I would use that.
# Raw Corr: 0.75 vs Bayesian Corr: 0.76 vs Bootstrap Corr: 0.77

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.69      0.11     0.52     0.95 1.00     1345     1132

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
> # extract posterior samples of population-level effects 

> samples1 <- posterior_samples(mod1, "^b") # this data frame has all the values of correlation
> head(samples1)
    b_x_std
1 0.9093316
2 0.7281373
3 0.7207291
4 0.6822180
5 0.9747108
6 0.9653564

> samples2 <- posterior_samples(mod1, "sigma") # this data frame has all the values of variance around correlation
> head(samples2)
      sigma
1 0.7320897
2 0.7212673
3 0.6204091
4 0.7844105
5 0.9443782
6 0.7311916

> launch_shinystan(mod1) # launches in your web browser

> write.csv(samples1,"/home/Documents/Projects/Rcode/rob_corr_brms.csv", row.names = FALSE) # to do more using Excel
> write.csv(samples2,"/home/Documents/Projects/Rcode/rob_corr_var_brms.csv", row.names = FALSE) # to do more using Excel  

> # To learn more about brms see this link below

http://paul-buerkner.github.io/brms/articles/index.html

Here is the second model run with the robust standardized x & y

> mod_rob <- brm( y_std_rob ~ 0 + x_std_rob, data=simdat, cores=2, chains=2) 
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.4e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)

SAMPLING FOR MODEL '9faff91dfca8b644fd3fe4e0f6965785' NOW (CHAIN 2).
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: 
Chain 2: Gradient evaluation took 2.7e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.025874 seconds (Warm-up)
Chain 1:                0.028535 seconds (Sampling)
Chain 1:                0.054409 seconds (Total)
Chain 1: 
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.025316 seconds (Warm-up)
Chain 2:                0.026648 seconds (Sampling)
Chain 2:                0.051964 seconds (Total)
Chain 2: 
> summary(mod_rob)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y_std_rob ~ 0 + x_std_rob 
   Data: simdat (Number of observations: 25) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
x_std_rob     0.77      0.14     0.50     1.06 1.00     1639     1201

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.57      0.08     0.43     0.76 1.00     1314      977

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

> samples_rob <- posterior_samples(mod_rob, "^b")
> head(samples_rob)
  b_x_std_rob
1   0.8917219
2   0.6036900
3   0.9898435
4   0.6937937
5   0.7883487
6   0.8781157
> samples_rob_var <- posterior_samples(mod_rob, "sigma")
> head(samples_rob_var)
      sigma
1 0.5646454
2 0.4547035
3 0.6541133
4 0.4691680
5 0.6478816
6 0.4777489

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 为 R 中的 lme4 模型提取随机效应的后验估计和可信区间 - Extract posterior estimate and credible intervals for random effect for lme4 model in R 为每一行添加可信区间 - Add Credible Intervals to each line pbcor 和 ggcorrmat 相关性在 R 中给出不同的置信区间 - pbcor and ggcorrmat correlations give different confidence intervals in R 如何在 r 中合并稳健混合 model 的 MI 置信区间? - how to pool MI confidence intervals of robust mixed model in r? R 中的 Function 计算线性回归的异方差稳健置信区间 - Function in R that computes heteroskedasticity-robust confidence intervals for a linear regression 提取 R 中的稳健标准误差后如何获得置信区间? - How to get confidence intervals after extracting robust standard errors in R? 如何找到和plot不同的可信区间? - How to find and plot different credible intervals? 如何仅从 R 中的特定列对中提取显着相关性? - How to extract only significant correlations from specific column pairs in R? 使用Rstanarm计算二项式Logit中的边际效应的可信区间 - Calculating credible intervals for marginal effects in binomial logit using rstanarm 从 glmmTMB output 中提取后验模式和可信区间 - Extracting posterior modes and credible intervals from glmmTMB output
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM