
R: how to use bootstraps to generate maximum likelihood estimates and AICc?

Sorry for a rather basic question. I am doing multiple comparisons of morphological traits through correlations of bootstrapped data. I am wondering whether these multiple comparisons affect my level of inference, and also what the effect of the potential multicollinearity in my data is. Perhaps a reasonable option would be to use my bootstraps to generate maximum likelihood estimates and then compute AICc values to compare models with all of my parameters, to see which ones come out as most important... The problem is that, although the approach is (more or less) clear to me, I don't know how to implement it in R. Could anybody be so kind as to shed some light on this for me? So far, here is an example (in R, but not with my data):

   library(boot)
   data(iris)
   head(iris)
   # Statistic function for boot(): Pearson correlation between the two
   # columns plus the median of each, computed on the resampled rows
   pearson <- function(data, indices){
      dt <- data[indices, ]
      c(
         cor(dt[, 1], dt[, 2], method = 'p'),
         median(dt[, 1]),
         median(dt[, 2])
      )
   }
   # One example: iris$Sepal.Length ~ iris$Sepal.Width
   # I calculate the Pearson correlation (r, not r-squared) with 1000 replications
   set.seed(12345)
   dat <- iris[,c(1,2)]
   dat <- na.omit(dat)
   results <- boot(dat, statistic=pearson, R=1000)
   # 95% BCa CI for the first statistic returned (the correlation)
   boot.ci(results, type="bca")
   BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
   Based on 1000 bootstrap replicates

   CALL : 
   boot.ci(boot.out = results, type = "bca")

   Intervals : 
   Level       BCa          
   95%   (-0.2490,  0.0423 )  
   Calculations and Intervals on Original Scale

   plot(results)

[Output of plot(results): histogram and quantile plot of the bootstrap replicates]

I have several more pairs of comparisons.
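
For context, this is roughly how I intend to run the same bootstrap over the other pairs, reusing the pearson function above (the column pairs here are just placeholders, not my actual traits):

   # Loop the same bootstrap over several trait pairs (placeholder pairs)
   pairs_to_test <- list(c(1, 2), c(1, 3), c(2, 4))

   set.seed(12345)
   all_results <- lapply(pairs_to_test, function(p) {
      dat <- na.omit(iris[, p])
      b   <- boot(dat, statistic = pearson, R = 1000)
      list(boot = b, ci = boot.ci(b, type = "bca"))
   })

   # Lower and upper BCa limits for the correlation in each pair
   lapply(all_results, function(x) x$ci$bca[, 4:5])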

This is more of a Cross Validated question.

Multicollinearity shouldn't be a problem if you're just assessing the relationship between two variables (in your case, a correlation). Multicollinearity only becomes an issue when you fit a model, e.g. a multiple regression, with several highly correlated predictors.
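
If you do later fit a multiple regression with several traits, the variance inflation factor is a standard way to check for multicollinearity. A minimal sketch, using the car package and the iris data purely as an illustration (not your traits):

   # Illustration only: VIFs for a multiple regression on iris,
   # standing in for the real morphological traits
   library(car)   # provides vif()
   data(iris)

   fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
   vif(fit)       # values much above ~5-10 usually signal problematic collinearity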

Multiple comparisons are always a problem, though, because they inflate your type-I error rate. The way to address that is to apply a multiple comparison correction, e.g. Bonferroni-Holm or the less conservative FDR. That can have its downsides, though, especially if you have many predictors and few observations: it may lower your power so much that you won't be able to find any effect, no matter how big it is.
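
Both corrections are available in base R through p.adjust. A minimal sketch, assuming you have collected the p-values of your pairwise correlation tests in a vector (the values below are placeholders):

   # p-values from the pairwise correlation tests (placeholder values)
   pvals <- c(0.001, 0.012, 0.030, 0.048, 0.200)

   p.adjust(pvals, method = "holm")   # Bonferroni-Holm
   p.adjust(pvals, method = "BH")     # Benjamini-Hochberg FDR, less conservative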

In a high-dimensional setting like this, your best bet may be some sort of regularized regression method. With regularization, you put all predictors into your model at once, similarly to a multiple regression; the trick is that you constrain the model so that all of the regression slopes are pulled towards zero, and only the ones with big effects "survive". The machine-learning versions of regularized regression are called ridge, LASSO, and elastic net, and they can be fitted using the glmnet package. There are also Bayesian equivalents in so-called shrinkage priors, such as the horseshoe (see e.g. https://avehtari.github.io/modelselection/regularizedhorseshoe_slides.pdf ). You can fit Bayesian regularized regression using the brms package.
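
A minimal sketch of the glmnet route, again using iris only as a stand-in for your data (alpha = 1 gives the LASSO, alpha = 0 ridge regression, values in between elastic net):

   # Cross-validated LASSO with glmnet; iris is a stand-in for the real data
   library(glmnet)
   data(iris)

   x <- as.matrix(iris[, 2:4])   # predictors (placeholder choice)
   y <- iris$Sepal.Length        # response   (placeholder choice)

   set.seed(12345)
   cvfit <- cv.glmnet(x, y, alpha = 1)   # LASSO with cross-validated lambda
   coef(cvfit, s = "lambda.min")         # shrunken slopes; some may be exactly zero

With few observations and many predictors, the LASSO will typically retain only the strongest predictors, which is essentially the "what comes out as most important" ranking you are after.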
