How to speed up tidymodels bootstrapping with parallelization

I have the following code, which performs bootstrapping and calculates percentile confidence intervals.

library(rsample)
library(broom)
library(dplyr)
library(purrr)
library(tibble)

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(car_rs, results) # this is important 

It produces:

> int_pctl(car_rs, results)
# A tibble: 3 × 6
  term         .lower .estimate   .upper .alpha .method   
  <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>     
1 (Intercept) 27.7      31.0    34.1       0.05 percentile
2 disp        -0.0431   -0.0295 -0.0123    0.05 percentile
3 hp          -0.0643   -0.0281 -0.00930   0.05 percentile

But it runs very slowly. How can I speed it up with parallelization? Note that the output of the parallelized step must still be usable by int_pctl().

I tried this, but it failed:

library(parallel)
# set the number of cores to use for parallelization
cores <- detectCores() - 1
cl <- makeCluster(cores)

# use mcmapply to parallelize the bootstrapping process
car_rs$results <- mcmapply(lm_est, car_rs$splits, mc.cores = cores, mc.preschedule = TRUE)

stopCluster(cl)
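
A few things in the attempt above are likely to cause trouble: mcmapply() works by forking and never uses the cluster created with makeCluster(), forking with mc.cores > 1 is not supported on Windows, and mcmapply() simplifies its result by default (SIMPLIFY = TRUE), so results would not be a plain list of tibbles. The following is only a minimal sketch of how the base parallel route might be made to work, assuming 2 workers (an arbitrary choice) and namespaced calls inside lm_est() so the workers do not need attached packages:

```r
library(rsample)
library(broom)
library(dplyr)
library(parallel)

# Namespaced calls so the function also works on freshly started workers
lm_est <- function(split, ...) {
  broom::tidy(lm(mpg ~ disp + hp, data = rsample::analysis(split)))
}

set.seed(52156)

# Assumption: 2 workers; adjust to the machine at hand
cl <- makeCluster(2)

car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  # parLapply() always returns a list, one tidy tibble per split
  mutate(results = parLapply(cl, splits, lm_est))

stopCluster(cl)

ci <- int_pctl(car_rs, results)
ci
```

Using mutate() rather than assigning with $ keeps the object handling the same as in the original code, so int_pctl() receives the bootstraps object it expects.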

The furrr package provides parallel versions of the purrr::map*() functions that you can use here.

library(rsample)
library(broom)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(tibble)
library(furrr)  #<- added
#> Loading required package: future

plan(multisession, workers = parallel::detectCores())  #<- added


lm_est <- function(split, ...) {
  library(broom) #<- added to load inside of remote workers
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 1500, apparent = TRUE) %>%
  mutate(results = future_map(splits, lm_est))   #<- changed

int_pctl(car_rs, results) # this is important 
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method   
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>     
#> 1 (Intercept) 27.7      30.8    33.6       0.05 percentile
#> 2 disp        -0.0443   -0.0298 -0.0146    0.05 percentile
#> 3 hp          -0.0584   -0.0267 -0.00718   0.05 percentile

Created on 2023-01-26 by the reprex package (v2.0.1)
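
As a variation on the approach above, the packages needed on the workers can be declared through furrr_options(packages = ...) instead of calling library() inside the function, and the workers can be released afterwards with plan(sequential). This is a sketch rather than a definitive setup; the worker count of 2 is an assumption, and the package list is simply what lm_est() uses:

```r
library(rsample)
library(broom)
library(dplyr)
library(furrr)

# Assumption: 2 workers; adjust to the machine at hand
plan(multisession, workers = 2)

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 1500, apparent = TRUE) %>%
  mutate(
    results = future_map(
      splits, lm_est,
      # Attach these packages on each worker before lm_est() runs
      .options = furrr_options(packages = c("rsample", "broom", "dplyr"))
    )
  )

# Switch back to sequential processing, which shuts the workers down
plan(sequential)

ci <- int_pctl(car_rs, results)
ci
```

For a model as cheap as this lm() fit, the cost of starting workers and shipping the splits to them may outweigh the gain, so it is worth timing both versions with system.time() before committing to the parallel one.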
