R Markdown generates duplicates if run in parallel
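I am generating several reports with R Markdown. If I render them one at a time, everything works. If I use %do%, it also works. If I use %dopar%, I get one of 3 outcomes:
How can I fix this?
The following code works 100% of the time: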
library(tidyverse)
library(parallel)
library(doParallel)
OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"
ServersInDB <<- c("server1.ru", "server2.ru")
cores=detectCores(logical = FALSE)
cl <- parallel::makeCluster(cores-1) #not to overload your computer
registerDoParallel(cl)
render_all_obj <- function(MachineName, OutputFolder, result_foldername) {
  library(rmarkdown)
  render(
    input = "c:\\temp\\test\\proj\\Report.RMD",
    output_file = paste0(MachineName, ".html"),
    output_dir = file.path(OutputFolder, result_foldername),
    params = list(MachineName = MachineName)
  )
}
foreach(MachineName = ServersInDB) %do% {
  render_all_obj(MachineName, OutputFolder, result_foldername)
}
parallel::stopCluster(cl)
Here is the code that fails:
library(tidyverse)
library(parallel)
library(doParallel)
OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"
ServersInDB <<- c("server1.ru", "server2.ru")
cores=detectCores(logical = FALSE)
cl <- parallel::makeCluster(cores[1]-1) #not to overload your computer
registerDoParallel(cl)
render_all_obj <- function(MachineName, OutputFolder, result_foldername) {
  library(rmarkdown)
  render(
    input = "c:\\temp\\test\\proj\\Report.RMD",
    output_file = paste0(MachineName, ".html"),
    output_dir = file.path(OutputFolder, result_foldername),
    params = list(MachineName = MachineName)
  )
}
foreach(MachineName = ServersInDB) %dopar% {
  render_all_obj(MachineName, OutputFolder, result_foldername)
}
parallel::stopCluster(cl)
Here is my Rmd:
---
output:
  html_document:
    toc: true
    dev: 'svg'
    number_sections: true
    toc_depth: 2
    toc_float: true
    theme: cerulean
    toc_collapsed: true
    self_contained: true
    mathjax: NULL
params:
  MachineName: "ServerName" #name of server to analyze
---
```{r , echo=FALSE, include=FALSE, results='hide'}
MachineName <- params$MachineName
```
---
title: "My report is about: `r MachineName`"
---
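The problem is the file named Report.knit.md. By default, it is created in the directory of the file passed as the input argument of rmarkdown::render. That directory is the same for all parallel processes, so every process tries to create, read, and write the same file.
The fix is to pass the intermediates_dir argument with a unique temporary directory for each process.
Working solution: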
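library(doFuture)  # provides registerDoFuture()
library(furrr)     # provides future_map()
registerDoFuture()
workers <- parallel::detectCores(logical = FALSE) - 1
future::plan(future::multisession, workers = workers)
ServersInDB <- c("server1.ru", "server2.ru")
render_all_obj <- function(MachineName) {
  OutputFolder <- "c:/temp/test/out"
  result_foldername <- "Now"
  library(rmarkdown)
  # Unique intermediates directory per call, so parallel workers
  # never share Report.knit.md
  tf <- tempfile()
  dir.create(tf)
  # unlink() only deletes a directory when recursive = TRUE;
  # on.exit() runs the cleanup even if render() fails
  on.exit(unlink(tf, recursive = TRUE), add = TRUE)
  render(
    input = "c:/temp/test/proj/Report.RMD",
    output_file = paste0(MachineName, ".html"),
    intermediates_dir = tf,
    output_dir = file.path(OutputFolder, result_foldername),
    params = list(MachineName = MachineName)
  )
}
furrr::future_map(ServersInDB, render_all_obj)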
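To see why a per-call temporary directory avoids the clash, here is a minimal base R sketch that needs neither rmarkdown nor pandoc. The function fake_render is a hypothetical stand-in for rmarkdown::render, not part of any package: it only writes and reads back a file with the fixed name Report.knit.md, the way the intermediate file is handled during a real render. The cluster size of 2 is also just an assumption for the demo.

```r
library(parallel)

# Hypothetical stand-in for rmarkdown::render(): it writes an intermediate
# file with a fixed name, but inside a directory unique to this call.
fake_render <- function(machine_name) {
  work_dir <- tempfile(pattern = "render_")  # unique path per call
  dir.create(work_dir)
  # Directories are only removed when recursive = TRUE
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  intermediate <- file.path(work_dir, "Report.knit.md")
  writeLines(paste("report for", machine_name), intermediate)

  # Return what was read back, plus the directory used, for inspection
  list(content = readLines(intermediate), dir = work_dir)
}

servers <- c("server1.ru", "server2.ru")
cl <- makeCluster(2)  # assumed worker count for the demo
results <- parLapply(cl, servers, fake_render)
stopCluster(cl)

contents <- vapply(results, function(r) r$content, character(1))
dirs <- vapply(results, function(r) r$dir, character(1))
print(contents)
```

Each worker reads back exactly what it wrote, because no two calls share a directory. In real code the same work_dir would be passed to render as intermediates_dir, as in the solution above.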