
R Markdown generates duplicates if run in parallel

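I am generating several reports with R Markdown. If I render them one at a time, everything is fine. If I use %do%, it also works. If I use %dopar%, one of three things happens:

  1. Sometimes it works.
  2. Sometimes the reports have different names but identical content.
  3. Sometimes pandoc fails with the error: pandoc document conversion failed with error 1

How can I fix this?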

The following code works 100% of the time:

library(tidyverse)
library(parallel)
library(doParallel)



OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"

ServersInDB <- c("server1.ru", "server2.ru")

cores <- detectCores(logical = FALSE)

cl <- parallel::makeCluster(cores - 1) # leave one core free so the machine stays responsive

registerDoParallel(cl)

render_all_obj <- function  (MachineName, OutputFolder, result_foldername)
{
  
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path (OutputFolder, result_foldername  ),
         params = list(MachineName = MachineName)
  )
  
}

foreach (MachineName = ServersInDB) %do% {
  
  render_all_obj(MachineName, OutputFolder, result_foldername)
}

parallel::stopCluster(cl)

This is the code that fails:

library(tidyverse)
library(parallel)
library(doParallel)



OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"

ServersInDB <- c("server1.ru", "server2.ru")

cores <- detectCores(logical = FALSE)

cl <- parallel::makeCluster(cores - 1) # leave one core free so the machine stays responsive

registerDoParallel(cl)

render_all_obj <- function  (MachineName, OutputFolder, result_foldername)
{
  
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path (OutputFolder, result_foldername  ),
         params = list(MachineName = MachineName)
  )
  
}

foreach (MachineName = ServersInDB) %dopar% {
  
  render_all_obj(MachineName, OutputFolder, result_foldername)
}

parallel::stopCluster(cl)

Here is my Rmd:


---
output:
  html_document:
    toc: true
    dev: 'svg'
    number_sections: true
    toc_depth: 2
    toc_float: true
    theme: cerulean
    toc_collapsed: true
    self_contained: true
    mathjax: NULL

params: 
  MachineName: "ServerName" #name of server to analyze

---



```{r , echo=FALSE, include=FALSE, results='hide'}

MachineName <- params$MachineName

```



---
title: "My report is about: `r MachineName`"

---

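The problem is the file named Report.knit.md. By default, it is created in the directory of the file passed as the input argument of rmarkdown::render. That directory is the same for all parallel processes, so every process tries to create, read, and write the same file.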
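To make the collision concrete, here is a small base-R illustration. It only mimics the naming pattern described above and is not rmarkdown's internal code; `default_knit_path` is a hypothetical helper introduced for this sketch.

```r
# Illustration only: reproduces the naming pattern of the knit intermediate,
# assuming intermediates_dir is left at its default.
input <- "c:/temp/test/proj/Report.RMD"
servers <- c("server1.ru", "server2.ru")

# Hypothetical helper: the intermediate is named after the input file,
# not after the output file, and sits next to the input.
default_knit_path <- function(input) {
  stem <- tools::file_path_sans_ext(basename(input))
  file.path(dirname(input), paste0(stem, ".knit.md"))
}

# One intermediate path per job, and one output name per job.
intermediates <- vapply(servers, function(server) default_knit_path(input), character(1))
outputs <- paste0(servers, ".html")

print(intermediates)
print(outputs)
```

The output names differ per server, but both jobs map to the same intermediate path, c:/temp/test/proj/Report.knit.md. That is consistent with the symptoms in the question: reports with different names but identical content, or a pandoc failure when one process removes or overwrites the file while another is still reading it.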

The fix is to pass the intermediates_dir argument, giving each process its own unique temporary directory.

Working solution:
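One way to manage such a directory is a small wrapper that creates it, hands it to the job, and removes it afterwards even if the job fails. This is a minimal sketch in base R; `with_private_dir` is a hypothetical name, not part of rmarkdown or any package.

```r
# Hypothetical helper: run `fun` with a fresh, private directory and
# always clean the directory up afterwards.
with_private_dir <- function(fun) {
  dir <- tempfile(pattern = "render_")  # unique path under this session's tempdir()
  dir.create(dir)
  # unlink() needs recursive = TRUE to delete a directory
  on.exit(unlink(dir, recursive = TRUE), add = TRUE)
  fun(dir)
}

# Two calls receive two different directories; each exists only during its call.
first <- with_private_dir(function(dir) {
  stopifnot(dir.exists(dir))
  dir
})
second <- with_private_dir(function(dir) dir)
```

Inside a render job, the callback would pass `dir` on as intermediates_dir. Because each worker is a separate R session with its own tempdir(), the paths should not clash across processes either.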

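library(future)    # plan(), multisession
library(furrr)     # future_map()

# Only needed if foreach's %dopar% should also run on the future backend
doFuture::registerDoFuture()

# One background R session per worker; keep one core free, but use at least one worker
workers <- max(1, parallel::detectCores(logical = FALSE) - 1)
future::plan(multisession, workers = workers)


ServersInDB <- c("server1.ru", "server2.ru")

render_all_obj <- function  (MachineName)
{
  
  OutputFolder <- "c:/temp/test/out"
  result_foldername <- "Now"
  
  # Private intermediates directory, so this process gets its own Report.knit.md
  tf <- tempfile()
  dir.create(tf)
  
  rmarkdown::render(input = "c:/temp/test/proj/Report.RMD",
         output_file = paste0(MachineName, ".html"),
         intermediates_dir = tf,
         output_dir = file.path (OutputFolder, result_foldername),
         params = list(MachineName = MachineName)
  )
  
  # recursive = TRUE is required for unlink() to delete a directory
  unlink(tf, recursive = TRUE)
  
}


furrr::future_map(ServersInDB, render_all_obj)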
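The same idea should also work with the foreach/%dopar% setup from the question. The following is a sketch under assumptions, not a tested drop-in: `render_reports_parallel` and its arguments are hypothetical names, and it assumes the foreach, doParallel, and rmarkdown packages are installed.

```r
# Hypothetical wrapper: render one report per server in parallel,
# giving every job its own intermediates directory.
render_reports_parallel <- function(servers, input, output_dir, workers = 2) {
  `%dopar%` <- foreach::`%dopar%`

  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # stop workers even on error
  doParallel::registerDoParallel(cl)

  foreach::foreach(MachineName = servers) %dopar% {
    # Private directory for this job's Report.knit.md and other intermediates
    tf <- tempfile(pattern = "knit_")
    dir.create(tf)
    tryCatch(
      rmarkdown::render(
        input = input,
        output_file = paste0(MachineName, ".html"),
        output_dir = output_dir,
        intermediates_dir = tf,
        params = list(MachineName = MachineName),
        quiet = TRUE
      ),
      finally = unlink(tf, recursive = TRUE)  # clean up even if render fails
    )
  }
}

# Example call (paths taken from the question):
# render_reports_parallel(
#   servers = c("server1.ru", "server2.ru"),
#   input = "c:/temp/test/proj/Report.RMD",
#   output_dir = file.path("c:/temp/test/out", "Now")
# )
```

foreach normally exports the variables referenced in the loop body to the workers; if one is reported as missing on a worker, it can be passed explicitly through the .export argument of foreach().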
