How to read and write CSV files in parallel in R

Question

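I have several million small CSV files (a few MB each) that I need to read in, modify, and then write out. How can I do this in parallel in R? I tried the following, but nothing happened. I don't need to combine anything; I just want to process several files at once. I hope this is possible, because otherwise it will take forever. Maybe it isn't even possible: can multiple files be read from and written to a hard drive at the same time? I am running Windows 10.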

registerDoParallel(cores = 2)
foreach (x in 1:100) %:%
    foreach (y in 1:100) %dopar% {
       readr::read_csv(paste0('fake_file',x,'_',y,'csv'))

       # work on files
       readr::write_csv(paste0('fake_file',x,'_',y,'csv'))
}

Answer 1

Just use vroom

library(vroom)

vroom("your_path")
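As a hedged illustration of how this one-liner might extend to the read, modify, write task in the question, the sketch below reads one file with vroom(), changes it, and writes it out with vroom_write(). The directory, the file names, and the column `a` are made up for the example. Because vroom reads lazily, the sketch writes to a separate output file instead of overwriting the input; that choice is an assumption about what is safest, not something the answer states.

```r
library(vroom)

# Hypothetical input: a small CSV in a temporary directory
in_dir <- file.path(tempdir(), "vroom_example")
dir.create(in_dir, showWarnings = FALSE)
in_file  <- file.path(in_dir, "input.csv")
out_file <- file.path(in_dir, "output.csv")
write.csv(data.frame(a = 1:5, b = 6:10), in_file, row.names = FALSE)

# Read the file; delim is given explicitly to skip delimiter guessing
df <- vroom(in_file, delim = ",", show_col_types = FALSE)

# Modify the data (placeholder for the real work)
df$a <- df$a * 10

# vroom_write() defaults to tab-separated output, so set delim = ","
vroom_write(df, out_file, delim = ",")
```

Note that vroom() also accepts a vector of file paths, but it then combines them into a single data frame, which the question says is not needed.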

Answer 2

Here is another solution, which uses the future package under the hood

library(tidyverse)

dir.create("example_data")

for (i in 1:10) {
  df <- as.data.frame(matrix(sample(1000),ncol = 10))
  write_csv(df,file = str_c("example_data/",i,"test.csv"))
  
}


# Create a function that does what you want for one file (try to test it)

file_function <- function(file_name){
  df_read <- read_csv(file_name, col_types = cols())
  
  df_processed <- df_read %>% 
    mutate(across(everything(),.fns = ~ .x * 10))
  
  df_processed %>%
    write_csv(file_name)
  
  str_c(file_name," processed")
  
}

# get file names

file_names <- list.files(path = "example_data/", full.names = TRUE)

# furrr functional solution


library(furrr)

# plan execution

# multisession if windows

# multicore if linux or mac

set.seed(123)
plan(multisession,workers = availableCores())


file_names %>% future_map(file_function,.options = furrr_options(seed = TRUE))

# release the workers and their memory

plan(sequential)

# release disk space: delete the example data
unlink("example_data", recursive = TRUE)
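For comparison, the foreach attempt from the question could be sketched along the following lines. Two things in the original attempt would stop it from running: foreach takes named arguments (`x = 1:100`) instead of `x in 1:100`, and `write_csv()` needs the data frame as its first argument. The sketch below is only an illustration under assumptions: the directory, the file names, and the column `a` are hypothetical, it uses base `read.csv()`/`write.csv()` to keep dependencies small, and it loops over a flat vector of paths instead of two nested indices.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical example data in a temporary directory
data_dir <- file.path(tempdir(), "foreach_example")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:4) {
  write.csv(data.frame(a = 1:3, b = 4:6),
            file.path(data_dir, paste0("file_", i, ".csv")),
            row.names = FALSE)
}
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# An explicit cluster works on Windows, where forking is unavailable
cl <- makeCluster(2)
registerDoParallel(cl)

# foreach takes named arguments (f = files), not `f in files`
done <- foreach(f = files, .combine = c) %dopar% {
  df <- read.csv(f)
  df$a <- df$a * 10                    # placeholder for the real work
  write.csv(df, f, row.names = FALSE)  # data frame first, then the path
  f                                    # value collected by foreach
}

stopCluster(cl)
```

Each worker handles whole files independently, so nothing has to be combined except the vector of processed paths returned in `done`.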

Answer 1
0 2021-01-31 16:24:00

Answer 2
0 2021-01-31 16:56:39
