How can I prevent my computer from crashing when running an R script on a large dataset?

  • Goal: Read, compress, and write images from one location to another. The image dataset is about 5 TB in size. The individual images average about 2-5 MB each.
  • Problem: When I run the script on the whole dataset, my Mac crashes after processing about 1 GB. The script works for a subset of about 400 images.

By reading in the images one by one, I had hoped it would not require too much memory and processing power, but I probably missed something. Could someone review my code below and provide insight into why it crashes? Any tips and suggestions would be very much appreciated. Apologies for not giving a reproducible example.

rm(list=ls())

## 1. LOAD PACKAGES
library(magick)
library(purrr)
library(furrr)

## 2. SET MAIN FOLDER
Directory_Folder <- "C:/Users/Nick/Downloads/" 
Folder_Name <- "Photos for Nick"

## 3. SET NEW LOCATION
New_Directory <- "C:/Users/Daikoro/Desktop/"     ## MAKE SURE TO INCLUDE THE FINAL FORWARD SLASH

## 4. LIST ALL FILES
list.of.files <- list.files(path = paste0(Directory_Folder, Folder_Name), full.names = TRUE, recursive = TRUE)

## 5. FUNCTION FOR READING, RESIZING, AND WRITING IMAGES
MyFun <- function(i) {
  
  new.file.name <- gsub(Directory_Folder, New_Directory, i)
  
  magick::image_read(i) %>%  ## IMPORT PHOTOS INTO R
            image_scale("400") %>%  ## IMAGE RE-SCALING
            image_write(path = new.file.name)
}

## 6. SET UP MULTI-CORES
future::plan(multiprocess)

## 7. RUN FUNCTION ON ALL FILES
future_map(list.of.files, MyFun)   ## THIS WILL TAKE A WHILE...AND CRASHES AT 1GB

With the feedback from Ben Bolker, r2evans, and Waldi I managed to get the script going. I added gc() as the last line of MyFun, and I also specified the number of cores like this:

## SET UP MULTI-CORES
no_cores <- availableCores() - 1
future::plan(multisession, workers = no_cores)

While this made the script much slower, at least it didn't crash. I'm not sure whether that's because one processing core was left available for the rest of the system, or because of the gc() line.
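
For readers who want to see the two changes together, here is a minimal sketch, not the exact script I ran. The names resize_one and run_all are hypothetical, and so is passing the source and target folders in as arguments. Three things go beyond what is described above and are assumptions on my part: fixed = TRUE in the path replacement (so the folder name is matched literally rather than as a regular expression), dir.create() for target subfolders that may not exist yet, and furrr::future_walk() in place of future_map() so that results are not collected across millions of files.

```r
library(magick)
library(furrr)  # furrr attaches the future package as a dependency

# Hypothetical helper: the same read / scale / write steps as MyFun,
# with the source and target root folders passed in explicitly.
resize_one <- function(src, from_dir, to_dir) {
  # Assumption: fixed = TRUE matches from_dir literally, not as a regex.
  dest <- sub(from_dir, to_dir, src, fixed = TRUE)

  # Assumption: create the target subfolder if it does not exist yet.
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)

  img <- magick::image_scale(magick::image_read(src), "400")  # 400 px wide
  magick::image_write(img, path = dest)

  # Drop the image reference and run the garbage collector as the final
  # cleanup step, as described in the update above.
  rm(img)
  gc()

  invisible(dest)
}

# Hypothetical wrapper: leave one core free, then process all files.
run_all <- function(files, from_dir, to_dir,
                    workers = max(1, future::availableCores() - 1)) {
  future::plan(future::multisession, workers = workers)
  # Restore sequential processing when done, which shuts the workers down.
  on.exit(future::plan(future::sequential), add = TRUE)

  # future_walk() is called for its side effects and does not collect results.
  furrr::future_walk(files, resize_one, from_dir = from_dir, to_dir = to_dir)
}
```

With the variables from the question, this would be called as run_all(list.of.files, Directory_Folder, New_Directory). Whether the gc() call or the smaller number of workers is what avoids the crash is still something this sketch does not settle.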
