
How to delete temporary files in parallel task in R

Is it possible to delete temporary files from within a parallelized R task?

I rely on parallelization with doParallel and foreach in R to perform various calculations on small subsets of a huge raster file. This involves cropping a subset of the large raster many times. My basic syntax looks similar to this:

library(foreach)  # provides foreach(), %:% and %dopar%

grid <- raster::raster("grid.tif")
data <- raster::raster("data.tif")

cl <- parallel::makeCluster(32)
doParallel::registerDoParallel(cl)

# iterate over every column and row index of the grid
m <- foreach(col = seq_len(ncol(grid))) %:% foreach(row = seq_len(nrow(grid))) %dopar% {
   
   # get extent of subset 
   cell <- raster::cellFromRowCol(grid, row, col)
   ext <- raster::extentFromCells(grid, cell)
   
   # crop main raster to subset extent
   subset <- raster::crop(data, ext)
   
   # ...
   # perform some processing steps on the raster subset
   # ...
   
   # save results to a separate file
   saveRDS(subset, paste0("output_folder/", row, "_", col))
}

# shut down the workers once all tasks are done
parallel::stopCluster(cl)

The algorithm works perfectly fine and achieves what I want it to. However, raster::crop(data, ext) creates a small temporary file every time it is called. This seems to be standard behavior of the raster package, but it becomes a problem because these temp files are only deleted after the whole code has been executed, and in the meantime they take up far too much disk space (hundreds of GB).

In a serial execution of the task I can simply delete the temporary file with file.remove(subset@file@name). However, this no longer works when the task runs in parallel. Instead, the command is simply ignored and the temp file stays where it is until the whole task is done.

Any ideas as to why this is the case and how I could solve this problem?

There is a function for this: removeTmpFiles().

You should be able to use f <- filename(subset) and avoid reading from slots (@). I do not see why you would not be able to remove the file. But perhaps it needs some fiddling with the path?
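As a rough illustration of the filename() route, the sketch below writes a small raster to a raster temp file, looks up its path with filename(), and removes it by hand. The toy raster and the writeRaster() call are only stand-ins for a crop() result that ended up on disk. The sketch also assumes the default native raster format, where a .grd header is paired with a .gri data file.

```r
library(raster)

# Toy raster standing in for a cropped subset (assumption for this demo)
r <- raster(nrows = 10, ncols = 10)
values(r) <- 1:100

# Force a file-backed raster in the raster temp folder
x <- writeRaster(r, rasterTmpFile())

# Look up the backing file without touching slots
f <- filename(x)

# The native format stores a header (.grd) plus a data file (.gri)
files <- c(f, sub("\\.grd$", ".gri", f))
removed <- file.remove(files[file.exists(files)])

# Alternatively, ask raster to clean up its temp files;
# h is the minimum file age in hours
removeTmpFiles(h = 0)
```

In the parallel loop, the same calls would go at the end of the %dopar% body, once the subset's values are no longer needed.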

Temp files are only created when the raster package deems it necessary, based on the RAM available and required. See canProcessInMemory(, verbose=TRUE). The default settings are somewhat conservative, and you can change them with rasterOptions() (memfrac and maxmemory).
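A minimal sketch of checking and relaxing these settings follows. The raster dimensions and the option values are arbitrary illustrations, not recommendations.

```r
library(raster)

# Toy raster geometry (assumption for this demo)
r <- raster(nrows = 1000, ncols = 1000)

# Ask whether about n copies of r would fit in memory;
# verbose = TRUE prints the numbers behind the decision
ok <- canProcessInMemory(r, n = 4, verbose = TRUE)

# Allow raster to use more RAM before it falls back to temp files
# (illustrative values only)
rasterOptions(memfrac = 0.8, maxmemory = 1e10)
```

Note that each worker created by makeCluster() is a separate R session, so in the parallel setup the rasterOptions() call would have to run on the workers, for example at the top of the %dopar% body.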

Another approach is to provide a filename argument to crop(). Then you know what the filename is, and you can delete it. Of course, you need to take care not to overwrite data from different tasks, so you may need to build the filename from a unique id for each task.
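The explicit-filename idea could look roughly like the sketch below. crop_and_clean() is a hypothetical helper, not part of raster, and the in-memory data raster and the id "1_1" are made up for the demo. The file is written in the native .grd/.gri format, so both files are removed afterwards.

```r
library(raster)

# Small stand-in for the large data raster (assumption for this demo)
data <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
values(data) <- seq_len(ncell(data))

# Hypothetical helper: crop to a file named after the task id,
# pull the values into memory, then delete the file again
crop_and_clean <- function(x, ext, id, dir = tempdir()) {
  f <- file.path(dir, paste0("crop_", id, ".grd"))
  sub_r <- crop(x, ext, filename = f, overwrite = TRUE)
  vals <- getValues(sub_r)                  # keep what is needed in memory
  unlink(c(f, sub("\\.grd$", ".gri", f)))   # header and data file
  vals
}

# Crop the top-left 10 x 10 cells
vals <- crop_and_clean(data, extent(0, 10, 90, 100), id = "1_1")
```

Inside the nested foreach loop, an id such as paste0(row, "_", col) is unique per task, so two workers never write to the same file.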

saveRDS() won't work if the raster is backed by a temp file, because the saved object only points to that file, and the file will disappear.
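One possible way around this, sketched here as an assumption rather than as part of the answer above, is to load the values into memory with readAll() before calling saveRDS(). The file names and the toy raster are made up for the demo.

```r
library(raster)

# Toy raster written to disk, standing in for a temp-file-backed subset
r <- raster(nrows = 10, ncols = 10)
values(r) <- 1:100
f <- file.path(tempdir(), "backed_demo.grd")
x <- writeRaster(r, f, overwrite = TRUE)

# Read all cell values into the object so it no longer depends on the file
x <- readAll(x)

out <- tempfile(fileext = ".rds")
saveRDS(x, out)

# Remove the backing files (.grd header and .gri data)
unlink(c(f, sub("\\.grd$", ".gri", f)))

# The restored object still carries its values
y <- readRDS(out)
v <- getValues(y)
```

This only makes sense when the subset fits in RAM. For larger results, writeRaster() to a permanent file is the more usual choice.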

