
Timing issue related to using saveRDS in RStudio

Calling saveRDS before executing a loop results in inconsistent loop timing. This is only evident when using RStudio; the issue does not exist when running the same script from the command line using Rscript. This may be a result of a delayed I/O flush in RStudio.

  • Have others noticed this behavior?
  • Is there a way to force I/O flush after a saveRDS call?

I am using RStudio 1.1.463 and R 3.5.2 on 64-bit Ubuntu 18.04 LTS.

I ruled out the garbage collector as the cause by calling gc() before running the code and enabling GC messages with gcinfo to confirm that no collection is triggered during the loop. I also tried pre-compiling the function with cmpfun; this does not help either.
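The checks described above could look roughly like the following sketch. The function `timed.loop` is a hypothetical stand-in for the loop under test and is not part of the original script; `gc()`, `gcinfo()` and `compiler::cmpfun()` are standard R functions.

```r
library(compiler)

# Hypothetical stand-in for the loop being timed
timed.loop <- function(n = 5) {
  elapsed <- numeric(n)
  t0 <- Sys.time()
  for (i in seq_len(n)) {
    elapsed[i] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    Sys.sleep(0.01)
  }
  1000 * diff(elapsed)  # gaps between iterations, in ms
}

# Pre-compile the function to byte code before timing it
timed.loop.cmp <- cmpfun(timed.loop)

invisible(gc())                # force a collection before timing starts
old.gcinfo <- gcinfo(TRUE)     # print a message for any collection from here on
dt <- timed.loop.cmp()
invisible(gcinfo(old.gcinfo))  # restore the previous setting

print(round(dt, 1))
```

If no garbage-collection message is printed between the two `gcinfo()` calls, the collector did not run during the timed loop.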

The following code can be used to reproduce the issue.

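loop.test <- function() {
  # Elapsed time (in seconds) since t0, sampled at the start of each iteration
  elapsed <- numeric(10)
  t0 <- Sys.time()
  for (i in 1:10) {
    # Fix the units explicitly so difftime never switches to minutes
    elapsed[i] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    Sys.sleep(0.01)
  }

  # Time between consecutive iterations, in milliseconds
  dt <- round(1000 * diff(elapsed), 1)
  print(dt)
  print(summary(dt))
}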

saveRDS(1:10, 'garb.rds')

loop.test()

The code will produce the following output (loop times in ms):

[1]  10.1  10.1  10.2  10.2 275.4  10.2  10.1  10.2  10.2

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.10   10.10   10.20   39.63   10.20  275.40

Note that the large delay will not always appear at the same iteration.

Removing the saveRDS call in the code will always produce consistent (close to 10ms) loop timing.

Running the code from the command line via Rscript produces consistent timing with or without the saveRDS line.

Here is the response from the guys at RStudio:

One possibility: RStudio does some background work during R "idle" (ie: when R calls R_ProcessEvents), and I think project file indexing is one of those things. RStudio registers file monitors for these tasks, so the creation of a file might cause RStudio to do some re-indexing work behind the scenes.

I verified that pointing the saveRDS() call at a directory outside the project directory makes the timing issue go away, which supports the file indexing theory. The RStudio v1.2 preview exhibits this behavior to a lesser extent, so something may have changed in the idle processing implementation, but I have no further information on that. The RStudio team has opened a bug report for this, so a fix will hopefully be available soon.
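A minimal sketch of that check, assuming the working directory is the RStudio project directory and that `tempdir()` lies outside it. The helper `max.gap.after.save` is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: save to `path`, then time a short loop and
# return the largest gap between iterations in milliseconds.
max.gap.after.save <- function(path, n = 10) {
  saveRDS(1:10, path)
  elapsed <- numeric(n)
  t0 <- Sys.time()
  for (i in seq_len(n)) {
    elapsed[i] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    Sys.sleep(0.01)
  }
  max(1000 * diff(elapsed))
}

inside  <- file.path(getwd(), "garb.rds")    # assumed to be inside the project
outside <- file.path(tempdir(), "garb.rds")  # assumed to be outside the project

gaps <- c(inside  = max.gap.after.save(inside),
          outside = max.gap.after.save(outside))
print(round(gaps, 1))
```

Because the delay does not appear on every run or at a fixed iteration, a single comparison is only suggestive; repeating it several times inside RStudio gives a clearer picture.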

For now, two possible workarounds are:

  1. Make a call to Sys.sleep() with sufficient delay after calling saveRDS() to allow the re-indexing operation to take place.
  2. Point saveRDS() to a file outside the current project directory to prevent re-indexing.
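The two workarounds might be sketched as follows. The 0.5 second delay is an assumed value chosen only because it exceeds the roughly 275 ms stall seen in the output above, and `tempdir()` is used as an example of a location assumed to lie outside the project directory.

```r
# Workaround 1: keep the file in the project, but pause after saving
# so that any background work can finish before timing-sensitive code runs.
saveRDS(1:10, "garb.rds")
Sys.sleep(0.5)  # assumed delay; adjust to the stall observed on your system

# Workaround 2: write the file outside the project directory so that
# the project's file monitor is not triggered at all.
out.path <- file.path(tempdir(), "garb.rds")  # assumed to be outside the project
saveRDS(1:10, out.path)

# The saved object can be read back from either location as usual
x <- readRDS(out.path)
```

The second approach avoids guessing a delay, at the cost of keeping the file away from the rest of the project.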
