
Reading large RDS files in R in a faster way

I have a large RDS file to read in R. However, it takes quite some time to read the file.

Is there a way to speed up the reading? I tried the data.table library with its fread function, but I get an error.

data <- readRDS("myData.rds")

data <- fread("myData.rds")  # error

One way to speed up reading large files is to read them in compressed form.

system.time(read.table("bigdata.txt", sep=","))

user: 170.901
system: 1.996
elapsed: 192.137

Now the same read, but from a compressed file:

system.time(read.table("bigdata-compressed.txt.gz", sep=","))

user: 65.511
system: 0.937
elapsed: 66.198
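
The comparison above can be reproduced on a small scale with the following sketch. The data frame, its size, and the temporary file names are made up for illustration, and whether the compressed read is actually faster will depend on the machine and the storage, so no timings are claimed here.

```r
# Hypothetical test data; replace with your own table.
set.seed(1)
df <- data.frame(x = runif(1e5), y = runif(1e5))

plain <- tempfile(fileext = ".txt")
gz    <- tempfile(fileext = ".txt.gz")

# Write the same content once as plain text and once gzip-compressed.
write.table(df, plain, sep = ",", row.names = FALSE)
con <- gzfile(gz, "w")
write.table(df, con, sep = ",", row.names = FALSE)
close(con)

# read.table() opens the path with file(), which detects gzip on read.
t_plain <- system.time(a <- read.table(plain, sep = ",", header = TRUE))
t_gz    <- system.time(b <- read.table(gz, sep = ",", header = TRUE))

print(rbind(plain = t_plain, gz = t_gz)[, c("user.self", "sys.self", "elapsed")])
print(c(plain_bytes = file.size(plain), gz_bytes = file.size(gz)))
```

Both reads return the same data frame, so only the timings and the file sizes differ between the two variants.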

Compression can also influence the read speed of RDS files:

n<-1000
m<-matrix(runif(n^2), ncol=n)
default<-tempfile()
unComp<-tempfile()
saveRDS(m,default)
saveRDS(m, unComp, compress = FALSE)
microbenchmark::microbenchmark(readRDS(default), readRDS(unComp))
#> Unit: milliseconds
#>              expr      min       lq     mean   median       uq      max neval
#>  readRDS(default) 46.37050 49.54836 56.03324 56.19446 59.99967 96.16305   100
#>   readRDS(unComp) 11.60771 13.16521 15.54902 14.01063 17.36194 27.35329   100
#>  cld
#>    b
#>   a
file.info(default)$size
#> [1] 5326357
file.info(unComp)$size
#> [1] 8000070
require(qs)
#> Loading required package: qs
#> qs v0.25.1.
qs<-tempfile()
qsave(m, qs)
microbenchmark::microbenchmark(qread(qs), readRDS(unComp))
#> Unit: milliseconds
#>             expr       min       lq     mean   median       uq      max neval
#>        qread(qs) 10.164793 12.26211 15.31887 14.71873 17.25536 27.08779   100
#>  readRDS(unComp)  9.342042 12.59317 15.63974 14.44625 17.93492 35.12563   100
#>  cld
#>    a
#>    a
file.info(qs)$size
#> [1] 4187017

However, as seen here, the faster read of the uncompressed file comes at the cost of file size (about 8.0 MB versus 5.3 MB). The speed of the storage may also have an influence. On slow storage (e.g. network drives, spinning disks) it might actually be better to use compression, as the smaller file is read from disk more quickly. It is thus worth experimenting. Specific packages might even provide slightly better performance: here qs reads at the same speed as the uncompressed RDS file but produces a smaller file, combining the best of both worlds. For specific data formats other packages might work better; see this overview: https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets
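
To experiment along these lines, something like the following sketch could be used. It relies only on base R: the compress argument of saveRDS accepts TRUE, FALSE, or one of "gzip", "bzip2" and "xz". The test matrix, the number of repetitions, and the names settings and results are assumptions made for illustration; run it with your own object and on the storage you actually use.

```r
# Hypothetical test object; substitute your own data.
set.seed(42)
m <- matrix(runif(500^2), ncol = 500)

# Compression settings accepted by saveRDS().
settings <- list(none = FALSE, gzip = "gzip", bzip2 = "bzip2", xz = "xz")

results <- do.call(rbind, lapply(names(settings), function(nm) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))            # clean up the temporary file
  saveRDS(m, path, compress = settings[[nm]])
  size <- file.size(path)
  # Average the elapsed read time over a few repetitions.
  elapsed <- system.time(for (i in 1:5) obj <- readRDS(path))[["elapsed"]]
  stopifnot(identical(obj, m))     # every setting must round-trip exactly
  data.frame(compress = nm, size_bytes = size, read_seconds = elapsed / 5)
}))

print(results)
```

The resulting table shows file size and average read time side by side, which makes the trade-off visible for a given machine. For more careful timings, microbenchmark (as used above) is the better tool.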
