Fastest way to save/load data.table
What I would like to do is use the fastest available method to store data.tables for further processing.

Something along the lines of:

1. Read in the source data.
2. Convert it to data.table.
3. Save it in a format optimized for reading back (RDS doesn't seem to work with data.table, is that right? Is there some other binary option?)
4. Use the file from step #3 over and over again, reading it directly as data.table, doing slicing, grouping, plotting, ...

What is the best option for step #3?
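For what it's worth, the save-once, read-many workflow above can be sketched with base R's saveRDS()/readRDS(), which as far as I know do round-trip a data.table with its class intact. This is only a minimal sketch: the table contents and the temporary file path are made up for illustration, and setDT() is called after reading because data.table's internal over-allocation is not restored when an object is loaded from disk.

```r
library(data.table)

# Hypothetical example data; substitute your own table here.
set.seed(1)
DT <- data.table(
  id  = 1:10000,
  grp = sample(letters, 10000, replace = TRUE),
  x   = rnorm(10000)
)

# Hypothetical location; a temp file keeps the sketch self-contained.
rds_path <- tempfile(fileext = ".rds")

# compress = FALSE trades disk space for faster writing and reading.
saveRDS(DT, rds_path, compress = FALSE)

# Later sessions: read the file back and keep working with it.
DT2 <- readRDS(rds_path)
setDT(DT2)  # re-establish data.table's internal state after loading

# Example of further processing: grouped aggregation.
by_grp <- DT2[, .(mean_x = mean(x)), by = grp]
```

Whether this beats save()/load() on a given dataset is something to measure rather than assume.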
OK, here are some measurements on the particular dataset I'm using. It is originally in RDS, and reading it takes 60+ seconds.
After that, the DT was saved in R's internal XDR format as well as in an SQLite db, both uncompressed.
The save()/load() pair was fastest: 11.7-11.8 seconds to load.
SQLite (dbReadTable) was pretty close: 12.0-12.1 seconds. The file size with the DB is about 30% smaller, so I could imagine a case where SQLite would be faster than save()/load().
For now save()/load() works for me, and it preserves the class as well.
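A rough sketch of how such a comparison might be timed is below. The data, table name, and file paths are hypothetical, and the SQLite part only runs if the DBI and RSQLite packages happen to be installed. Timings on a small toy table will not resemble the numbers above; the point is only the shape of the measurement.

```r
library(data.table)

# Hypothetical stand-in for the real dataset.
set.seed(1)
DT <- data.table(id = 1:10000, x = rnorm(10000))

# Uncompressed binary (XDR) file written by save().
rdata_path <- tempfile(fileext = ".RData")
save(DT, file = rdata_path, compress = FALSE)

rm(DT)                                   # drop the in-memory copy
t_load <- system.time(load(rdata_path))  # restores the object named DT
setDT(DT)                                # class survives load(); refresh internals
print(t_load["elapsed"])

# Optional SQLite comparison, assuming DBI and RSQLite are available.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  db_path <- tempfile(fileext = ".sqlite")
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  DBI::dbWriteTable(con, "dt", DT)
  t_sqlite <- system.time(from_db <- DBI::dbReadTable(con, "dt"))
  DBI::dbDisconnect(con)
  setDT(from_db)  # dbReadTable() returns a plain data.frame
  print(t_sqlite["elapsed"])
}
```

Note the asymmetry: load() brings the data.table back as a data.table, while the result of dbReadTable() has to be converted again.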