Fastest way to save/load data.table

What I would like to do is use the fastest available method to store data.tables for further processing.

Something along the lines of:

  1. Read the original data from CSV/RDS.
  2. Convert it to a data.table.
  3. Save it in a format optimized for re-reading (RDS doesn't seem to work with data.table, is that right? Is there some other binary option?)
  4. Continue working with the file from step #3, reading it directly as a data.table over and over again for slicing, grouping, plotting, ...

What is the best option for step #3?
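
The four steps above could be sketched roughly as follows. This is only an illustration under assumptions: the file names and the tiny stand-in table are hypothetical, and saveRDS()/readRDS() is just one candidate for step #3, not a measured winner. The class attribute does survive an RDS round trip; the setDT() call after loading restores data.table's internal over-allocation, which is not kept across serialization.

```r
library(data.table)

# Hypothetical temporary files, used only for this illustration.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Stand-in for the original data (the real dataset is not part of the post).
fwrite(
  data.table(id = 1:5, grp = c("a", "b", "a", "b", "a"), x = (1:5) / 2),
  csv_path
)

# Steps 1-2: read the CSV; fread() already returns a data.table.
DT <- fread(csv_path)

# Step 3: save in R's binary serialization format, uncompressed.
saveRDS(DT, rds_path, compress = FALSE)

# Step 4: re-read the file and keep working on it as a data.table.
DT2 <- readRDS(rds_path)
setDT(DT2)  # re-allocate data.table internals after loading from disk

# Example of further processing: a grouped sum.
by_grp <- DT2[, .(total = sum(x)), by = grp]
print(by_grp)
```
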

OK, here are some measurements on the particular dataset I'm using. It is originally in RDS, and reading it takes 60+ seconds.

After that, DT was saved in R's internal XDR format as well as in an SQLite database, both uncompressed.

  1. The save()/load() pair was fastest: 11.7-11.8 seconds to load.

  2. SQLite (dbReadTable) was pretty close: 12.0-12.1 seconds. The database file is about 30% smaller, so I could imagine a case where SQLite would be faster than save()/load().

For now, save()/load() works for me, and it preserves the class as well.
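
A comparison like the one above might be set up along these lines. It is a minimal sketch, assuming the DBI and RSQLite packages are installed; the table name "dt", the file paths, and the small random stand-in table are hypothetical, so the timings it prints say nothing about the 11.7 vs 12.0 second figures measured on the real dataset. Note that dbReadTable() returns a plain data.frame, so the SQLite route needs an explicit conversion back to data.table, whereas load() restores the object with its class.

```r
library(data.table)
library(DBI)

# Small stand-in table; the dataset from the post is not available here.
DT <- data.table(id = 1:1000, x = rnorm(1000))

rdata_path <- tempfile(fileext = ".RData")  # hypothetical paths
db_path <- tempfile(fileext = ".sqlite")

# Write both copies: uncompressed save() and an SQLite table.
save(DT, file = rdata_path, compress = FALSE)
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "dt", DT)

# Time load() into a separate environment so DT is not overwritten.
t_load <- system.time({
  env <- new.env()
  load(rdata_path, envir = env)
  from_load <- env$DT
})

# Time dbReadTable() plus the conversion back to data.table.
t_sqlite <- system.time({
  from_sqlite <- as.data.table(dbReadTable(con, "dt"))
})
dbDisconnect(con)

# Elapsed wall-clock seconds for each approach.
print(c(load = t_load[["elapsed"]], sqlite = t_sqlite[["elapsed"]]))
```

On a table this small both reads finish almost instantly; the gap only becomes meaningful with data of realistic size, and repeating each read several times gives steadier numbers.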
