
R data.table interface to on-disk fst files: fst_table

I want to use the fst_table function from the fsttable package (https://github.com/fstpackage/fsttable) on a large dataset.

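# Install fsttable from GitHub (requires the devtools package)
devtools::install_github("fstpackage/fsttable")
library(fsttable)

# Create a 1-million-row sample table and write it to disk in fst format
nr_of_rows <- 1e6
x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
fst::write_fst(x, "1.fst")

# Open the file through the on-disk fst_table interface
ft <- fst_table("1.fst")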

I can extract rows and columns from the resulting table. However, is it possible to do operations like:

ft[X == 1,]

as in a standard data.table? Or can I set a key on this table for fast lookups? My goal is to extract rows based on column values without loading the whole dataset into memory.
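For the stated goal, one workaround that avoids fsttable altogether is to scan the file in fixed-size chunks with fst::read_fst(), whose from and to arguments read only a range of rows, and keep the matching rows from each chunk. This is a sketch under assumptions: it needs the fst and data.table packages, and filter_fst_chunked() is a hypothetical helper, not part of either package. Peak memory is roughly one chunk plus the matches found so far, rather than the whole file.

```r
library(fst)
library(data.table)

# Hypothetical helper (not part of fst or fsttable): scan an .fst file in
# chunks of `chunk_size` rows and keep the rows for which `pred` is TRUE.
# Only one chunk (plus the matches found so far) is held in memory.
filter_fst_chunked <- function(path, pred, columns = NULL, chunk_size = 1e5) {
  n <- metadata_fst(path)$nrOfRows
  starts <- (seq_len(ceiling(n / chunk_size)) - 1) * chunk_size + 1
  out <- vector("list", length(starts))
  for (k in seq_along(starts)) {
    from <- starts[k]
    to <- min(from + chunk_size - 1, n)
    # Read only rows from..to (and only `columns`, if given)
    chunk <- read_fst(path, columns = columns, from = from, to = to,
                      as.data.table = TRUE)
    keep <- which(pred(chunk))  # which() also drops NA results
    out[[k]] <- chunk[keep]     # a lone variable in i is evaluated in calling scope
  }
  rbindlist(out)
}

# Usage with the data from the question, written to a temporary file
nr_of_rows <- 1e6
x <- data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
path <- tempfile(fileext = ".fst")
write_fst(x, path)

res <- filter_fst_chunked(path, function(d) d$X == 1)
res  # a single row: X = 1, Y = "B"
```

The chunk size trades memory for the number of read calls. The predicate receives each chunk as a data.table, so any vectorised condition on the columns that were read will work; if `columns` is given, the predicate may only use those columns.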

Unfortunately, fsttable can currently only load the dataset and select columns and rows, even though the package documentation says:

This fst_table can be used as a regular data.table object

In practice, regular data.table operations such as the one you mention cannot be performed (at least in version 0.1.3). The main reason is that ft is not a regular data.table object but a data.table interface, as its first class shows:

> class(ft)
[1] "datatableinterface" "data.table"         "data.frame" 

However, the data in the fst_table object can be "pulled" into memory as a vector and then filtered. Following your example, column X can be pulled in any of these ways:

library(dplyr)  # provides %>% and pull()
ft[,list(X)]$X
ft[,list(X)][['X']]
ft[,list(X)] %>% pull()

The resulting vector can then be filtered, for example:

> ft[,list(X)]$X[ft[,list(X)]$X==1]
[1] 1
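The filter above pulls column X from disk twice and returns only the X values. A minimal sketch of a variant follows; it uses fst::read_fst() directly instead of the fsttable interface (an assumption, chosen because read_fst() takes column names as strings), and rows_where() is a hypothetical helper. It pulls the filter column once, finds the matching row numbers with which(), and then reads full rows only for the span between the first and last match. This saves the most memory when matches are clustered, for example when the file was written from a data.table sorted (keyed) on the filter column; for scattered matches the span can approach the whole file.

```r
library(fst)
library(data.table)

# Hypothetical helper: return the full rows of an .fst file whose `column`
# satisfies `pred`, or NULL when nothing matches.
rows_where <- function(path, column, pred) {
  values <- read_fst(path, columns = column)[[1]]  # pull the column once
  idx <- which(pred(values))
  if (length(idx) == 0) return(NULL)
  first <- min(idx)
  # Read only the contiguous row range covering all matches ...
  block <- read_fst(path, from = first, to = max(idx), as.data.table = TRUE)
  # ... then keep the matching rows (offsets relative to `first`)
  rel <- idx - first + 1L
  block[rel]
}

# Usage with the data from the question, written to a temporary file
nr_of_rows <- 1e6
x <- data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
path <- tempfile(fileext = ".fst")
write_fst(x, path)

rows_where(path, "X", function(v) v == 1)
rows_where(path, "X", function(v) v %in% c(10L, 20L))
```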

I presume there is an easy way to convert an fst_table object into a genuine data.table by pulling each column and then binding them all together.
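A minimal sketch of that conversion, assuming the fst and data.table packages and reading the file with fst::read_fst() rather than through the fsttable interface: fst_to_data_table() (a hypothetical helper) takes the column names from metadata_fst(), pulls each column separately, and binds the vectors into one data.table. Unlike filtering, the result holds the whole dataset in memory.

```r
library(fst)
library(data.table)

# Hypothetical helper: pull every column of an .fst file one at a time and
# bind the vectors into a single in-memory data.table.
fst_to_data_table <- function(path) {
  cols <- metadata_fst(path)$columnNames
  pulled <- lapply(cols, function(cl) read_fst(path, columns = cl)[[1]])
  as.data.table(setNames(pulled, cols))
}

# Usage with a small version of the data from the question
nr_of_rows <- 1000
x <- data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
path <- tempfile(fileext = ".fst")
write_fst(x, path)

dt <- fst_to_data_table(path)
```

For the whole file, read_fst(path, as.data.table = TRUE) returns the same columns in a single call; the column-by-column form is mainly useful when only some columns should be pulled.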
