
Combine a list of data.tables

Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce('rbind', data.table)

but it takes a while.

Thanks!

Using do.call appears to be about 10x faster in this made-up example:

library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

#20 data.tables all of length 1e6
yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

system.time(out1 <- Reduce("rbind", yourList))
#-----
   user  system elapsed 
   3.37    3.03    6.43 
system.time(out2 <- do.call("rbind", yourList))
#-----
   user  system elapsed 
   0.33    0.36    0.68 
all.equal(out1,out2)
#-----
[1] TRUE

Edit - to incorporate Matt's answer

I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
#-----
   user  system elapsed 
   0.07    0.03    0.11 

all.equal(out1,out3)
#-----
[1] TRUE

See ?rbindlist and these related questions (easier to find when you know what to search for!):

data.table questions and answers containing rbindlist
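Beyond raw speed, rbindlist takes a few arguments that are handy when combining a list. The following is only a minimal sketch: the list names `a` and `b`, the column names, and the `source` label are made up for illustration. `idcol` records which list element each row came from, while `use.names` and `fill` deal with tables whose columns differ in order or are missing.

```r
library(data.table)

# Hypothetical named list; the second table has its columns in a different
# order and carries an extra column z.
dts <- list(
  a = data.table(x = 1:2, y = c("p", "q")),
  b = data.table(y = c("r", "s"), x = 3:4, z = c(TRUE, FALSE))
)

# idcol adds a leading column holding the list names;
# use.names matches columns by name, fill pads missing columns with NA.
combined <- rbindlist(dts, use.names = TRUE, fill = TRUE, idcol = "source")
print(combined)
```

For a list of identically shaped tables, as in the timing example, the plain rbindlist(yourList) call is all that is needed.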

For my money, the plyr package's ldply is the best way to do this. It has the advantage that the name of the list element is added as a new first column, named .id.

In addition, a list of data frames is often the output of tapply, in which case replace the whole shebang with ddply.
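A small sketch of the ldply approach, assuming plyr is installed; the list `dfs` and its element names are invented for the example:

```r
library(plyr)

# Hypothetical named list of data frames.
dfs <- list(
  first  = data.frame(x = 1:2, y = c("a", "b")),
  second = data.frame(x = 3:4, y = c("c", "d"))
)

# With no function supplied, ldply row-binds the elements and, because the
# list is named, prepends an .id column holding the element names.
stacked <- ldply(dfs)
print(stacked)
```

The .id column only appears when the list has names, so an unnamed list would need names assigned first.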

Alternatives include do.call("rbind", mylist) or lattice's make.groups (haven't been able to find this one recently though).


Note: I may have misunderstood the question: I read data.frame instead of data.table. These techniques still work, but I'm not sure they always return a data.table.
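One way to settle that doubt is to check the class of the result and convert when needed. This is a sketch with toy inputs (`dts` and `df` are hypothetical), using data.table's is.data.table, as.data.table and setDT:

```r
library(data.table)

dts <- list(data.table(x = 1:2), data.table(x = 3:4))

# rbind on data.tables uses data.table's own method,
# so the combined result can be checked directly.
via_do_call <- do.call("rbind", dts)
print(is.data.table(via_do_call))

# A plain data.frame (for example, the kind of result a data.frame-oriented
# function returns) can be converted afterwards.
df <- data.frame(x = 1:4)
dt_copy <- as.data.table(df)  # returns a new data.table, df is unchanged
setDT(df)                     # converts df itself, by reference
print(is.data.table(df))
```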
