简体   繁体   中英

How to combine the columns of two data.tables?

I have two tables of the type data.table. I want to combine the tables into one table. They look like:

DT1:

1  A  B  C
2  A  B  C  
3  A  B  C

DT2:

D  E  F
D  E  F
D  E  F

I want to combine them like:

1  A  B  C  D  E  F
2  A  B  C  D  E  F
3  A  B  C  D  E  F

I don't know how I can do this by merging, because the data tables don't have common column names.

I know this is possible with cbind, but I'm working with a lot of rows, so I prefer a function which is built to work with big data tables.

Could anyone tell me how to do this?

I've tried something with some slighter bigger table (using the code provided by Hart Radev) and I've microbenchmark ed them, maybe it could be helpful:

library(dplyr)
library(microbenchmark)

DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))

microbenchmark(
bind_cols = {bind_cols(DT1, DT2)},
cbind = {cbind(DT1,DT2)},
# Hart solution
merge = { DT1$rowname = rownames(DT1) 
          DT2$rowname = rownames(DT2)
          DT3 = merge(DT1, DT2, by = 'rowname')}
)

Unit: microseconds
           expr         min           lq         mean       median           uq         max neval
      bind_cols      72.534      88.9610 1.640497e+02     169.6010     209.4940     348.160   100
          cbind      42.241      50.5610 8.019269e+01      61.4405     114.9875     250.455   100
          merge 2142101.821 2256677.2310 2.574166e+06 2416274.7380 2732207.2465 5956733.422   100

data.table is not my cup of tea but I suppose it could be helpful have a solution with it.

building on @s_t answer, here is a benchmark of an update join using data.table

DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))

library(data.table)
setDT(DT1)
setDT(DT2)


microbenchmark::microbenchmark(
  cbind = {
    dt1    <-copy(DT1)
    dt2    <-copy(DT2)
    result <- cbind(DT1, DT2)
  },
  update_join = {
    dt1    <-copy(DT1)
    dt2    <-copy(DT2)
    dt1[, id := .I][ dt2[, id := .I], c("C", "D") := .(i.C, i.D), on = .(id)][, id := NULL]
  } )

# Unit: milliseconds
#        expr     min       lq      mean   median       uq      max neval
# cbind        1.8889  2.68405  9.454567  2.99505  3.62625 226.4432   100
# update_join 23.9186 24.67530 36.957518 25.62405 36.42760 249.3631   100

cbind() still wins by a landslide...

If you want to try it by merging, just add rownames as a column, and do the merge:

DT1 = data.frame(A = rep('A', 3), B = rep('B', 3))
DT1$rowname = rownames(DT1)
DT2 = data.frame(C = rep('C', 3), D = rep('D', 3))
DT2$rowname = rownames(DT2)
DT3 = merge(DT1, DT2, by = 'rowname')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM