I have two tables of the type data.table. I want to combine the tables into one table. They look like:
DT1:
1 A B C
2 A B C
3 A B C
DT2:
D E F
D E F
D E F
I want to combine them like:
1 A B C D E F
2 A B C D E F
3 A B C D E F
I don't know how I can do this by merging, because the data tables don't have common column names.
I know this is possible with cbind, but I'm working with a lot of rows, so I prefer a function which is built to work with big data tables.
Could anyone tell me how to do this?
I've tried something with some slighter bigger table (using the code provided by Hart Radev) and I've microbenchmark
ed them, maybe it could be helpful:
library(dplyr)
library(microbenchmark)
DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))
microbenchmark(
bind_cols = {bind_cols(DT1, DT2)},
cbind = {cbind(DT1,DT2)},
# Hart solution
merge = { DT1$rowname = rownames(DT1)
DT2$rowname = rownames(DT2)
DT3 = merge(DT1, DT2, by = 'rowname')}
)
Unit: microseconds
expr min lq mean median uq max neval
bind_cols 72.534 88.9610 1.640497e+02 169.6010 209.4940 348.160 100
cbind 42.241 50.5610 8.019269e+01 61.4405 114.9875 250.455 100
merge 2142101.821 2256677.2310 2.574166e+06 2416274.7380 2732207.2465 5956733.422 100
data.table
is not my cup of tea but I suppose it could be helpful have a solution with it.
building on @s_t answer, here is a benchmark of an update join using data.table
DT1 = data.frame(A = rep('A', 300000), B = rep('B', 300000))
DT2 = data.frame(C = rep('C', 300000), D = rep('D', 300000))
library(data.table)
setDT(DT1)
setDT(DT2)
microbenchmark::microbenchmark(
cbind = {
dt1 <-copy(DT1)
dt2 <-copy(DT2)
result <- cbind(DT1, DT2)
},
update_join = {
dt1 <-copy(DT1)
dt2 <-copy(DT2)
dt1[, id := .I][ dt2[, id := .I], c("C", "D") := .(i.C, i.D), on = .(id)][, id := NULL]
} )
# Unit: milliseconds
# expr min lq mean median uq max neval
# cbind 1.8889 2.68405 9.454567 2.99505 3.62625 226.4432 100
# update_join 23.9186 24.67530 36.957518 25.62405 36.42760 249.3631 100
cbind()
still wins by a landslide...
If you want to try it by merging, just add rownames as a column, and do the merge:
DT1 = data.frame(A = rep('A', 3), B = rep('B', 3))
DT1$rowname = rownames(DT1)
DT2 = data.frame(C = rep('C', 3), D = rep('D', 3))
DT2$rowname = rownames(DT2)
DT3 = merge(DT1, DT2, by = 'rowname')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.