Chaining multiple data.table::merge operations with data.tables

Is it possible to chain multiple merge operations one after another with data.tables?

The functionality would be similar to joining multiple data.frames in a dplyr pipe, but applied to data.tables in the same chained fashion: merge two data.tables as in the code below, manipulate the resulting data.table as required, and then merge another data.table onto it. I acknowledge that this SO question may be very similar; I only saw it after @chinsoon12 posted the comment.

Thanks for any help!

library(dplyr)
library(data.table)

# data.frame
df1 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))

df2 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))

df3 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))

df4 = left_join(df1,
                df2,
                by = "food") %>% 
  mutate(new_col = NA) %>%  # this is just to hold a position of mutation in the data.frame
  left_join(.,
            df3,
            by = "food")



# data.table
dt1 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))

dt2 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))

dt3 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))

# this is what I am not sure how to implement (the call below does not run as written)
dt4 = merge(dt1,
            dt2,
            by = "food")[
              food == "apples"](merge(dt4))

Multiple data.table joins with the on argument can be chained. Note that without an update operator (":=") in j, this would be a right join, but with ":=" (i.e., adding columns), it becomes a left outer join. A useful post on left joins is "Left join using data.table".
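
To make the right-join versus left-join distinction concrete, here is a minimal sketch. The tables `x` and `y` are hypothetical and not part of the question; they are chosen so that each table has one key the other lacks.

```r
library(data.table)

# Hypothetical example tables (not from the question)
x <- data.table(food = c("apples", "bananas", "carrots"), quantity = 1:3)
y <- data.table(food = c("bananas", "carrots", "dates"),
                status = c("bad", "rotten", "raw"))

# No `:=` in j: the result has one row per row of y (a right join on x).
# "dates" appears with quantity NA, and "apples" is dropped.
right <- x[y, on = "food"]

# With `:=`: x is updated by reference and keeps all of its own rows
# (a left join). "apples" stays with status NA, and "dates" is never added.
x[y, on = "food", status := i.status]

print(right)
print(x)
```

Because the second form only adds columns to `x`, it is the one that behaves like dplyr's left_join in a chain.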

Example using the data above, with a subset between joins:

dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
            food == "apples"][dt3, on="food", rank := i.rank]

##> dt4
##     food quantity status rank
##1: apples        1   good okay

Example adding a new column between joins:

dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
            , new_col := NA][dt3, on="food", rank := i.rank]

##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
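
One thing to keep in mind with the chains above: `:=` updates by reference, so the first step also adds `status` to `dt1` itself. The following sketch, using the question's data, wraps `dt1` in `copy()` so the original table is left untouched. Whether that matters depends on whether `dt1` is needed again in its original form.

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# copy() gives the chain its own table to update, so dt1 keeps its two columns
dt4 <- copy(dt1)[dt2, on = "food", status := i.status][
  , new_col := NA][dt3, on = "food", rank := i.rank]

print(names(dt1))
print(names(dt4))
```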

Example using merge and magrittr pipes:

dt4 <- merge(dt1, dt2, by = "food") %>%
  set(j = "new_col", value = NA) %>%  # add the column by reference, then pass the table on
  merge(dt3, by = "food")

##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
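
Note that merge() defaults to an inner join, which only matches the question's left_join calls because every table contains the same four foods. The sketch below shows the difference with a hypothetical table `dt3_partial` (not in the question) that lacks a row for "dates". It uses the native `|>` pipe, which assumes R 4.1 or later; the magrittr pipe would work the same way here.

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
# Hypothetical variant of dt3 with no row for "dates"
dt3_partial <- data.table(food = c("apples", "bananas", "carrots"),
                          rank = c("okay", "good", "better"))

# Default merge() is an inner join: "dates" is dropped at the last step
inner <- merge(dt1, dt2, by = "food") |>
  merge(dt3_partial, by = "food")

# all.x = TRUE keeps every row of the left table, like dplyr::left_join()
left <- merge(dt1, dt2, by = "food", all.x = TRUE) |>
  merge(dt3_partial, by = "food", all.x = TRUE)

print(inner)
print(left)
```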

I see no other way than this, unfortunately. You need to define vectors of column names, and then you can chain joins by reference like this:

# For each lookup table, collect every column except its join column
cols_dt1 <- setdiff(colnames(dt_dt1), "join_column1")
cols_dt2 <- setdiff(colnames(dt_dt2), "join_column2")
cols_dt3 <- setdiff(colnames(dt_dt3), "join_column3")
cols_dt4 <- setdiff(colnames(dt_dt4), "join_column4")
cols_dt5 <- setdiff(colnames(dt_dt5), "join_column5")

# Each step adds the lookup table's columns to data_dt by reference
data_dt[dt_dt1, on = .(join_column1), (cols_dt1) := mget(cols_dt1)][
  dt_dt2, on = .(join_column2), (cols_dt2) := mget(cols_dt2)][
    dt_dt3, on = .(join_column3), (cols_dt3) := mget(cols_dt3)][
      dt_dt4, on = .(join_column4), (cols_dt4) := mget(cols_dt4)][
        dt_dt5, on = .(join_column5), (cols_dt5) := mget(cols_dt5)]
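
As a runnable illustration of the same pattern, here is a sketch applied to the question's `dt1`, `dt2` and `dt3`. Two details are my own choices rather than part of the answer above: `copy()` keeps `dt1` unmodified, and the `i.` prefix inside `mget()` makes it explicit that the values come from the joined table, which avoids ambiguity if a column name exists in both tables.

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# Non-key columns to bring over from each lookup table
cols_dt2 <- setdiff(names(dt2), "food")
cols_dt3 <- setdiff(names(dt3), "food")

# Chain the update joins on a copy so that dt1 itself is not changed
dt4 <- copy(dt1)[dt2, on = "food", (cols_dt2) := mget(paste0("i.", cols_dt2))][
  dt3, on = "food", (cols_dt3) := mget(paste0("i.", cols_dt3))]

print(dt4)
```

This scales to lookup tables with many columns, since nothing has to be listed by hand in j.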
