Chaining multiple data.table::merge operations with data.tables
Is it possible to chain multiple merge operations one after another with data.tables?
The functionality would be similar to joining multiple data.frames in a dplyr pipe, but used for data.tables in a similar chained fashion: merge two data.tables as below, manipulate the resulting data.table as required, and only then merge another data.table onto it.
I acknowledge that this SO question here may be very similar; I found it after @chinsoon12 posted the comment.
Thanks for any help!
library(dplyr)
library(data.table)
# data.frame
df1 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))
df2 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))
df3 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))
df4 = left_join(df1,
                df2,
                by = "food") %>%
  mutate(new_col = NA) %>% # this is just to hold a position of mutation in the data.frame
  left_join(.,
            df3,
            by = "food")
# data.table
dt1 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))
dt2 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))
dt3 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))
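# this is what I am not sure how to implement (pseudocode, does not run as written):
# merge dt1 and dt2, manipulate the result, then merge another data.table onto it
dt4 = merge(dt1,
            dt2,
            by = "food")[
  food == "apples"](merge(dt4))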
Multiple data.table joins with the on argument can be chained. Note that without an update operator (":=") in j, X[Y, on = ...] is a right join (all rows of Y are kept), but with ":=" (i.e., adding columns to X by reference), it becomes a left outer join. There is a useful post on left joins here: Left join using data.table.
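A minimal sketch of the two behaviours described above. The tables x and y are hypothetical (not from the question) and their keys only partly overlap, so the difference between the right join and the update join is visible:

```r
library(data.table)

# Hypothetical tables: "apples" is only in x, "figs" is only in y
x <- data.table(food = c("apples", "bananas", "carrots"), quantity = 1:3)
y <- data.table(food = c("bananas", "carrots", "figs"),
                status = c("bad", "rotten", "ripe"))

# No := in j: the result has the rows of y (right join);
# figs has no match in x, so its quantity is NA
right <- x[y, on = "food"]
print(right)

# := in j: x is updated by reference and keeps all of its own rows (left join);
# apples has no match in y, so its new status is NA
x[y, on = "food", status := i.status]
print(x)
```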
Example using the example data above, with a subset between the joins:
dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
  food == "apples"][dt3, on="food", rank := i.rank]
##> dt4
##     food quantity status rank
##1: apples        1   good okay
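Because := updates by reference, the chain above also adds a status column to dt1 itself. If dt1 should stay unchanged, one option is to start the chain from copy(dt1); a sketch with the question's data:

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# copy() gives the first := its own table to modify, so dt1 is left as it was
dt4 <- copy(dt1)[dt2, on = "food", status := i.status][
  food == "apples"][dt3, on = "food", rank := i.rank]
print(dt4)
print(names(dt1))  # dt1 still has only its original columns
```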
Example adding a new column between the joins:
dt4 <- dt1[dt2, on="food", `:=`(status = i.status)][
  , new_col := NA][dt3, on="food", rank := i.rank]
##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
Example using merge and magrittr pipes:
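library(magrittr)  # for %>% (dplyr re-exports it as well)
# all.x = TRUE keeps every row of the left table, matching left_join() in the question
dt4 <- merge(dt1, dt2, by = "food", all.x = TRUE) %>%
  set( , "new_col", NA) %>%  # set() adds new_col by reference and passes the table on
  merge(dt3, by = "food", all.x = TRUE)
##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best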
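For more than two or three tables, the same merge chain can also be written as a fold with base R's Reduce(). This is a sketch of an alternative, not part of the answer above; it assumes every table shares the join column food and that a left join (all.x = TRUE) is wanted. Intermediate steps such as adding new_col would have to happen outside the fold.

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# Reduce() merges left to right: merge(merge(dt1, dt2), dt3)
dt4 <- Reduce(function(x, y) merge(x, y, by = "food", all.x = TRUE),
              list(dt1, dt2, dt3))
print(dt4)
```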
I see no other way than this (unfortunately). You need to define, for each lookup table, a vector of its column names other than the join column, and then you can chain the joins by reference like this:
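# data_dt, dt_dt1 ... dt_dt5 and join_column1 ... join_column5 are placeholder names
# non-join columns of each lookup table (no stray spaces inside the strings)
cols_dt1 <- setdiff(colnames(dt_dt1), 'join_column1')
cols_dt2 <- setdiff(colnames(dt_dt2), 'join_column2')
cols_dt3 <- setdiff(colnames(dt_dt3), 'join_column3')
cols_dt4 <- setdiff(colnames(dt_dt4), 'join_column4')
cols_dt5 <- setdiff(colnames(dt_dt5), 'join_column5')
# each step adds the lookup's columns to data_dt by reference; the i. prefix makes
# mget() take the values from the lookup table even if data_dt has a same-named column
data_dt[dt_dt1, on = .(join_column1), (cols_dt1) := mget(paste0('i.', cols_dt1))][
  dt_dt2, on = .(join_column2), (cols_dt2) := mget(paste0('i.', cols_dt2))][
  dt_dt3, on = .(join_column3), (cols_dt3) := mget(paste0('i.', cols_dt3))][
  dt_dt4, on = .(join_column4), (cols_dt4) := mget(paste0('i.', cols_dt4))][
  dt_dt5, on = .(join_column5), (cols_dt5) := mget(paste0('i.', cols_dt5))]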
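A runnable instance of the same pattern with the question's dt1, dt2 and dt3. The helper update_join() is hypothetical (it is not a data.table function): it adds every non-key column of the lookup table to the target by reference, reading the values through the i. prefix.

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# Hypothetical helper: left-join all non-key columns of `lookup` onto `target`
# by reference and return `target` invisibly
update_join <- function(target, lookup, key_col) {
  cols <- setdiff(names(lookup), key_col)
  target[lookup, on = key_col, (cols) := mget(paste0("i.", cols))]
  invisible(target)
}

dt4 <- copy(dt1)              # work on a copy so dt1 itself is not modified
update_join(dt4, dt2, "food")
dt4[, new_col := NA]          # manipulation between the joins
update_join(dt4, dt3, "food")
print(dt4)
```

Since the helper modifies its target by reference, the calls do not need to be reassigned.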