Merging data.tables based on column names

Question

I am trying to do some left-join merges with data.tables. The package description states that:

In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order

I understand that I can use [.data.table and data.table:::merge.data.table

What I would like is: merge X and Y while specifying the key columns (like by.x and by.y in base merge; why was this taken away?)

Let's suppose I have

library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3),y=c(1,3,6),v=1:9,key="x,y,v")
# DT1 must be a data.table: data.frame() would turn key= into an extra column
DT1 = data.table(x1=c("aa","bb","cc"),y1=c(1,3,6),v1=1:3,key="x1,y1,v1")

and I would like this output:

# data.table's merge method masks the base one; I no longer know how to call the base version of merge
R) {base::merge}(DT,DT1,by.x="y",by.y="y1") 
  y x v x1 v1
1 1 a 1 aa  1
2 1 c 7 aa  1
3 1 b 4 aa  1
4 3 a 2 bb  2
5 3 b 5 bb  2
6 3 c 8 bb  2
7 6 b 6 cc  3
8 6 a 3 cc  3
9 6 c 9 cc  3

I am very happy to use [ or data.table:::merge, but I would like an option that does not modify DT or DT1 (such as changing the column names, calling merge, and changing them back).

Answer 1

Update: Since data.table v1.9.6 (released September 19, 2015), merge.data.table() accepts and handles the arguments by.x= and by.y=. Here's an updated link to the FR (now closed) referenced below.
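With v1.9.6 or later, the merge asked for in the question can be sketched as follows. The tables are taken from the question, with DT1 built as a data.table and y repeated explicitly; the name res is only illustrative.

```r
library(data.table)  # assumes data.table >= 1.9.6

# Sample tables from the question
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(c(1, 3, 6), times = 3),
                  v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6),
                  v1 = 1:3, key = "x1,y1,v1")

# Left join on differently named columns; neither input is modified
res <- merge(DT, DT1, by.x = "y", by.y = "y1", all.x = TRUE)
print(res)
```

The join column keeps the name from by.x (y), and DT1 still has its original column names afterwards.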

Yes, this is a feature request that is not yet implemented:

FR#2033 Add by.x and by.y to merge.data.table

There isn't anything preventing it. It's just something that wasn't done. I very rarely need merge and was slow to realise its usefulness more generally. We've made good progress in making merge as fast as X[Y], and this feature request is at the highest priority. If you'd like it more quickly, you are more than welcome to add those arguments to merge.data.table and commit the change yourself. We try to keep source code short and together in one function/file, so by looking at the merge.data.table source you can hopefully follow it and see what needs to be done.
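The X[Y] form mentioned above can also join on differently named columns through the on= argument, which was added in v1.9.6, after this answer was written. A minimal sketch, assuming the question's tables (with DT1 as a data.table); res is an illustrative name:

```r
library(data.table)  # assumes data.table >= 1.9.6 for on=

# Sample tables from the question
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(c(1, 3, 6), times = 3),
                  v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6),
                  v1 = 1:3, key = "x1,y1,v1")

# For each row of DT, look up the DT1 row whose y1 equals DT's y.
# In on=, the name (y1) is DT1's column and the value ("y") is DT's column.
res <- DT1[DT, on = c(y1 = "y")]
print(res)
```

Here the join column appears under DT1's name (y1), and neither DT nor DT1 is modified.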

Answer 2

The arguments by.x and by.y are now available in the development version of data.table. See here. Use devtools::install_github("Rdatatable/data.table", build_vignettes = FALSE) to install the development version of data.table.

Answer 3

You can't, because the by columns must be in the intersection of colnames(DT) and colnames(DT1):

 if (!all(by %in% intersect(colnames(x), colnames(y)))) {
       stop("Elements listed in `by` must be valid column names in x and y")
   }

Here is a workaround using setnames, which does not copy and is very fast:

setnames(DT1,'y1','y')
> merge(DT,DT1)
   y x v x1 v1
1: 1 a 1 aa  1
2: 1 b 4 aa  1
3: 1 c 7 aa  1
4: 3 a 2 bb  2
5: 3 b 5 bb  2
6: 3 c 8 bb  2
7: 6 a 3 cc  3
8: 6 b 6 cc  3
9: 6 c 9 cc  3

EDIT: update for data.table version 1.9.4

You should set the by parameter, otherwise you get an error:

Error in merge.data.table(DT, as.data.table(DT1)) : 
  Elements listed in `by` must be valid column names in x and y

You should do something like:

merge(DT,DT1,by="y")
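Since setnames renames by reference, the approach above changes DT1 itself, which the question wanted to avoid. One possible variant, sketched here under the assumption that copying DT1 is affordable, renames a copy instead. The names tmp and res are illustrative only.

```r
library(data.table)

# Sample tables from the question
DT  <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(c(1, 3, 6), times = 3),
                  v = 1:9, key = "x,y,v")
DT1 <- data.table(x1 = c("aa", "bb", "cc"), y1 = c(1, 3, 6),
                  v1 = 1:3, key = "x1,y1,v1")

# copy() makes an independent table, so setnames() leaves DT1 untouched
tmp <- setnames(copy(DT1), "y1", "y")
res <- merge(DT, tmp, by = "y")
print(res)
```

This trades the speed of the in-place rename for a full copy of DT1, so it is mainly attractive when DT1 is small.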

Answer 1: 8 votes, accepted, 2012-12-28 16:18:45
Answer 2: 5 votes, 2015-05-30 06:23:34
Answer 3: 4 votes, 2012-12-28 13:19:08