Merging into a subset of a data.table in R

Question

I'm using multiple merges to define an ID variable in R (see this question for more context).

First, I want to merge variable v from data.table x into data.table y according to, say, keys k1 in y.

Then, for those observations that weren't matched in the first stage, I want to merge them according to keys k2 in y.

y[is.na(v),x,v:=v]

doesn't work, because when merging, data.table syntax expects the data.table to join with as the first argument.

y[is.na(v),][x,v:=v]

works in a sense, but doesn't save the results of the merge to y.

Here's a minimal example:

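library(data.table)
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = rnorm(3), key = c("v1", "v2"))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))
setkey(y, v1, v21)
# Join on the keys; v does not exist in y yet, so the right-hand v is x's column
y[x, v := v]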

gives

> x
   v1 v2          v
1:  A  a  0.3316665
2:  B  b  0.8470424
3:  C  c -0.5955292
> y
   v1 v21 v22          v
1:  A       a         NA
2:  B   b      0.8470424
3:  C   c     -0.5955292

And of course what I want is:

> y
   v1 v21 v22          v
1:  A       a  0.3316665
2:  B   b      0.8470424
3:  C   c     -0.5955292

Answer 1

Try this:

setkey(y, v1, v22)
y[x, v := ifelse(is.na(v), i.v, v)]

The i. prefix can be used to refer to a column of the i-expression data.table when it has the same name as a column in y.
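For reference, here is a minimal self-contained sketch of the whole two-stage merge. It is not the exact code above: it assumes a data.table version that supports the on= argument (so no setkey calls are needed), and it swaps rnorm(3) for fixed values of v purely so the result is reproducible.

```r
library(data.table)

# Fixed values instead of rnorm(3); an assumption made only for reproducibility
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = c(0.1, 0.2, 0.3))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))

# Stage 1: match y's (v1, v21) against x's (v1, v2); i.v is x's v
y[x, on = .(v1, v21 = v2), v := i.v]

# Stage 2: match y's (v1, v22) against x's (v1, v2),
# keeping any value already filled in by stage 1
y[x, on = .(v1, v22 = v2), v := ifelse(is.na(v), i.v, v)]

print(y)
```

Because := updates y by reference, both stages write into the same table, which is what the y[is.na(v),][x,v:=v] attempt in the question could not do.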

Answer 2

An alternative to @eddi's solution, which I'm working with and which is somewhat more robust:

setkey(y, v1, v22)
y[x[!(v %in% y$v),],v:=i.v]

(Basically, instead of subsetting y, subset x via y and join the subsetted x to y.)
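A rough end-to-end sketch of this idea, under the same assumptions as before (on= instead of setkey, and fixed values of v in place of rnorm(3)). The names unmatched and unmatched_alt are hypothetical and used only for illustration. Note that filtering with v %in% y$v relies on the values of v being distinct, so the sketch also shows an anti-join on the stage-1 keys as a possible variant that does not depend on that.

```r
library(data.table)

# Fixed values instead of rnorm(3); an assumption made only for reproducibility
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = c(0.1, 0.2, 0.3))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))

# Stage 1: match y's (v1, v21) against x's (v1, v2)
y[x, on = .(v1, v21 = v2), v := i.v]

# Rows of x whose v has not been placed in y yet (assumes v values are distinct)
unmatched <- x[!(v %in% y$v)]

# Variant: anti-join, keeping rows of x with no partner in y on the stage-1 keys
unmatched_alt <- x[!y, on = .(v1, v2 = v21)]

# Stage 2: join only the leftover rows of x to y on (v1, v22)
y[unmatched, on = .(v1, v22 = v2), v := i.v]

print(y)
```

Since only the leftover rows of x take part in the second join, values assigned in the first stage cannot be overwritten.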


Answer 1: score 3, accepted, 2015-03-23 20:36:38

Answer 2: score 1, 2015-03-24 14:53:20
