Using a merge to define a subset of a data.table in R
I'm using multiple merges to define an ID variable in R (see this question for more context).
I want to merge variable v from data.table x into data.table y, first according to, say, keys k1 in y.
Then, for those observations that weren't matched in the first stage, I want to merge them according to keys k2 in y.
y[is.na(v),x,v:=v]
doesn't work, because data.table syntax expects the data.table to merge with as the first argument (i) when merging.
y[is.na(v),][x,v:=v]
works in a sense, but doesn't save the results of the merge to y, since := then modifies the temporary subset rather than y itself.
Here's a minimal example:
library(data.table)
x <- data.table(v1=c("A","B","C"), v2=c("a","b","c"), v=rnorm(3), key=c("v1","v2"))
y <- data.table(v1=c("A","B","C"), v21=c("","b","c"), v22=c("a","",""))
setkey(y, v1, v21)
y[x, v:=v]
gives
> x
   v1 v2          v
1:  A  a  0.3316665
2:  B  b  0.8470424
3:  C  c -0.5955292
> y
   v1 v21 v22          v
1:  A       a         NA
2:  B   b      0.8470424
3:  C   c     -0.5955292
And of course what I want is:
> y
   v1 v21 v22          v
1:  A       a  0.3316665
2:  B   b      0.8470424
3:  C   c     -0.5955292
Try this:
setkey(y, v1, v22)
y[x, v := ifelse(is.na(v), i.v, v)]
The i. prefix can be used to refer to a column of the i-expression data.table (here x) when it has the same name as a column of y.
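Putting both stages together, the approach above might look like the following sketch. It reuses the question's example but swaps rnorm(3) for fixed values (an assumption made only so the result is reproducible), and writes i.v in the first stage as well for clarity.

```r
library(data.table)

# Fixed values instead of rnorm() so the result is reproducible
# (an assumption for illustration only).
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"),
                v = c(0.1, 0.2, 0.3), key = c("v1", "v2"))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"),
                v22 = c("a", "", ""))

# Stage 1: match x's key (v1, v2) against y's key (v1, v21).
setkey(y, v1, v21)
y[x, v := i.v]

# Stage 2: re-key y on (v1, v22) and fill only the rows still missing v.
setkey(y, v1, v22)
y[x, v := ifelse(is.na(v), i.v, v)]

print(y)
```

Because := updates y by reference in both stages, no reassignment of y is needed.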
An alternative to @eddi's solution that I'm working with, and that is somewhat more robust:
setkey(y, v1, v22)
y[x[!(v %in% y$v),],v:=i.v]
(Basically, instead of subsetting y, subset x via y and join the subsetted x to y.)
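As a rough illustration of this variant, the sketch below runs both stages with fixed values in place of rnorm(3) and uses the on= argument (available in data.table 1.9.6 and later) instead of setkey; both choices are assumptions for the example, not part of the original answer. Note that filtering with v %in% y$v presumes that values of v do not coincide across different rows of x.

```r
library(data.table)

# Fixed values instead of rnorm() (an assumption for reproducibility).
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"),
                v = c(0.1, 0.2, 0.3))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"),
                v22 = c("a", "", ""))

# Stage 1: join on y's (v1, v21) against x's (v1, v2).
y[x, on = c(v1 = "v1", v21 = "v2"), v := i.v]

# Stage 2: keep only the rows of x whose v has not been placed in y yet,
# then join that subset on y's (v1, v22).
x_left <- x[!(v %in% y$v)]
y[x_left, on = c(v1 = "v1", v22 = "v2"), v := i.v]

print(y)
```

Since only the unmatched rows of x take part in the second join, values assigned in the first stage cannot be overwritten.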