
Using a merge to define a subset of a data.table in R

I'm using multiple merges to define an ID variable in R (see this question for more context).

First, I want to merge variable v from data.table x into data.table y according to, say, key k1 in y.

Then, for the observations that weren't matched in the first stage, I want to merge them according to key k2 in y.

y[is.na(v),x,v:=v]

doesn't work, because when joining, data.table expects the data.table to join with as the first (i) argument, whereas here i is the logical vector is.na(v).

y[is.na(v),][x,v:=v]

works in a sense, but doesn't save the results of the merge to y: y[is.na(v),] returns a new data.table, so := updates that copy rather than y itself.
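A small sketch showing that the chained call leaves the original table unchanged. The tables a and b, and the columns k and w, are hypothetical toy data; the join uses data.table's on= argument instead of keys so the example stays short.

```r
library(data.table)

# Hypothetical toy tables: a has unfilled values, b holds the lookup values
a <- data.table(k = c(1, 2, 3), v = c(10, NA, NA))
b <- data.table(k = c(2, 3), w = c(20, 30))

# a[is.na(v), ] is a new data.table (a copy of rows 2 and 3);
# the chained := fills v in that copy, which is then discarded
a[is.na(v), ][b, on = "k", v := i.w]

print(a$v)  # a itself is unchanged
```

To update y itself, the filtering has to happen on the i side or inside j, which is what both answers below do.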

Here's a minimal example:

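library(data.table)
x <- data.table(v1 = c("A","B","C"), v2 = c("a","b","c"), v = rnorm(3), key = c("v1","v2"))
y <- data.table(v1 = c("A","B","C"), v21 = c("","b","c"), v22 = c("a","",""))
setkey(y, v1, v21)
y[x, v := v]   # first-stage join on (v1, v21); row A has v21 == "" and gets no match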

gives

> x
   v1 v2          v
1:  A  a  0.3316665
2:  B  b  0.8470424
3:  C  c -0.5955292
> y
   v1 v21 v22          v
1:  A       a         NA
2:  B   b      0.8470424
3:  C   c     -0.5955292

And of course what I want is:

> y
   v1 v21 v22          v
1:  A       a  0.3316665
2:  B   b      0.8470424
3:  C   c     -0.5955292
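For comparison, a sketch that produces this target with two update joins using data.table's on= argument (available since data.table 1.9.6), so y never needs to be re-keyed. This is an alternative technique, not code from the question or the answers; set.seed(1) is added only for reproducibility, so the numbers will differ from those printed above.

```r
library(data.table)

set.seed(1)  # reproducibility only; not part of the original question
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = rnorm(3))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))

# Stage 1: match y's (v1, v21) against x's (v1, v2); creates column v in y
y[x, on = .(v1, v21 = v2), v := i.v]

# Stage 2: match y's (v1, v22) against x's (v1, v2), filling only rows still NA
y[x, on = .(v1, v22 = v2), v := ifelse(is.na(v), i.v, v)]

print(y)
```

In on = .(v1, v21 = v2), the name on the left of = is a column of y and the one on the right is a column of x (the table passed as i).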

Try this (answer by @eddi):

setkey(y, v1, v22)                    # second-stage key
y[x, v := ifelse(is.na(v), i.v, v)]   # overwrite only the rows that are still NA
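Combined with the question's minimal example, the full two-stage merge with this answer can be sketched as follows. set.seed(1) is an addition for reproducibility, so the printed numbers will differ from those in the question.

```r
library(data.table)

set.seed(1)  # added for reproducibility; not part of the original question
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = rnorm(3),
                key = c("v1", "v2"))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))

# Stage 1: join on (v1, v21); row A has v21 == "" and stays NA
setkey(y, v1, v21)
y[x, v := v]

# Stage 2: re-key on (v1, v22) and fill only the rows that are still NA
setkey(y, v1, v22)
y[x, v := ifelse(is.na(v), i.v, v)]

print(y)
```

Because both keys start with v1, re-keying does not reorder y here, so its rows end up in the same order as x.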

The i. prefix refers to a column of the data.table passed as i (here x), which distinguishes it from a column of the same name in y.
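A minimal illustration of the prefix, using hypothetical tables target and lookup that both contain a column v. Inside j, plain v is target's column, while i.v is lookup's.

```r
library(data.table)

# Hypothetical tables sharing the column name v
target <- data.table(id = c(1, 2), v = c(NA, 5))
lookup <- data.table(id = c(1, 2), v = c(100, 200))

# v is target's column, i.v is the column of the table passed as i (lookup)
target[lookup, on = "id", v_both := paste(v, i.v, sep = "/")]

print(target$v_both)
```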

An alternative to @eddi's solution, which I'm using and find somewhat more robust:

setkey(y, v1, v22)
y[x[!(v %in% y$v), ], v := i.v]   # join only the rows of x whose v is not already in y

(Basically, instead of subsetting y, subset x using y, and join the subsetted x to y.)
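A sketch of the complete run with this alternative, again built on the question's minimal example with set.seed(1) added for reproducibility.

```r
library(data.table)

set.seed(1)  # added for reproducibility; not part of the original answer
x <- data.table(v1 = c("A", "B", "C"), v2 = c("a", "b", "c"), v = rnorm(3),
                key = c("v1", "v2"))
y <- data.table(v1 = c("A", "B", "C"), v21 = c("", "b", "c"), v22 = c("a", "", ""))

# Stage 1: join on (v1, v21)
setkey(y, v1, v21)
y[x, v := v]

# Stage 2: keep only the rows of x whose v did not land in y, then join on (v1, v22)
setkey(y, v1, v22)
y[x[!(v %in% y$v), ], v := i.v]

print(y)
```

Since the filter compares the v values themselves, it assumes the values of v in x are distinct; two rows of x sharing a value could be dropped incorrectly.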

