Combining a join with “or” in the data.table package

library(data.table)
dt <- data.table(X=rnorm(10),a=rep(0:1,length.out=10),b=rep(0:1,each=5))
dt
             X a b
1:  0.08848742 0 0
2: -1.36578648 1 0
3: -1.01563937 0 0
4:  0.36562936 1 0
5:  2.04250239 0 0
6:  1.33698124 1 1
7: -1.38358719 0 1
8: -0.14395236 1 1
9: -1.36277622 0 1
10:  0.40818281 1 1    

setkey(dt,a,b)
dt[J(1,1),]

This returns all rows where both a and b are 1. Is there a way to select the rows where either a or b is 1? In other words, to get all rows of dt except rows 1, 3 and 5?

I don't think there's a direct way to do an OR operation in a keyed join. However, you can use De Morgan's law: (A OR B) is equivalent to NOT (NOT A AND NOT B). Since a and b only take the values 0 and 1 here, "not 1" means 0, so what you need is the not-join !J(0, 0).

Just do:

dt[!J(0, 0)]

            X a b
1:  0.7768113 0 1
2:  0.2439950 0 1
3: -0.2095353 1 0
4:  2.9267934 1 0
5: -0.1437019 1 1
6:  1.5120883 1 1
7: -0.4462240 1 1
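
As a quick check of the equivalence, the sketch below rebuilds a table shaped like the one in the question and compares the not-join with a plain logical subset. The seed and the names `not_join` and `or_scan` are illustrative assumptions, not part of the original post.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the example is reproducible
dt <- data.table(X = rnorm(10),
                 a = rep(0:1, length.out = 10),
                 b = rep(0:1, each = 5))
setkey(dt, a, b)

# Not-join: keep every row whose key (a, b) is not (0, 0)
not_join <- dt[!J(0, 0)]

# Ordinary logical subset, for comparison
or_scan <- dt[a == 1 | b == 1]

# Both return the same rows, in key order
print(identical(not_join$X, or_scan$X))
```

This only works because a and b are binary; with more than two values per column, "not (0, 0)" would no longer mean "a is 1 or b is 1".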

I've been doing this sort of thing lately:

kvals = CJ(a=0:1,b=0:1)
dt[kvals[a|b]]

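"kvals" stores all possible values for the key, and kvals[a|b] keeps only the combinations where a or b is nonzero. CJ is similar to expand.grid: it builds every combination of the vectors passed to it, but it returns a data.table that is sorted and keyed by default.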
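
To make the comparison concrete, here is a small sketch of what CJ and expand.grid return for these inputs and how the filtered key table drives the join. The seed and the names `grid`, `wanted` and `res` are illustrative assumptions.

```r
library(data.table)

kvals <- CJ(a = 0:1, b = 0:1)            # keyed data.table, sorted by a then b
grid  <- expand.grid(a = 0:1, b = 0:1)   # plain data.frame, first column varies fastest

# Keep only the key combinations where a or b is nonzero
wanted <- kvals[a | b]

set.seed(1)  # arbitrary seed, only so the example is reproducible
dt <- data.table(X = rnorm(10),
                 a = rep(0:1, length.out = 10),
                 b = rep(0:1, each = 5))
setkey(dt, a, b)

# Keyed join against the three wanted combinations
res <- dt[wanted]
print(res)
```

One caveat: with the default nomatch, a combination in `wanted` that has no rows in dt shows up in the result as a row with NA in X.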

Why not just do this as an ordinary logical subset in i?

> dt[a==1&b==1,]
            X a b
1: -0.1186037 1 1
2: -0.1166594 1 1
3:  0.2622407 1 1
> dt[a==1|b==1,]
             X a b
1: -0.69037968 0 1
2:  1.63492922 0 1
3: -0.09240386 1 0
4:  0.55300691 1 0
5: -0.11860370 1 1
6: -0.11665936 1 1
7:  0.26224070 1 1
