
Subsetting on unique combinations of column strings

I have data on the best performers at the last date of a larger dataset. Next, I would like to subset the whole dataset to retrieve all rows for those best performers. A "best performer" is identified by a combination of two strings. So far I have not been able to subset the data correctly.

I have tried %in%, which does part of the job, but it keeps every row where x matches any best x and y matches any best y, rather than only the rows where the (x, y) pair itself appears in best.

library(data.table)
best = data.table(Date = as.Date(c("2016-01-01", "2016-01-01")), x = c("a", "b"), y = c("p", "q"))
wholedt = data.table(Date = as.Date(c("2015-12-01","2015-12-01","2015-12-01","2016-01-01", "2016-01-01", "2016-01-01")), x = c("a", "c", "b", "a","a", "b"), y = c("p", "q", "q", "q","p", "q"))
SDbest_of_whole = wholedt[with(wholedt, x %in% best$x & y %in% best$y)]

The expected output would include all data points with the combination (a,p) or (b,q), and no rows with the combination (a,q) or (b,p).

expected_output = data.table(Date = as.Date(c("2015-12-01","2015-12-01","2016-01-01", "2016-01-01")), x = c("a", "b","a", "b"), y = c("p", "q","p", "q"))
> expected_output
     Date x y
1: 2015-12-01 a p
2: 2015-12-01 b q
3: 2016-01-01 a p
4: 2016-01-01 b q

One way to make sure you use ONLY the combinations of interest is to merge your datasets:

library(data.table)
best = data.table(Date = as.Date(c("2016-01-01", "2016-01-01")), x = c("a", "b"), y = c("p", "q"))
wholedt = data.table(Date = as.Date(c("2015-12-01","2015-12-01","2015-12-01","2016-01-01", "2016-01-01", "2016-01-01")), x = c("a", "c", "b", "a","a", "b"), y = c("p", "q", "q", "q","p", "q"))

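# Drop Date from best (by reference) so that x and y are the only shared columns
best[,Date:=NULL]
# With no keys set, merge() joins on the shared columns x and y (inner join)
merge(best, wholedt)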

#    x y       Date
# 1: a p 2015-12-01
# 2: a p 2016-01-01
# 3: b q 2015-12-01
# 4: b q 2016-01-01
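
The same inner join can also be written with data.table's `on =` syntax, which avoids modifying `best` by reference. This is only a sketch under the assumptions of the example data: the names `keys` and `res_join` are made up for illustration, and `unique()` is added in case `best` ever contains the same (x, y) pair twice.

```r
library(data.table)

best <- data.table(Date = as.Date(c("2016-01-01", "2016-01-01")),
                   x = c("a", "b"), y = c("p", "q"))
wholedt <- data.table(Date = as.Date(c("2015-12-01", "2015-12-01", "2015-12-01",
                                       "2016-01-01", "2016-01-01", "2016-01-01")),
                      x = c("a", "c", "b", "a", "a", "b"),
                      y = c("p", "q", "q", "q", "p", "q"))

# Keep only the pair columns so best's Date does not end up in the result;
# unique() guards against duplicated pairs multiplying rows (hypothetical case).
keys <- unique(best[, .(x, y)])

# Inner join: rows of wholedt whose (x, y) pair occurs in keys.
# nomatch = 0L drops pairs in keys that have no match in wholedt.
res_join <- wholedt[keys, on = c("x", "y"), nomatch = 0L]
res_join
```

The rows come back grouped by pair rather than in the original order of `wholedt`; sort afterwards (for example with `setorder(res_join, Date, x)`) if the order matters.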

For each row of wholedt, you want to check whether any row of best is identical to it.

SDbest_of_whole <- wholedt[apply(wholedt[,c('x', 'y')], 1, function(w) any(apply(best[,c('x', 'y')], 1, identical, w))),]
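
The row-by-row comparison above can be slow on large tables. A vectorized variant, sketched here, pastes x and y into a single key and applies %in% to that key, so the test is on the pair rather than on each column separately. The separator `sep` and the names `best_keys`, `res_key`, and `res_naive` are assumptions for illustration; the separator must be a string that never occurs in x or y, otherwise distinct pairs could collide.

```r
library(data.table)

best <- data.table(Date = as.Date(c("2016-01-01", "2016-01-01")),
                   x = c("a", "b"), y = c("p", "q"))
wholedt <- data.table(Date = as.Date(c("2015-12-01", "2015-12-01", "2015-12-01",
                                       "2016-01-01", "2016-01-01", "2016-01-01")),
                      x = c("a", "c", "b", "a", "a", "b"),
                      y = c("p", "q", "q", "q", "p", "q"))

# The original attempt tests each column on its own, so the (a, q) row
# passes: "a" is in best$x and "q" is in best$y.
res_naive <- wholedt[x %in% best$x & y %in% best$y]

# Assumed separator: must not appear in any value of x or y.
sep <- "\r"

# One composite key per best pair, then a single %in% on the pair key.
best_keys <- paste(best$x, best$y, sep = sep)
res_key <- wholedt[paste(x, y, sep = sep) %in% best_keys]
res_key
```

Unlike a merge, this keeps the rows in the original order of `wholedt`.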

