简体   繁体   中英

Setting key while chaining in R data.table

Imagine I have a data.table DT that has columns a, b, c . I want to filter rows based on a (say, select only those with value "A" ), compute the sum of b by c . I can do this efficiently, using binary search for filtering, by

setkey(DT, a)
DT[.("A"), .(sum.B = sum(B)), by = .(C)]

What if then I want to filter rows based on the value of the newly obtained sum.b ? If I want to keep rows where sum.b equals one of c(3, 4, 5) , I can do that by saying

DT[.("A"), .(sum.B = sum(B)), by = .(C)][sum.b %in% c(3, 4, 5)]

but the latter operation uses vector scan which is slow. Is there a way to set keys "on the fly" while chaining? Ideally I would have

DT[.("A"), .(sum.B = sum(B)), by = .(C)][??set sum.b as key??][.(c(3, 4, 5))]

where I don't know the middle step.

The middle step you are asking in the question would be the following:

# unnamed args
DT[,.SD,,sum.b]
# named args
DT[j = .SD, keyby = sum.b]
# semi named
DT[, .SD, keyby = sum.b]

Yet you should benchmark it on your data as it may be slower than vector scan as you need to setkey.

It looks like eddi already provide that solution in comment. The FR mentioned by him is data.table#1105 .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM