Subset a data.table given the row index for each group
This seems like a trivial question, but I can't find a solution for it.
Consider the two data.tables:
library(data.table)
dt <- data.table(id = c(1,1,1,2,2,2),
val = c(10,20,30,10,20,30))
dt1 <- data.table(id = c(1,2),
V1 = c(2,1))
How do I subset dt, where dt1 tells me the row number (V1) within each id group that I need to keep?
For example, here the result will be:
# id val
# 1: 1 20
# 2: 2 10
Update
A quick benchmark of the proposed solutions:
library(data.table)
s <- 100000
set.seed(123)
dt <- data.table(id = rep(seq_len(s), each = 10),
val = rnorm(n = s * 10, 0, 1))
dt1 <- data.table(id = seq_len(s),
V1 = sample(1:10, s, replace = TRUE))
library(microbenchmark)
microbenchmark(
akrun = { dt[dt1, on='id'][, .SD[1:.N==V1] ,id] },
david = { dt[dt1, val[i.V1], on = 'id', by = .EACHI] },
symbolix = { dt[, id_seq := seq(1:.N), by=id][dt1, on=c(id_seq = "V1", "id") , nomatch=0] },
times = 5
)
#Unit: milliseconds
# expr min lq mean median uq max neval
# akrun 17809.51370 17887.89037 18005.32357 18043.80279 18130.78978 18154.62118 5
# david 48.17367 53.76436 53.79004 54.69096 55.59657 56.72467 5
#symbolix 507.67312 511.23492 562.59743 571.31160 579.61228 643.15525 5
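Before reading too much into the timings, it may be worth confirming that the three approaches return the same rows. The following is a minimal sketch on the small example data, not part of the original benchmark. The names res_join, res_eachi and res_seq are made up for illustration, and copy() is used so that the := step does not modify dt by reference.

```r
library(data.table)

dt  <- data.table(id = c(1, 1, 1, 2, 2, 2),
                  val = c(10, 20, 30, 10, 20, 30))
dt1 <- data.table(id = c(1, 2),
                  V1 = c(2, 1))

# Join first, then keep the row whose position within the group equals V1
res_join <- dt[dt1, on = "id"][, .SD[1:.N == V1], by = id][, .(id, val)]

# Subset val for each row of dt1 while joining
res_eachi <- dt[dt1, .(val = val[i.V1]), on = "id", by = .EACHI]

# Add a within-group counter to a copy of dt, then join on both columns
res_seq <- copy(dt)[, id_seq := seq_len(.N), by = id][
  dt1, on = c("id", id_seq = "V1"), nomatch = 0][, .(id, val)]

# All three should pick the same val for each id
stopifnot(identical(res_join$val, res_eachi$val),
          identical(res_join$val, res_seq$val))
```

Note that the benchmarked version of the third approach adds id_seq to dt itself, so dt carries that extra column in later iterations.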
Another option is to use by = .EACHI in order to subset val while joining:
dt[dt1, val[i.V1], on = 'id', by = .EACHI]
# id V1
# 1: 1 20
# 2: 2 10
If you have more columns there, you could use .SD[i.V1] instead.
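As a rough illustration of the .SD[i.V1] variant, the sketch below adds an extra column to the example data. The column name grp and the result name res are assumptions made up for this example and do not appear in the question.

```r
library(data.table)

dt  <- data.table(id  = c(1, 1, 1, 2, 2, 2),
                  val = c(10, 20, 30, 10, 20, 30),
                  grp = letters[1:6])   # hypothetical extra column
dt1 <- data.table(id = c(1, 2),
                  V1 = c(2, 1))

# For each row of dt1, take row number i.V1 of the matching group,
# keeping every non-join column of dt
res <- dt[dt1, .SD[i.V1], on = "id", by = .EACHI]
print(res)
```

Compared with val[i.V1], this returns whole rows, at the cost of building .SD for every group.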
As a side note, in data.table v >= 1.9.8 the .SD[val] operation is scheduled to be fully optimized to use GForce, so hold tight.
One option would be to join on 'id' and then do the subset:
dt[dt1, on='id'][, .SD[1:.N==V1] ,id][,V1:=NULL][]
# id val
#1: 1 20
#2: 2 10
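The per-group .SD subset is the expensive part of this approach. A possible variation, sketched here as an assumption rather than as part of the original answer, keeps the join but replaces the grouped step with a single vectorised filter using rowid(), which is available in data.table 1.9.8 and later. The name res is made up for illustration.

```r
library(data.table)

dt  <- data.table(id = c(1, 1, 1, 2, 2, 2),
                  val = c(10, 20, 30, 10, 20, 30))
dt1 <- data.table(id = c(1, 2),
                  V1 = c(2, 1))

# Join on id, then keep rows whose running position within id equals V1.
# rowid(id) numbers the rows of each id in order of appearance.
res <- dt[dt1, on = "id"][rowid(id) == V1, .(id, val)]
print(res)
```

This relies on the joined table keeping the rows of each id in their original order, which holds for this example.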