Complex filtering with data.table in R

I am trying to select information by group in a data.frame (or data.table), but haven't found the proper way of doing it. Consider the following example:

library(data.table)
DF <- data.table(value = c(seq(5, 1, -1), c(5, 5, 3, 2, 1)), group = rep(c("A", "B"), each = 5), status = rep(c("D", "A", "A", "A", "A"), 2))

   value group status
 1:     5     A      D
 2:     4     A      A
 3:     3     A      A
 4:     2     A      A
 5:     1     A      A
 6:     5     B      D
 7:     5     B      A
 8:     3     B      A
 9:     2     B      A
10:     1     B      A

I would now like to get the maximum value by group among the rows whose status is alive ("A"). I have tried this:

DF[,.I[value==max(value[status!="D"])],by=group]

   group V1
1:     A  2
2:     B  6
3:     B  7

But the 6th row has status "D" (dead), and I'd like to exclude that row. I can't subset the data beforehand like this:

DF[status!="D",.I[value==max(value[status!="D"])],by=group]

because I need to compute different statistics by group, such as (doesn't work):

DF[,list("max"=max(value[status!="D"],na.rm=T),"group"=group[.I[value==max(value[status=="D"],na.rm=T)]]),by=group]

Any hint would be greatly appreciated!

If we need the row index where 'status' is not 'D' and 'value' equals the maximum 'value' within each 'group', we can subset on 'status' in i and take .I in j:

i1 <- DF[status != "D", .I[value == max(value)], by = group]$V1
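As a quick check on the example data, the sketch below rebuilds DF and compares that index with a variant that leaves i empty and puts the status condition inside j. The name i1_alt is only an illustrative name, not part of the original answer. The original attempt returned row 6 because value == max(...) also matches the dead row that ties with the maximum, so the condition on status has to be applied to the rows being selected as well as inside max().

```r
library(data.table)

# Example data from the question
DF <- data.table(value  = c(seq(5, 1, -1), c(5, 5, 3, 2, 1)),
                 group  = rep(c("A", "B"), each = 5),
                 status = rep(c("D", "A", "A", "A", "A"), 2))

# Subset in i, then group: with by, .I still holds row numbers of DF itself
i1 <- DF[status != "D", .I[value == max(value)], by = group]$V1

# Variant without subsetting in i: filter on status inside j instead
# (i1_alt is a hypothetical name used only for this comparison)
i1_alt <- DF[, .I[status != "D" & value == max(value[status != "D"])],
             by = group]$V1

i1      # rows 2 and 7
i1_alt  # same rows
```

Both forms give the same index on this data; the second keeps every row available inside each group, which matters if other statistics need the "D" rows.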

Use that index for further summarizing:

DF[i1, .SD[value == max(value)], group]
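Since the question asks for several statistics per group without dropping the dead rows first, here is a minimal sketch of one way to do that in a single grouped call. The column names max_alive, n_dead and row_of_max are made up for illustration, and taking only the first matching row with [1L] is an assumption about how ties among alive rows should be handled.

```r
library(data.table)

# Example data from the question
DF <- data.table(value  = c(seq(5, 1, -1), c(5, 5, 3, 2, 1)),
                 group  = rep(c("A", "B"), each = 5),
                 status = rep(c("D", "A", "A", "A", "A"), 2))

# All rows stay available in each group, so statistics on alive and dead
# rows can be computed side by side (column names are illustrative only)
stats <- DF[, .(
  max_alive  = max(value[status != "D"]),   # maximum among alive rows
  n_dead     = sum(status == "D"),          # count of dead rows in the group
  # row number in DF of the first alive row holding that maximum
  row_of_max = .I[status != "D" & value == max(value[status != "D"])][1L]
), by = group]

stats
```

On the example data this should give a maximum of 4 at row 2 for group A and a maximum of 5 at row 7 for group B, with one dead row in each group.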
