Complex filtering with data.table in R

I am trying to select information by group in a data.frame (or data.table), but I haven't found the proper way to do it. Consider the following example:

library(data.table)
DF <- data.table(value=c(seq(5,1,-1),c(5,5,3,2,1)),group=rep(c("A","B"),each=5),status=rep(c("D","A","A","A","A"),2))

   value group status
 1:     5     A      D
 2:     4     A      A
 3:     3     A      A
 4:     2     A      A
 5:     1     A      A
 6:     5     B      D
 7:     5     B      A
 8:     3     B      A
 9:     2     B      A
10:     1     B      A

I'd now like to get the max value by group among the rows whose status is alive ("A"). I have tried this:

DF[,.I[value==max(value[status!="D"])],by=group]

   group V1
1:     A  2
2:     B  6
3:     B  7

But row 6 has status "D" (dead), and I'd like to exclude it. I can't simply subset the data like this:

DF[status!="D",.I[value==max(value[status!="D"])],by=group]

because I need to compute several statistics per group, for example (this doesn't work):

  DF[,list("max"=max(value[status!="D"],na.rm=T),"group"=group[.I[value==max(value[status=="D"],na.rm=T)]]),by=group]]

Any hint would be greatly appreciated!

If we need the row index of the rows where 'status' is not "D" and 'value' equals the maximum of 'value' within each 'group':

i1 <- DF[status != "D", .I[value == max(value)], by = group]$V1
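If subsetting in `i` is not an option, one possible variant keeps the status test inside `j`. This is only a sketch: the name `i2` is made up for illustration, and the example table is rebuilt so the block runs on its own. Note that a group with no live rows would make `max()` return `-Inf` with a warning.

```r
library(data.table)

# Same example data as in the question, rebuilt so the block is self-contained.
DF <- data.table(
  value  = c(5, 4, 3, 2, 1, 5, 5, 3, 2, 1),
  group  = rep(c("A", "B"), each = 5),
  status = rep(c("D", "A", "A", "A", "A"), 2)
)

# Row numbers (relative to the full table) of live rows whose value equals
# the per-group maximum taken over live rows only.
i2 <- DF[, .I[status != "D" & value == max(value[status != "D"])], by = group]$V1

# Rows selected by that index.
DF[i2]
```

Because the dead rows stay in the table, other per-group statistics that need them can still be computed in the same call.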

Then use that index for further summarizing:

DF[i1, .SD[value == max(value)], group]
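Since the question asks for several statistics per group in one pass, a braced `j` expression may be a convenient alternative. The following is a minimal sketch, not the answer's method: the column names `max_alive`, `n_alive` and `row_of_max` are hypothetical, and when several live rows tie for the maximum only the first row number is kept.

```r
library(data.table)

# Same example data as in the question, rebuilt so the block is self-contained.
DF <- data.table(
  value  = c(5, 4, 3, 2, 1, 5, 5, 3, 2, 1),
  group  = rep(c("A", "B"), each = 5),
  status = rep(c("D", "A", "A", "A", "A"), 2)
)

stats <- DF[, {
  alive <- status != "D"                  # logical mask of live rows in this group
  list(
    max_alive  = max(value[alive]),       # maximum over live rows only
    n_alive    = sum(alive),              # number of live rows
    # first row number (in the full table) of a live row holding that maximum
    row_of_max = .I[alive & value == max(value[alive])][1L]
  )
}, by = group]

stats
```

Computing the mask once per group keeps the filter in a single place, so further statistics on live or dead rows can be added to the list without repeating the condition.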
