Retain only one row per group if it meets a criterion in R

I have a dataset that looks like this:

ID  week  action
1   1     TRUE
1   1     FALSE
1   2     FALSE 
1   2     FALSE
1   3     FALSE
1   3     TRUE
2   1     FALSE
2   2     TRUE
2   2     FALSE
...

What I'd like to do is retain, for each ID and each week within that ID, a single value of action, preferring TRUE if the group contains one and FALSE otherwise.

So the result would look like this:

ID  week  action
1   1     TRUE
1   2     FALSE
1   3     TRUE
2   1     FALSE
2   2     TRUE
...

Try

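library(dplyr)
df %>% 
   group_by(ID, week) %>% 
   # sort within each ID/week group so that a TRUE, if present, comes first
   # (.by_group = TRUE is needed because arrange() otherwise ignores grouping)
   arrange(desc(action), .by_group = TRUE) %>%
   # keep the first row of each group
   slice(1)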
#   ID week action
#1  1    1   TRUE
#2  1    2  FALSE
#3  1    3   TRUE
#4  2    1  FALSE
#5  2    2   TRUE

Or using data.table

 library(data.table)
 setDT(df)[order(action,decreasing=TRUE),
           .SD[1] , by=list(ID, week)][order(ID,week)]
 #   ID week action
 #1:  1    1   TRUE
 #2:  1    2  FALSE
 #3:  1    3   TRUE
 #4:  2    1  FALSE
 #5:  2    2   TRUE

Or using base R similar to the approach used by @Sam Dickson

 aggregate(action~., df, FUN=function(x) sum(x)>0)
 # ID week action
 #1  1    1   TRUE
 #2  2    1  FALSE
 #3  1    2  FALSE
 #4  2    2   TRUE
 #5  1    3   TRUE

Or as inspired from @docendo discimus, a data.table option would be

  setDT(df)[, .SD[which.max(action)], by=list(ID, week)]

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), week = c(1L, 
1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L), action = c(TRUE, FALSE, FALSE, 
 FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)), .Names = c("ID", "week", 
 "action"), class = "data.frame", row.names = c(NA, -9L))

I used plyr:

library(plyr)
ddply(df,.(ID,week),summarize,action=sum(action)>0)

Two options which are similar to akrun's answer, but not the same, which is why I post them separately:

aggregate(action ~ ID + week, df, max)
#  ID week action
#1  1    1      1   # you can use 1/0s the same way as TRUE/FALSE
#2  2    1      0
#3  1    2      0
#4  2    2      1
#5  1    3      1
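If you would rather keep a logical column than 1/0, the result of max can be converted back afterwards. This is only a small sketch on the example data from this page; the name res is made up for illustration.

```r
# Example data, same values as the df used elsewhere on this page
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  week   = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L),
  action = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# max() on a logical vector returns an integer (1 or 0)
res <- aggregate(action ~ ID + week, data = df, FUN = max)

# convert the 1/0 column back to TRUE/FALSE
res$action <- as.logical(res$action)
res
```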

library(dplyr)
group_by(df, ID, week) %>% slice(which.max(action))
#Source: local data frame [5 x 3]
#Groups: ID, week
#
#  ID week action
#1  1    1   TRUE
#2  1    2  FALSE
#3  1    3   TRUE
#4  2    1  FALSE
#5  2    2   TRUE

The help page for which.max tells you that it returns the index of the first maximum of a numeric or logical vector. Since TRUE counts as 1 and FALSE as 0, a group with several TRUE entries simply yields the position of the first TRUE, and a group with no TRUE yields the position of the first FALSE. You can do the reverse (prefer FALSE) by using which.min.
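A quick illustration of that behaviour on plain logical vectors (the variable names are only for this example):

```r
# several TRUEs: the index of the first TRUE is returned
first_true <- which.max(c(FALSE, TRUE, TRUE))

# no TRUE at all: every element ties at 0, so the first element is returned
no_true <- which.max(c(FALSE, FALSE))

# which.min does the reverse and finds the first FALSE
first_false <- which.min(c(TRUE, FALSE, FALSE))

c(first_true, no_true, first_false)
```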

A base R solution with aggregate and any :

aggregate(action ~ week + ID, df, any)
#   week ID action
# 1    1  1   TRUE
# 2    2  1  FALSE
# 3    3  1   TRUE
# 4    1  2  FALSE
# 5    2  2   TRUE

Another base R solution:

subset(transform(df, action = ave(action, week, ID, FUN = any)), !duplicated(df[-3]))
#   ID week action
# 1  1    1   TRUE
# 3  1    2  FALSE
# 5  1    3   TRUE
# 7  2    1  FALSE
# 8  2    2   TRUE
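The one-liner above is fairly dense, so here is the same idea unpacked into two steps. It is meant as a sketch on the example data; flagged, first_rows and result are names chosen here for illustration.

```r
# Example data, same values as the df used elsewhere on this page
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  week   = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L),
  action = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Step 1: overwrite every action with "does this ID/week group contain a TRUE?"
flagged <- transform(df, action = ave(action, ID, week, FUN = any))

# Step 2: keep only the first row of each ID/week combination
first_rows <- !duplicated(df[c("ID", "week")])
result <- flagged[first_rows, ]
result
```

Because step 1 gives every row in a group the same value, it does not matter which row of the group step 2 keeps.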
