I have a dataset that looks like this:
ID week action
1 1 TRUE
1 1 FALSE
1 2 FALSE
1 2 FALSE
1 3 FALSE
1 3 TRUE
2 1 FALSE
2 2 TRUE
2 2 FALSE
...
What I'd like to do is keep, for each ID and each week within that ID, a single value of action, preferring a TRUE if there is one and a FALSE otherwise.
So the result would look like this:
ID week action
1 1 TRUE
1 2 FALSE
1 3 TRUE
2 1 FALSE
2 2 TRUE
...
Try dplyr:
library(dplyr)
df %>%
group_by(ID, week)%>%
arrange(desc(action)) %>%
slice(1)
# ID week action
#1 1 1 TRUE
#2 1 2 FALSE
#3 1 3 TRUE
#4 2 1 FALSE
#5 2 2 TRUE
Or using data.table
library(data.table)
setDT(df)[order(action,decreasing=TRUE),
.SD[1] , by=list(ID, week)][order(ID,week)]
# ID week action
#1: 1 1 TRUE
#2: 1 2 FALSE
#3: 1 3 TRUE
#4: 2 1 FALSE
#5: 2 2 TRUE
Or using base R, similar to the approach used by @Sam Dickson:
aggregate(action~., df, FUN=function(x) sum(x)>0)
# ID week action
#1 1 1 TRUE
#2 2 1 FALSE
#3 1 2 FALSE
#4 2 2 TRUE
#5 1 3 TRUE
Or as inspired from @docendo discimus, a data.table option would be
setDT(df)[, .SD[which.max(action)], by=list(ID, week)]
Data used:
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), week = c(1L,
1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L), action = c(TRUE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)), .Names = c("ID", "week",
"action"), class = "data.frame", row.names = c(NA, -9L))
I used plyr:
library(plyr)
ddply(df,.(ID,week),summarize,action=sum(action)>0)
Two options that are similar to akrun's answer, but not identical, which is why I am posting them separately:
aggregate(action ~ ID + week, df, max)
# ID week action
#1 1 1 1 # you can use 1/0s the same way as TRUE/FALSE
#2 2 1 0
#3 1 2 0
#4 2 2 1
#5 1 3 1
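If a logical column is preferred over the 1/0 values that max returns here, the result can be converted back afterwards. The following is only a minimal sketch: it rebuilds the example data with data.frame() so that it runs on its own, and the name res and the final sorting step are illustrative choices, not part of the original answer.

```r
# Example data matching the question (rebuilt here so the sketch is self-contained)
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  week   = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 2L),
  action = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# max() on a logical vector returns an integer (1 or 0) per group
res <- aggregate(action ~ ID + week, df, max)

# Convert the 1/0 values back to TRUE/FALSE
res$action <- as.logical(res$action)

# Optional: sort by ID and week to match the layout shown in the question
res <- res[order(res$ID, res$week), ]
print(res)
```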
library(dplyr)
group_by(df, ID, week) %>% slice(which.max(action))
#Source: local data frame [5 x 3]
#Groups: ID, week
#
# ID week action
#1 1 1 TRUE
#2 1 2 FALSE
#3 1 3 TRUE
#4 2 1 FALSE
#5 2 2 TRUE
The help page for which.max tells you that it finds the first maximum of a numeric or logical vector. Because TRUE is treated as 1 and FALSE as 0, even if a group has several TRUE entries, you will simply select the first occurrence and return that. You can do the reverse (prefer a FALSE) by using which.min.
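To make the behaviour of which.max and which.min on logical vectors concrete, here is a small sketch. The variable names are made up for illustration; only the two base R functions come from the answer above.

```r
# Several TRUE values: which.max() gives the position of the first one
first_true <- which.max(c(FALSE, TRUE, TRUE))

# No TRUE at all: every element ties at FALSE (0), so the first position is returned
all_false <- which.max(c(FALSE, FALSE))

# which.min() does the reverse and gives the position of the first FALSE
first_false <- which.min(c(TRUE, FALSE, FALSE))

print(c(first_true = first_true, all_false = all_false, first_false = first_false))
```

This is why slice(which.max(action)) keeps a TRUE row when a group has one and otherwise falls back to the group's first FALSE row.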
A base R solution with aggregate and any:
aggregate(action ~ week + ID, df, any)
# week ID action
# 1 1 1 TRUE
# 2 2 1 FALSE
# 3 3 1 TRUE
# 4 1 2 FALSE
# 5 2 2 TRUE
Another base R solution:
subset(transform(df, action = ave(action, week, ID, FUN = any)), !duplicated(df[-3]))
# ID week action
# 1 1 1 TRUE
# 3 1 2 FALSE
# 5 1 3 TRUE
# 7 2 1 FALSE
# 8 2 2 TRUE