dat <- data.frame(ID = c(1, 2, 2, 2), Gender = c("Both", "Both", "Male", "Female"))
> dat
ID Gender
1 1 Both
2 2 Both
3 2 Male
4 2 Female
For each ID, if Gender takes all three values Both, Male, and Female, I want to remove the row with Both. That is, my desired data is this:
ID Gender
1 1 Both
2 2 Male
3 2 Female
I've tried to do this by using the code below:
library(dplyr)
> dat %>%
group_by(ID) %>%
mutate(A = ifelse(length(unique(Gender)) >= 3 & Gender == 'Both', F, T)) %>%
filter(A) %>%
select(-A)
# A tibble: 2 x 2
# Groups: ID [1]
ID Gender
<dbl> <fctr>
1 2 Male
2 2 Female
I'm declaring a dummy variable called A, where A = F if, for a given ID, all 3 elements of Gender are present ("Both", "Male", and "Female"; these are the different values that Gender can take, no other value is possible) and the corresponding row has Gender == Both. Then I will remove that row.
However, it seems like I'm assigning A = F to the first row, even though its Gender is only "Both", but not "Both", "Male", and "Female"?
After grouping by 'ID', filter on a logical condition: keep a row if 'Gender' is not 'Both' and the group has 3 distinct 'Gender' values, i.e. 'Male', 'Female', and 'Both' (the OP mentioned there are no other values), or (|) if the group contains only one row.
dat %>%
group_by(ID) %>%
filter((Gender != "Both" & n_distinct(Gender)==3)| n() ==1 )
# A tibble: 3 x 2
# Groups: ID [2]
# ID Gender
# <dbl> <fct>
#1 1 Both
#2 2 Male
#3 2 Female
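To see why this condition keeps the lone "Both" row for ID 1 but drops the "Both" row for ID 2, it can help to compute the pieces of the condition as columns first. The following is only a sketch: the column names n_values, n_rows and keep are hypothetical helpers, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
library(dplyr)

# Same example data as in the question, with Gender kept as character.
dat <- data.frame(ID = c(1, 2, 2, 2),
                  Gender = c("Both", "Both", "Male", "Female"),
                  stringsAsFactors = FALSE)

# Hypothetical helper columns exposing what the filter sees in each group.
steps <- dat %>%
  group_by(ID) %>%
  mutate(n_values = n_distinct(Gender),  # distinct Gender values in the group
         n_rows   = n(),                 # number of rows in the group
         keep     = (Gender != "Both" & n_values == 3) | n_rows == 1) %>%
  ungroup()

# Filtering on the helper column gives the same rows as the one-line filter.
result <- steps %>%
  filter(keep) %>%
  select(ID, Gender)
```

Printing steps shows keep as TRUE, FALSE, TRUE, TRUE, so only the "Both" row of ID 2 is removed.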
Or another option is
dat %>%
group_by(ID) %>%
filter(Gender %in% c("Male", "Female")| n() == 1)
# A tibble: 3 x 2
# Groups: ID [2]
# ID Gender
# <dbl> <fct>
#1 1 Both
#2 2 Male
#3 2 Female
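The two filters agree on the example data, but they can differ on groups the question does not cover. As a sketch, assume a hypothetical extra group (ID 3) that contains only "Both" and "Male". The question does not say what should happen there, so opt3 below is just one possible literal reading of the rule ("drop Both only when all three values occur"), not the OP's stated requirement.

```r
library(dplyr)

# Example data plus a hypothetical group (ID 3) with only "Both" and "Male".
dat2 <- data.frame(ID = c(1, 2, 2, 2, 3, 3),
                   Gender = c("Both", "Both", "Male", "Female", "Both", "Male"),
                   stringsAsFactors = FALSE)

# First filter from the answer: ID 3 has 2 distinct values and 2 rows,
# so neither side of the condition is TRUE and the whole group is dropped.
opt1 <- dat2 %>%
  group_by(ID) %>%
  filter((Gender != "Both" & n_distinct(Gender) == 3) | n() == 1) %>%
  ungroup()

# Second filter from the answer: "Male" in ID 3 is kept, "Both" is dropped.
opt2 <- dat2 %>%
  group_by(ID) %>%
  filter(Gender %in% c("Male", "Female") | n() == 1) %>%
  ungroup()

# Literal reading of the rule (an assumption): remove "Both" only when the
# group has all three values, so ID 3 is left untouched.
opt3 <- dat2 %>%
  group_by(ID) %>%
  filter(!(Gender == "Both" & n_distinct(Gender) == 3)) %>%
  ungroup()
```

Which of the three behaviours is wanted depends on how incomplete groups should be treated in the real data.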
From base R, using ave:
dat[!(ave(dat$Gender, dat$ID, FUN = function(x) length(unique(x))) != '1' & (dat$Gender == 'Both')), ]
ID Gender
1 1 Both
3 2 Male
4 2 Female
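The ave call above compares the per-group count to the string '1', because ave returns a vector of the same type as its first argument. A possible variant, sketched here under the assumption that the same rule is wanted, counts over row indices instead so that the count comes back as an integer; n_values and res are hypothetical names.

```r
# Same example data as in the question, with Gender kept as character.
dat <- data.frame(ID = c(1, 2, 2, 2),
                  Gender = c("Both", "Both", "Male", "Female"),
                  stringsAsFactors = FALSE)

# Number of distinct Gender values per ID, one entry per row.
# ave() is applied to the row indices, so the result is an integer vector.
n_values <- ave(seq_along(dat$Gender), dat$ID,
                FUN = function(i) length(unique(dat$Gender[i])))

# Drop "Both" rows in groups with more than one distinct value,
# mirroring the != '1' test in the one-liner above.
res <- dat[!(n_values != 1 & dat$Gender == "Both"), ]
```

Keeping the count in its own variable also makes it easy to inspect before subsetting.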