I have a dataframe where I would like to identify cases (rows) where a given condition is met at least a certain number of times in a set of columns. In the toy example below, I would like to identify cases where "A" is the choice for two of three columns (Choice_1 to Choice_3). I do not care in which two of the three columns "A" is found. In my example, ID = 1 and ID = 4 would be identified.
This should work with any number of "A"s desired in any number of columns (e.g., if I wanted to identify cases where "A" is the choice in at least three of the four Choice columns, only ID = 1 would be identified).
ID <- 1:4
Choice_1 <- c("A", "B", "C", "D")
Choice_2 <- c("A", "D", "C", "A")
Choice_3 <- c("A", "C", "A", "A")
Choice_4 <- c("B", "B", "A", "B")
df <- data.frame(ID, Choice_1, Choice_2, Choice_3, Choice_4)
> df
  ID Choice_1 Choice_2 Choice_3 Choice_4
1  1        A        A        A        B
2  2        B        D        C        B
3  3        C        C        A        A
4  4        D        A        A        B
One somewhat roundabout way to do this would be to convert "A"s to 1 and everything else to 0, sum the Choice columns I am interested in, and check that the sum is equal to or higher than my threshold, but I feel like there must be a better way.
The way I imagine it, it would be some form of if_else statement included in a mutate so rows that match the condition would be identified with 1 and those that don't with 0:
df %>% mutate(cond_matched = if_else( two of (Choice_1, Choice_2, Choice_3) == "A", 1, 0))
  ID Choice_1 Choice_2 Choice_3 Choice_4 cond_matched
1  1        A        A        A        B            1
2  2        B        D        C        B            0
3  3        C        C        A        A            0
4  4        D        A        A        B            1
I'm hoping I've just been searching with the wrong keywords. Thank you for any help!
A base R option would be to create a logical matrix from the selected columns (df[2:4] == "A"), take the row-wise sum of the TRUE elements, check whether it is greater than or equal to 2, and coerce the resulting logical vector to binary with as.integer or with a unary + (hacky):
df$cond_matched <- +(rowSums(df[2:4] == "A") >= 2)
df$cond_matched
#[1] 1 0 0 1
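To cover the "any number of "A"s in any number of columns" part of the question, the same rowSums idea can be wrapped in a small function. This is only a sketch: the name matched_at_least and its arguments are made up for illustration, and it assumes that NA values should count as non-matches.

```r
# Rebuild the toy data from the question
df <- data.frame(
  ID = 1:4,
  Choice_1 = c("A", "B", "C", "D"),
  Choice_2 = c("A", "D", "C", "A"),
  Choice_3 = c("A", "C", "A", "A"),
  Choice_4 = c("B", "B", "A", "B")
)

# Hypothetical helper: returns 1 where `value` occurs at least `n` times
# across the columns named in `cols`, and 0 otherwise
matched_at_least <- function(data, cols, value, n) {
  hits <- data[cols] == value                    # logical matrix, one column per selected column
  as.integer(rowSums(hits, na.rm = TRUE) >= n)   # NA comparisons are treated as non-matches
}

# "A" in at least two of Choice_1 to Choice_3
two_of_three <- matched_at_least(df, c("Choice_1", "Choice_2", "Choice_3"), "A", 2)

# "A" in at least three of Choice_1 to Choice_4
three_of_four <- matched_at_least(df, paste0("Choice_", 1:4), "A", 3)
```

With the toy data, two_of_three flags ID 1 and ID 4, and three_of_four flags only ID 1, which matches what the question expects.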
Or with tidyverse (similar logic to the base R solution, but not exactly the same syntax):
library(tidyverse)
df %>%
  mutate(cond_matched = select(., 2:4) %>%
           map(~ .x == 'A') %>%
           reduce(`+`) %>%
           `>=`(2) %>%
           as.integer)
#  ID Choice_1 Choice_2 Choice_3 Choice_4 cond_matched
#1  1        A        A        A        B            1
#2  2        B        D        C        B            0
#3  3        C        C        A        A            0
#4  4        D        A        A        B            1
One dplyr and tidyr possibility could be:
df %>%
  gather(var, val, -c(ID, Choice_4)) %>%
  group_by(ID) %>%
  summarise(cond_matched = as.integer(sum(val == "A") >= 2)) %>%
  ungroup() %>%
  left_join(df, by = c("ID" = "ID"))
     ID cond_matched Choice_1 Choice_2 Choice_3 Choice_4
  <int>        <int> <chr>    <chr>    <chr>    <chr>
1     1            1 A        A        A        B
2     2            0 B        D        C        B
3     3            0 C        C        A        A
4     4            1 D        A        A        B
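gather() still works, but tidyr's documentation marks it as superseded by pivot_longer(). A sketch of the same reshape-and-count approach with pivot_longer(), assuming tidyr 1.0.0 or later and the toy data from the question, might look like this:

```r
library(dplyr)
library(tidyr)

# Rebuild the toy data from the question
df <- data.frame(
  ID = 1:4,
  Choice_1 = c("A", "B", "C", "D"),
  Choice_2 = c("A", "D", "C", "A"),
  Choice_3 = c("A", "C", "A", "A"),
  Choice_4 = c("B", "B", "A", "B")
)

result <- df %>%
  # Stack only the three columns of interest into long format
  pivot_longer(Choice_1:Choice_3, names_to = "var", values_to = "val") %>%
  group_by(ID) %>%
  # Count the "A"s per ID and compare against the threshold
  summarise(cond_matched = as.integer(sum(val == "A") >= 2)) %>%
  # Bring the original wide columns back
  left_join(df, by = "ID")
```

Selecting the columns by name rather than by exclusion means the code does not silently pick up extra columns if the data frame later gains some.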
Or with just dplyr (using basically the same logic as @akrun):
df %>%
  mutate(cond_matched = as.integer(rowSums(.[-ncol(.)] == "A") >= 2))
To name the columns explicitly:
df %>%
  mutate(cond_matched = as.integer(
    rowSums(.[grepl("Choice_1|Choice_2|Choice_3", colnames(.))] == "A") >= 2
  ))
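Note that the unanchored pattern above would also match a column such as Choice_10 if one existed. As an alternative sketch, assuming dplyr 1.0.0 or later, across() lets you name the columns with tidyselect syntax instead of a regular expression. The variable n_required is a hypothetical name for the threshold.

```r
library(dplyr)

# Rebuild the toy data from the question
df <- data.frame(
  ID = 1:4,
  Choice_1 = c("A", "B", "C", "D"),
  Choice_2 = c("A", "D", "C", "A"),
  Choice_3 = c("A", "C", "A", "A"),
  Choice_4 = c("B", "B", "A", "B")
)

n_required <- 2  # hypothetical name for the minimum number of "A"s

result <- df %>%
  mutate(cond_matched = as.integer(
    # across() returns one logical column per selected column;
    # rowSums() then counts the TRUE values in each row
    rowSums(across(Choice_1:Choice_3, ~ .x == "A")) >= n_required
  ))
```

Changing the column selection to Choice_1:Choice_4 and n_required to 3 gives the "three of four" case from the question.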