
How to filter in R using multiple OR statements? Dplyr

I tried searching for this but couldn't find what I needed.

This is what my data looks like:

mydata <- data.frame(Chronic = c("Yes", "No", "Yes"),
                      Mental = c("No", "No", "No"),
                      SA = c("No", "No", "Yes"))

> mydata
  Chronic Mental  SA
1     Yes     No  No
2      No     No  No
3     Yes     No Yes

My goal is to get the count of rows where any of the columns equals Yes. In this case, rows 1 and 3 have at least one Yes, whereas row 2 only has No.

Is there an easy way to do this?

We can apply rowSums to the logical matrix mydata == 'Yes', and then sum the resulting logical vector to get the count of rows having at least one 'Yes':

sum(rowSums(mydata == 'Yes') > 0)
#[1] 2
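To make it easier to see what the one-liner does, here is the same calculation split into steps. This is only an illustrative sketch; the intermediate names is_yes, yes_per_row and n_rows_with_yes are made up for the example and are not part of the original answer.

```r
mydata <- data.frame(Chronic = c("Yes", "No", "Yes"),
                     Mental = c("No", "No", "No"),
                     SA = c("No", "No", "Yes"))

# Step 1: compare every cell to 'Yes'; the result is a logical matrix
is_yes <- mydata == "Yes"

# Step 2: count the TRUE values in each row (TRUE counts as 1)
yes_per_row <- rowSums(is_yes)

# Step 3: flag rows with at least one 'Yes', then count the flagged rows
n_rows_with_yes <- sum(yes_per_row > 0)
n_rows_with_yes
#[1] 2
```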

Or with tidyverse

library(dplyr)
mydata %>% 
   rowwise() %>%
   # unary + converts the TRUE/FALSE from any() into 1/0
   mutate(Count = +any(c_across(everything()) == 'Yes')) %>%
   ungroup() %>% 
   pull(Count) %>%
   sum()
#[1] 2
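Since the question title asks about filtering, another option that may be worth trying is filter() combined with if_any(), which keeps a row when the condition holds in at least one of the selected columns. This is a sketch that assumes dplyr 1.0.4 or later (where if_any is available); the names rows_with_yes and n_yes are only for illustration.

```r
library(dplyr)

mydata <- data.frame(Chronic = c("Yes", "No", "Yes"),
                     Mental = c("No", "No", "No"),
                     SA = c("No", "No", "Yes"))

# Keep rows where ANY column equals 'Yes' (an OR across all columns)
rows_with_yes <- mydata %>%
  filter(if_any(everything(), ~ .x == "Yes"))

# Count the rows that survived the filter
n_yes <- nrow(rows_with_yes)
n_yes
#[1] 2
```

Swapping if_any for if_all would give the AND version, i.e. rows where every column is 'Yes'.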

If you would rather spell out the conditions explicitly (as opposed to using c_across), you can use case_when:

library(dplyr)
mydata %>% 
  # flag = 1 if any of the three columns is 'Yes', otherwise 0
  mutate(yes_column = case_when(Chronic == 'Yes' | Mental == 'Yes' | SA == 'Yes' ~ 1,
                                TRUE ~ 0)) %>% 
  summarise(total = sum(yes_column))

This creates a binary flag that is 1 when Yes appears in any of the columns and 0 otherwise; summing the flag gives the row count. Writing the conditions out column by column makes it easy to check that each one works, and in particular to spot data quality problems such as 'Yes' versus 'yes' or even 'Y'. The | operator denotes OR; use & for AND.
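If the columns do contain inconsistent spellings like the ones mentioned above, one possible approach is to normalise the values before comparing. The sketch below uses a made-up data frame (messy) and an assumed list of accepted spellings (yes_values); both are hypothetical and would need adjusting to the real data.

```r
library(dplyr)

# Hypothetical messy data with mixed case, abbreviations and stray spaces
messy <- data.frame(Chronic = c("Yes", "no", "Y"),
                    Mental = c("No", "No", "no"),
                    SA = c("no", "N", " yes "))

# Assumed set of spellings that should count as 'Yes'
yes_values <- c("yes", "y")

# Trim whitespace and lower-case each value before testing membership
n_yes_messy <- messy %>%
  filter(if_any(everything(), ~ tolower(trimws(.x)) %in% yes_values)) %>%
  nrow()
n_yes_messy
#[1] 2
```

Running unique(unlist(messy)) first is a quick way to see which spellings actually occur before deciding what belongs in yes_values.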

