
Create a column to indicate the presence of a value in other columns

I have a dataset with 20+ symptom columns, each coded as Yes/No/Unknown. I would like to create a new column that indicates whether the subject (ID) has any symptoms, where "no symptoms" means that none of the symptom columns is 'Yes'.

I've got a sample dataset below and I can create the column as desired, but it feels like there must be a cleaner way that uses only dplyr::mutate() instead of the filtering and joining that I'm doing.

library(dplyr)

test <- tibble(
  ID = c(1:10),
  col1 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE),
  col2 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE),
  col3 = sample(c("Yes", "No", "Unknown"), 10, replace = TRUE)
)

left_join(test, test %>%
  filter_at(vars(col1:col3), any_vars(. == "Yes")) %>%
  mutate(any_symptoms = "Yes") %>%
  select(ID, any_symptoms),
by = "ID"
) %>%
  mutate(any_symptoms = recode(any_symptoms, .missing = "No"))
#> # A tibble: 10 x 5
#>       ID col1    col2    col3    any_symptoms
#>    <int> <chr>   <chr>   <chr>   <chr>       
#>  1     1 Unknown Unknown Unknown No          
#>  2     2 Unknown No      No      No          
#>  3     3 Yes     Yes     Unknown Yes         
#>  4     4 No      Unknown Unknown No          
#>  5     5 No      No      Unknown No          
#>  6     6 Unknown Yes     Unknown Yes         
#>  7     7 Yes     Unknown Unknown Yes         
#>  8     8 No      No      No      No          
#>  9     9 No      Unknown Unknown No          
#> 10    10 No      No      No      No

Created on 2020-05-29 by the reprex package (v0.3.0)

You can use rowSums() to check whether a row contains at least one "Yes" (test[-1] drops the ID column before the comparison):

test$any_symptoms <- c('No', 'Yes')[(rowSums(test[-1] == 'Yes') > 0) + 1]

You can also use this in dplyr pipes:

library(dplyr)
test %>% mutate(any_symptoms = c('No', 'Yes')[(rowSums(.[-1] == 'Yes') > 0) + 1])
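
To see why the indexing trick works, here is a minimal sketch that breaks it into steps. The `demo` data frame is made up for illustration and is not the asker's data. The comparison yields a logical matrix, `rowSums()` counts the `TRUE` values per row, and adding 1 to the logical result turns `FALSE`/`TRUE` into positions 1/2 of `c("No", "Yes")`.

```r
# Hypothetical fixed data so the result is reproducible
demo <- data.frame(
  ID   = 1:4,
  col1 = c("Yes", "No", "Unknown", "No"),
  col2 = c("No", "No", "Yes", "Unknown"),
  stringsAsFactors = FALSE
)

is_yes  <- demo[-1] == "Yes"   # logical matrix, ID column dropped
n_yes   <- rowSums(is_yes)     # number of "Yes" values per row
has_yes <- n_yes > 0           # TRUE if the row has at least one "Yes"

# FALSE + 1 = 1 picks "No", TRUE + 1 = 2 picks "Yes"
demo$any_symptoms <- c("No", "Yes")[has_yes + 1]
demo$any_symptoms
```

If the symptom columns can contain `NA`, `rowSums(is_yes, na.rm = TRUE)` keeps a missing value from turning the whole row into `NA`.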

Or use pmap_lgl() from purrr:

library(purrr)
test %>%
    mutate(any_symptoms = c('No', 'Yes')[pmap_lgl(select(., starts_with('col')), 
                                        ~any(c(...) == 'Yes')) + 1])
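
The `~any(c(...) == 'Yes')` part can look cryptic, so here is a small sketch of what `pmap_lgl()` does with it. The list `cols` is a made-up stand-in for the selected symptom columns. `pmap_lgl()` walks over the list elements in parallel, so on each step `...` holds the values of one row, and `c(...)` collects them into a single vector.

```r
library(purrr)

# Hypothetical stand-in for select(test, starts_with("col"))
cols <- list(
  col1 = c("Yes", "No"),
  col2 = c("No", "Unknown")
)

# Step 1 sees ("Yes", "No"), step 2 sees ("No", "Unknown")
row_has_yes <- pmap_lgl(cols, ~ any(c(...) == "Yes"))
row_has_yes
```

The result is one logical value per row, which is then mapped to "No"/"Yes" with the same `+ 1` indexing as in the rowSums version.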

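This should work (pivot_longer() comes from tidyr, so that package has to be loaded as well):

library(dplyr)
library(tidyr)

test %>% 
  left_join(
    test %>% 
      pivot_longer(-ID) %>% 
      group_by(ID) %>% 
      # one row per ID: "Yes" if any symptom value is "Yes"
      summarise(any_symptoms = ifelse(any(value == "Yes"), "Yes", "No")),
    by = "ID"
  )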

This works, but it gets tedious with 20 columns because every column has to be listed by name:

test %>%
  mutate(any_symptoms = case_when(
    grepl("Yes", paste(col1, col2, col3), fixed = TRUE) ~ "Yes",
    TRUE ~ "No"
  ))
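
For 20 or more columns, one option that none of the answers above uses is `if_any()`, which was added in dplyr 1.0.4, so this sketch assumes that version or later. The `demo` tibble is made up for illustration, and the sketch assumes the symptom columns share the `col` prefix. With different names, any other tidyselect helper or a range such as `col1:col3` should work in its place.

```r
library(dplyr)

# Hypothetical fixed data; symptom columns are assumed to start with "col"
demo <- tibble(
  ID   = 1:4,
  col1 = c("Yes", "No", "Unknown", "No"),
  col2 = c("No", "No", "Yes", "Unknown"),
  col3 = c("Unknown", "No", "No", "No")
)

result <- demo %>%
  mutate(any_symptoms = if_else(
    if_any(starts_with("col"), ~ .x == "Yes"),  # TRUE if any selected column is "Yes"
    "Yes", "No"
  ))
result
```

This stays inside a single `mutate()` call, which is what the question asked for, and the column selection does not grow with the number of symptoms.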
