
Function to create new variable by multiple conditions using mutate and case_when (R)

I'm trying to create a function that will compare variables 1 and 2 and create a third variable based on whether they match. I need to do this >25 times (for different combinations of variables), which is why I want to create a function instead of just using mutate and case_when.

I'm pretty new to R, so this is mostly cobbled together from other helpful Stack Overflow posts and miscellaneous tutorials.

Here's what I tried:

determine_match <- function(df, col_a, col_b) {
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
  df <- df %>% mutate(!!newvar := case_when(
    !!col_a == '1' & !!col_b == 'Yes' ~ 'Match',
    !!col_a == '0' & !!col_b == 'No' ~ 'Match',
    !!col_a == '1' & !!col_b == 'No' ~ 'No Match',
    !!col_a == '0' & !!col_b == 'Yes' ~ 'No Match',
    is.na(!!col_a) | is.na(!!col_b) ~ NA_character_,
    TRUE ~ 'Error'
  ))
}

And I tested it on this data set:

test1 <- c('1', '0', '1', '1', '0', NA)
test2 <- c('Yes', 'No', 'No,', NA, 'Yes', NA)
id <- c(1,2,3,4,5,6)
testing.df <- data.frame(id, test1, test2)

I'm not getting errors, but when I run the function with a print statement, it only returns the string name for newvar and doesn't change the actual data frame.

I also tried testing.df %>% mutate(testing3 = funs(determine_match(testing.df, testing1, testing2))), and testing3 gives me ~determine_match(testing.df, testing1, testing2).

Not sure if the problem is the function, the attempt to apply, or both.

Hope some kind soul can help, thank you!!

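You need to return the result: add return(df) (or even just df) as the last line of your function. As written, the last expression in the function is an assignment, so its value is returned invisibly and nothing is printed when you call the function on its own.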
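As a minimal sketch of that fix, here is the question's function with df as its last line, applied to the question's test data. It assumes dplyr is installed and attached (enquo, quo_name, and case_when are all available through it); everything else comes from the question.

```r
library(dplyr)

determine_match <- function(df, col_a, col_b) {
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  # Name of the new column, e.g. "test1test2"
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
  df <- df %>% mutate(!!newvar := case_when(
    !!col_a == '1' & !!col_b == 'Yes' ~ 'Match',
    !!col_a == '0' & !!col_b == 'No' ~ 'Match',
    !!col_a == '1' & !!col_b == 'No' ~ 'No Match',
    !!col_a == '0' & !!col_b == 'Yes' ~ 'No Match',
    is.na(!!col_a) | is.na(!!col_b) ~ NA_character_,
    TRUE ~ 'Error'
  ))
  df  # return the modified data frame visibly
}

# Test data from the question (note the 'No,' value in test2)
test1 <- c('1', '0', '1', '1', '0', NA)
test2 <- c('Yes', 'No', 'No,', NA, 'Yes', NA)
id <- c(1, 2, 3, 4, 5, 6)
testing.df <- data.frame(id, test1, test2, stringsAsFactors = FALSE)

# The function does not modify its input in place, so assign the result back
testing.df <- determine_match(testing.df, test1, test2)
print(testing.df)
```

Because the function takes a whole data frame and returns a whole data frame, it is probably simplest to call it directly like this rather than inside mutate(); the new column here is named test1test2.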

If you're not worried about input values other than the ones you explicitly mention ("0", "1", NA for col_a, and "Yes", "No", NA for col_b), you could simplify the condition to this (for some definitions of "simplify"; it's definitely shorter).

determine_match <- function(df, col_a, col_b) {
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
  df <- df %>% mutate(
    !!newvar := 
      c("No Match", "Match")[((!!col_a == '1') == (!!col_b == 'Yes')) + 1]
    )
  return(df)
}
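To see why the indexing trick works, here is a small base R sketch that runs the same expression on plain vectors holding the question's test values. The intermediate names agree and idx are introduced here purely for illustration.

```r
# Plain vectors with the same values as the question's test1 and test2
col_a <- c('1', '0', '1', '1', '0', NA)
col_b <- c('Yes', 'No', 'No,', NA, 'Yes', NA)

# TRUE when both sides agree ('1' with 'Yes', or neither), NA if either input is NA
agree <- (col_a == '1') == (col_b == 'Yes')

# FALSE + 1 is 1 and TRUE + 1 is 2, so idx selects "No Match" or "Match"
idx <- agree + 1

# Indexing with NA yields NA, so missing inputs stay missing
result <- c("No Match", "Match")[idx]
print(result)
```

Note the trade-off: any value outside the expected set, such as the 'No,' in the test data, is treated as "not Yes" and ends up as "No Match" instead of being flagged as 'Error' the way the case_when version does.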
