Dplyr - Using case_when (multiple conditions) with across

I'm trying to recode a large number of variables with 5 levels ("1_Disagree", "2_SomeD", "3_Neither", "4_SomeA", "5_Agree") into variables with 3 levels ("1_Disagree", "2_Neither", "3_Agree"). All these variables have similar names, so I'm using the across function from dplyr. Here's an example:


> df <- tibble(Q1_cat5 = as.factor(c("1_Disagree","2_SomeD","2_SomeD","4_SomeA","5_Agree")),
                  Q2_cat5 = as.factor(c("5_Agree","5_Agree","3_Neither","4_SomeA","5_Agree")),
                  Q3_cat5 = as.factor(c("3_Neither","2_SomeD","2_SomeD","1_Disagree","5_Agree")))

> df
# A tibble: 5 × 3
  Q1_cat5    Q2_cat5   Q3_cat5   
  <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither 
2 2_SomeD    5_Agree   2_SomeD   
3 2_SomeD    3_Neither 2_SomeD   
4 4_SomeA    4_SomeA   1_Disagree
5 5_Agree    5_Agree   5_Agree  

What I'm trying to obtain:

> df2
# A tibble: 5 × 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 3_Agree   2_Neither 
2 2_SomeD    5_Agree   2_SomeD    1_Disagree 3_Agree   1_Disagree
3 2_SomeD    3_Neither 2_SomeD    1_Disagree 2_Neither 1_Disagree
4 4_SomeA    4_SomeA   1_Disagree 3_Agree    3_Agree   1_Disagree
5 5_Agree    5_Agree   5_Agree    3_Agree    3_Agree   3_Agree  

As you can see, the new variables work as follows:

  • If Q1_cat5 = "1_Disagree" or "2_SomeD" then Q1_cat3 = "1_Disagree"
  • If Q1_cat5 = "3_Neither" then Q1_cat3 = "2_Neither"
  • If Q1_cat5 = "4_SomeA" or "5_Agree" then Q1_cat3 = "3_Agree"

I've tried the following code:

df2 <- df %>% mutate(across(.cols = starts_with('Q') & ends_with('cat5'),
                                 .funs = case_when(                                
                                    (. == "1_Disagree" | . == "2_SomeD") ~ '1_Disagree',
                                    . == "3_Neither" ~ '2_Neither',
                                    (. == "4_SomeA" |. == "5_Agree") ~ '3_Agree',
                                    is.na(.) ~ NA,
                                    ),
                                 .names = '{str_sub(.col,1,-5)}cat3'
                                 )
                        )

This does create the new variables Q1_cat3, Q2_cat3, etc., but they keep the old values of Q1_cat5, Q2_cat5, etc. So instead of what I want, it duplicates the old variables and just renames them:

> df2
# A tibble: 5 × 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 5_Agree   3_Neither 
2 2_SomeD    5_Agree   2_SomeD    2_SomeD    5_Agree   2_SomeD
3 2_SomeD    3_Neither 2_SomeD    2_SomeD    3_Neither 2_SomeD
4 4_SomeA    4_SomeA   1_Disagree 4_SomeA    4_SomeA   1_Disagree
5 5_Agree    5_Agree   5_Agree    5_Agree    5_Agree   5_Agree  

Even after doing a lot of research and trying several other solutions, I can't figure out why this isn't working, nor can I find another way to do what I want. I've read other posts about "case_when" with "across", but none of the solutions work for me. Could you help me?

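First, the argument of across is .fns, not .funs. Since .funs is not a recognised argument, no function is actually applied, which would explain why the columns are merely copied under new names. The main issue, however, is that you're passing a call to case_when directly instead of a function: a tidyverse-style lambda needs the tilde ( ~ ) operator in front of it. Try: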

df2 <- df %>% 
  mutate(
    across(.cols = starts_with('Q') & ends_with('cat5'),
           ~ case_when(
             (. == "1_Disagree" | . == "2_SomeD") ~ '1_Disagree',
             . == "3_Neither" ~ '2_Neither',
             (. == "4_SomeA" |. == "5_Agree") ~ '3_Agree',
             is.na(.) ~ NA_character_ # You can skip this part though
             ),
           .names = '{str_sub(.col,1,-5)}cat3')
    )

Output:

df2

# A tibble: 5 x 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <chr>      <chr>     <chr>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 3_Agree   2_Neither 
2 2_SomeD    5_Agree   2_SomeD    1_Disagree 3_Agree   1_Disagree
3 2_SomeD    3_Neither 2_SomeD    1_Disagree 2_Neither 1_Disagree
4 4_SomeA    4_SomeA   1_Disagree 3_Agree    3_Agree   1_Disagree
5 5_Agree    5_Agree   5_Agree    3_Agree    3_Agree   3_Agree   
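To illustrate the point about passing a function to across, here is a minimal sketch on a toy tibble. The data and the names toy, r1, r2 and r3 are hypothetical and not part of the original question. It shows three ways of handing across a function rather than an already-evaluated expression; inside the formula form, `.x` is used here, and `.` (as in the code above) works too.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- tibble(a = c(1, 2), b = c(3, 4))

# Formula (tilde) lambda: .x stands for the current column
r1 <- toy %>% mutate(across(c(a, b), ~ .x * 10))

# Ordinary anonymous function
r2 <- toy %>% mutate(across(c(a, b), function(x) x * 10))

# Shorthand anonymous function (requires R >= 4.1)
r3 <- toy %>% mutate(across(c(a, b), \(x) x * 10))
```

All three calls should give the same result; which form to use is mostly a matter of taste.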

Note that NA_character_ is used instead of a bare NA, because all values returned by case_when need to be of the same type, including NA (dplyr 1.1.0 and later are more lenient about this). I'm not sure about your use case, but normally you can skip this last step, since anything that doesn't match one of the preceding conditions will be NA anyway.
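The new columns above come out as `<chr>`, whereas the desired output in the question shows `<fct>`. If factors are needed, one possible variation is sketched below. It assumes a helper called recode_cat3, which is a hypothetical name introduced only for this example; the helper drops the explicit NA branch and wraps the result in factor().

```r
library(dplyr)
library(stringr)

df <- tibble(
  Q1_cat5 = as.factor(c("1_Disagree", "2_SomeD", "2_SomeD", "4_SomeA", "5_Agree")),
  Q2_cat5 = as.factor(c("5_Agree", "5_Agree", "3_Neither", "4_SomeA", "5_Agree")),
  Q3_cat5 = as.factor(c("3_Neither", "2_SomeD", "2_SomeD", "1_Disagree", "5_Agree"))
)

# Hypothetical helper (not from the original post): collapse 5 levels into 3
recode_cat3 <- function(x) {
  out <- case_when(
    x %in% c("1_Disagree", "2_SomeD") ~ "1_Disagree",
    x == "3_Neither"                  ~ "2_Neither",
    x %in% c("4_SomeA", "5_Agree")    ~ "3_Agree"
    # No NA branch: values matching no condition (including NA) become NA
  )
  # Convert back to a factor with an explicit level order
  factor(out, levels = c("1_Disagree", "2_Neither", "3_Agree"))
}

df2 <- df %>%
  mutate(across(starts_with("Q") & ends_with("cat5"),
                recode_cat3,
                .names = "{str_sub(.col, 1, -5)}cat3"))
```

Because recode_cat3 is already a function, it can be passed to across by name, without the tilde.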
