
Convert multiple dummy/logical variables into a single categorical variable in R dplyr

I have a question similar to this one. I want to convert several dummy/logical variables into a single categorical variable (factor) based on their names in R. My question differs because there can be many groups of variables that need to be encoded, for example age and chol_test in the data below. This is just a subset of my data frame; additional variables such as diabetes_test would also need to be converted, so I can't just use starts_with("condition").

I want to encode the lows as 1, the mediums as 2, and the highs as 3. If all the dummy variables in a group are 0, the result should be NA.

list(low = 1, medium = 2, high = 3)

Basically the data looks like so:

Input

  race  gender age.low_tm1 age.medium_tm1 age.high_tm1 chol_test.low_tm1 chol_test.high_tm1
  <chr>  <int>       <int>          <int>        <int>             <int>              <int>
1 white      0           1              0            0                 0                  0
2 white      0           1              0            0                 0                  0
3 white      1           1              0            0                 0                  0
4 black      1           0              1            0                 0                  0
5 white      0           0              0            1                 0                  1
6 black      0           0              1            0                 1                  0

I want the output to look like so:

Expected Output:

  race  gender   age  chol_test
1 white      0     1        n/a  
2 white      0     1        n/a
3 white      1     1        n/a
4 black      1     2        n/a
5 white      0     3          3
6 black      0     2          1

How could I do this? I'm looking for a solution that is similar to the ones posted in the question I linked using dplyr if possible. Sorry for any redundancies.

Data

df <- structure(list(race = c("white", "white", "white", "black", "white", 
"black"), gender = c(0L, 0L, 1L, 1L, 0L, 0L), age.low_tm1 = c(1L, 
1L, 1L, 0L, 0L, 0L), age.medium_tm1 = c(0L, 0L, 0L, 1L, 0L, 1L
), age.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 0L), chol_test.low_tm1 = c(0L, 
0L, 0L, 0L, 0L, 1L), chol_test.high_tm1 = c(0L, 0L, 0L, 0L, 1L, 
0L)), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))

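This is how I would do it: reshape to long format, keep the dummies that are set, recode the level, and reshape back to wide.

library(dplyr)
library(tidyr)

df %>% 
  # row id so pivot_wider() can rebuild each original row
  mutate(id = row_number()) %>%
  # one row per (id, dummy column)
  pivot_longer(cols = -c(race, gender, id)) %>%
  # keep only the dummies that are set; an input row with no dummy set at all
  # disappears from the result
  filter(value > 0) %>%
  # "age.low_tm1" -> var = "age", range1 = "low_tm1"
  separate(name, c("var", "range1"), sep = '\\.') %>%
  mutate(
    value = case_when(
      range1 == 'low_tm1' ~ 1, 
      range1 == 'medium_tm1' ~ 2, 
      range1 == 'high_tm1' ~ 3
    )
  ) %>%
  select(-range1) %>%
  # (id, var) pairs that were filtered out come back as NA
  pivot_wider(names_from = var, values_from = value) %>%
  select(-id)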

  race  gender   age chol_test
  <chr>  <int> <dbl>     <dbl>
1 white      0     1        NA
2 white      0     1        NA
3 white      1     1        NA
4 black      1     2        NA
5 white      0     3         3
6 black      0     2         1
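
One caveat with the pipeline above: filter(value > 0) removes every long-format row belonging to an input row whose dummies are all 0, so that input row would be missing from the result instead of showing NA. This does not happen with the sample data, where every row has an age level set. Below is a minimal sketch of a variant that keeps such rows and uses a lookup vector like the one in the question. The small df is made up for illustration (its second row has no dummy set), the name codes is hypothetical, and the regular expression assumes every dummy column ends in .low_tm1, .medium_tm1, or .high_tm1.

```r
library(dplyr)
library(tidyr)

# Lookup from the question: level name -> code (the name `codes` is illustrative)
codes <- c(low = 1L, medium = 2L, high = 3L)

# Made-up data for illustration; row 2 has no dummy set in any group
df <- data.frame(
  race               = c("white", "white", "black"),
  gender             = c(0L, 1L, 0L),
  age.low_tm1        = c(1L, 0L, 0L),
  age.medium_tm1     = c(0L, 0L, 1L),
  age.high_tm1       = c(0L, 0L, 0L),
  chol_test.low_tm1  = c(0L, 0L, 1L),
  chol_test.high_tm1 = c(0L, 0L, 0L)
)

result <- df %>%
  mutate(id = row_number()) %>%
  # Split "age.low_tm1" into var = "age" and level = "low" while pivoting
  pivot_longer(
    cols = matches("\\.(low|medium|high)_tm1$"),
    names_to = c("var", "level"),
    names_pattern = "^(.*)\\.(low|medium|high)_tm1$"
  ) %>%
  group_by(id, race, gender, var) %>%
  # Take the code of the first dummy that is set; NA when none is set
  summarise(
    code = if (any(value == 1L)) codes[[level[value == 1L][1]]] else NA_integer_,
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = var, values_from = code) %>%
  select(-id)

print(result)
```

Because the columns are selected by suffix, further groups such as diabetes_test would be picked up without being named, provided they follow the same naming scheme.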
