简体   繁体   中英

Is it possible to simultaneously create many new variables using mutate(), across(), starts_with(), and case_when()?

I have a wide-format longitudinal dataset with a set of variables that indicate what state a participant lived in for each year of the study period. If no participant lived in a given state that year, there is no level for that state in the variable. For example, using a simplified version in which the dataset contains participants from New England states (MA, CT, RI, VT, NH, ME) only:

ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
state_2000 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2002 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
# participant # 3 moves from RI to MA; RI no longer a level in the subsequent state variables
state_2004 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2006 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")

df <- data.frame(ID, state_2000, state_2002, state_2004, state_2006)

print (df)
   ID state_2000 state_2002 state_2004 state_2006
1   1         MA         MA         MA         MA
2   2         MA         MA         MA         MA
3   3         RI         RI         MA         MA
4   4         VT         VT         VT         VT
5   5         NH         NH         NH         NH
6   6         NH         NH         NH         NH
7   7         ME         ME         ME         ME
8   8         CT         CT         CT         CT
9   9         CT         CT         CT         CT
10 10         ME         ME         ME         ME

table(df$state_2002, useNA = "always")
  CT   MA   ME   NH   RI   VT <NA> 
   2    2    2    2    1    1    0 

table(df$state_2004, useNA = "always")
  CT   MA   ME   NH   VT <NA> 
   2    3    2    2    1    0 

I want to create a set of new state variables that have categories for each state (where categories for states where no one lives in that year would be missing), perhaps by using some combination of mutate(), across(), starts_with(), and case_when(). I've tried something like:

df <- 
  df %>%
  mutate(across(starts_with("state_"), as.factor(case_when( 
    starts_with("state_")=="CT" ~ 1,
    starts_with("state_")=="MA" ~ 2,
    starts_with("state_")=="ME" ~ 3,
    starts_with("state_")== "NH" ~ 4,
    starts_with("state_")=="RI" ~ 5,
    starts_with("state_")=="VT" ~ 6,
    TRUE ~ NA_real_)))) 

However, this doesn't seem to work, as I get errors like:

Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error:
! attempt to select less than one element in integerOneIndex

Does anyone know how to do this? Thank you so much!

Use ~ and . to create a function

df <- df %>%
  mutate(across(starts_with("state_"), ~ as.factor(case_when( 
    . =="CT" ~ 1,
    . =="MA" ~ 2,
    . =="ME" ~ 3,
    . == "NH" ~ 4,
    . =="RI" ~ 5,
    . =="VT" ~ 6,
    TRUE ~ NA_real_)), 
    .names = "{.col}_num")) # Remove this argument if you don't want to retain the initial columns

df
   ID state_2000 state_2002 state_2004 state_2006 state_2000_num state_2002_num state_2004_num
1   1         MA         MA         MA         MA              2              2              2
2   2         MA         MA         MA         MA              2              2              2
3   3         RI         RI         MA         MA              5              5              2
4   4         VT         VT         VT         VT              6              6              6
5   5         NH         NH         NH         NH              4              4              4
6   6         NH         NH         NH         NH              4              4              4
7   7         ME         ME         ME         ME              3              3              3
8   8         CT         CT         CT         CT              1              1              1
9   9         CT         CT         CT         CT              1              1              1
10 10         ME         ME         ME         ME              3              3              3
   state_2006_num
1               2
2               2
3               2
4               6
5               4
6               4
7               3
8               1
9               1
10              3

You don't need case_when here.

Create a vector of unique lev els in your dataset, and then you can create multiple columns with across . Convert your columns to factor with the same lev els and then use as.numeric to convert them to numbers: they will all share the same numbering.

lev = unique(unlist(df[-1]))
df %>% 
  mutate(across(starts_with("state_"), ~ as.numeric(factor(.x, levels = lev)),
         .names = "{col}_new"))
   ID state_2000 state_2002 state_2004 state_2006 state_2000_new state_2002_new state_2004_new state_2006_new
1   1         MA         MA         MA         MA              1              1              1              1
2   2         MA         MA         MA         MA              1              1              1              1
3   3         RI         RI         MA         MA              2              2              1              1
4   4         VT         VT         VT         VT              3              3              3              3
5   5         NH         NH         NH         NH              4              4              4              4
6   6         NH         NH         NH         NH              4              4              4              4
7   7         ME         ME         ME         ME              5              5              5              5
8   8         CT         CT         CT         CT              6              6              6              6
9   9         CT         CT         CT         CT              6              6              6              6
10 10         ME         ME         ME         ME              5              5              5              5

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM