Is it possible to simultaneously create many new variables using mutate(), across(), starts_with(), and case_when()?

I have a wide-format longitudinal dataset with a set of variables that indicate which state a participant lived in for each year of the study period. If no participant lived in a given state that year, there is no level for that state in the variable. For example, using a simplified version in which the dataset contains participants from New England states (MA, CT, RI, VT, NH, ME) only:

ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
state_2000 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2002 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
# participant # 3 moves from RI to MA; RI no longer a level in the subsequent state variables
state_2004 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2006 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")

df <- data.frame(ID, state_2000, state_2002, state_2004, state_2006)

print (df)
   ID state_2000 state_2002 state_2004 state_2006
1   1         MA         MA         MA         MA
2   2         MA         MA         MA         MA
3   3         RI         RI         MA         MA
4   4         VT         VT         VT         VT
5   5         NH         NH         NH         NH
6   6         NH         NH         NH         NH
7   7         ME         ME         ME         ME
8   8         CT         CT         CT         CT
9   9         CT         CT         CT         CT
10 10         ME         ME         ME         ME

table(df$state_2002, useNA = "always")
  CT   MA   ME   NH   RI   VT <NA> 
   2    2    2    2    1    1    0 

table(df$state_2004, useNA = "always")
  CT   MA   ME   NH   VT <NA> 
   2    3    2    2    1    0 

I want to create a set of new state variables that have categories for each state (where categories for states where no one lives in that year would be missing), perhaps by using some combination of mutate(), across(), starts_with(), and case_when(). I've tried something like:

df <- 
  df %>%
  mutate(across(starts_with("state_"), as.factor(case_when( 
    starts_with("state_")=="CT" ~ 1,
    starts_with("state_")=="MA" ~ 2,
    starts_with("state_")=="ME" ~ 3,
    starts_with("state_")== "NH" ~ 4,
    starts_with("state_")=="RI" ~ 5,
    starts_with("state_")=="VT" ~ 6,
    TRUE ~ NA_real_)))) 

However, this doesn't seem to work, as I get errors like:

Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error:
! attempt to select less than one element in integerOneIndex

Does anyone know how to do this? Thank you so much!

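Use ~ and . to create a function. across() expects a function (or a formula that it turns into one) as its second argument, and inside that function . stands for the current column. starts_with() is a selection helper, so it cannot be used inside case_when() to refer to the column values: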

library(dplyr)

df <- df %>%
  mutate(across(starts_with("state_"), ~ as.factor(case_when( 
    . =="CT" ~ 1,
    . =="MA" ~ 2,
    . =="ME" ~ 3,
    . == "NH" ~ 4,
    . =="RI" ~ 5,
    . =="VT" ~ 6,
    TRUE ~ NA_real_)), 
    .names = "{.col}_num")) # Remove this argument if you don't want to retain the initial columns

df
   ID state_2000 state_2002 state_2004 state_2006 state_2000_num state_2002_num state_2004_num state_2006_num
1   1         MA         MA         MA         MA              2              2              2              2
2   2         MA         MA         MA         MA              2              2              2              2
3   3         RI         RI         MA         MA              5              5              2              2
4   4         VT         VT         VT         VT              6              6              6              6
5   5         NH         NH         NH         NH              4              4              4              4
6   6         NH         NH         NH         NH              4              4              4              4
7   7         ME         ME         ME         ME              3              3              3              3
8   8         CT         CT         CT         CT              1              1              1              1
9   9         CT         CT         CT         CT              1              1              1              1
10 10         ME         ME         ME         ME              3              3              3              3
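
The formula shorthand can also be written as an ordinary anonymous function, which may make it clearer what across() receives: a function that is called once per selected column. The following is a minimal sketch rather than part of the answer itself; it rebuilds the example data from the question so that it runs on its own, and df_num is an illustrative name used only here.

```r
library(dplyr)

# Rebuild the example data from the question so the sketch is self-contained.
df <- data.frame(
  ID = 1:10,
  state_2000 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2002 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2004 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2006 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
)

# across() calls this function once per selected column; x is that column.
# df_num is a hypothetical name, chosen so the original df is left untouched.
df_num <- df %>%
  mutate(across(
    starts_with("state_"),
    function(x) as.factor(case_when(
      x == "CT" ~ 1,
      x == "MA" ~ 2,
      x == "ME" ~ 3,
      x == "NH" ~ 4,
      x == "RI" ~ 5,
      x == "VT" ~ 6,
      TRUE ~ NA_real_
    )),
    .names = "{.col}_num"
  ))

# Inspect the levels each new factor ended up with.
levels(df_num$state_2000_num)
levels(df_num$state_2004_num)
```

Because as.factor() only sees the codes present in each column, state_2004_num has no level for RI (code 5), while state_2000_num does. The codes are consistent across years, but the set of levels is not.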

You don't need case_when here.

Create a vector of the unique levels in your dataset (lev in the code), and then you can create multiple columns with across(). Convert your columns to factors with the same levels and then use as.numeric() to convert them to numbers: they will all share the same numbering.

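library(dplyr)

# Collect every distinct state code once so all columns share one level set
# (df[-1] drops ID; this assumes df holds only ID and the original state_* columns)
lev <- unique(unlist(df[-1]))
df %>% 
  mutate(across(starts_with("state_"), ~ as.numeric(factor(.x, levels = lev)),
         .names = "{.col}_new"))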
   ID state_2000 state_2002 state_2004 state_2006 state_2000_new state_2002_new state_2004_new state_2006_new
1   1         MA         MA         MA         MA              1              1              1              1
2   2         MA         MA         MA         MA              1              1              1              1
3   3         RI         RI         MA         MA              2              2              1              1
4   4         VT         VT         VT         VT              3              3              3              3
5   5         NH         NH         NH         NH              4              4              4              4
6   6         NH         NH         NH         NH              4              4              4              4
7   7         ME         ME         ME         ME              5              5              5              5
8   8         CT         CT         CT         CT              6              6              6              6
9   9         CT         CT         CT         CT              6              6              6              6
10 10         ME         ME         ME         ME              5              5              5              5
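
If the goal is for every year's variable to carry the same set of categories, including states with no residents that year, one option is to stop at the factor() step instead of converting to numbers. The following is a minimal sketch, assuming the six New England abbreviations from the question are the complete set of categories; all_states, df_fct, and the _fct suffix are illustrative names that do not appear in the answers above.

```r
library(dplyr)

# Rebuild the example data from the question so the sketch is self-contained.
df <- data.frame(
  ID = 1:10,
  state_2000 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2002 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2004 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2006 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
)

# Assumption: these six codes are the full set of categories wanted.
all_states <- c("CT", "MA", "ME", "NH", "RI", "VT")

# Give every state_* column the same explicit level set.
df_fct <- df %>%
  mutate(across(
    starts_with("state_"),
    ~ factor(.x, levels = all_states),
    .names = "{.col}_fct"
  ))

# RI stays a level in 2004 even though no participant lives there that year,
# so it shows up in the table with a count of zero.
table(df_fct$state_2004_fct, useNA = "always")
```

With the levels given in this order, as.integer() on any of the new factors yields the same coding as the case_when() answer (CT = 1 through VT = 6), so the numeric version can still be derived when needed.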
