Is it possible to use mutate(), across(), starts_with(), and case_when() to create many new variables at the same time?

Question

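I have a wide-format longitudinal dataset containing a set of variables that indicate the state each participant lived in during each year of the study. If no participant lived in a given state in a given year, that state is not a level of that year's variable. For example, here is a simplified version of the dataset that includes only participants from the New England states (MA, CT, RI, VT, NH, ME):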

ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
state_2000 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2002 <- c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
# participant # 3 moves from RI to MA; RI no longer a level in the subsequent state variables
state_2004 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")
state_2006 <- c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME")

df <- data.frame(ID, state_2000, state_2002, state_2004, state_2006)

print (df)
   ID state_2000 state_2002 state_2004 state_2006
1   1         MA         MA         MA         MA
2   2         MA         MA         MA         MA
3   3         RI         RI         MA         MA
4   4         VT         VT         VT         VT
5   5         NH         NH         NH         NH
6   6         NH         NH         NH         NH
7   7         ME         ME         ME         ME
8   8         CT         CT         CT         CT
9   9         CT         CT         CT         CT
10 10         ME         ME         ME         ME

table(df$state_2002, useNA = "always")
  CT   MA   ME   NH   RI   VT <NA> 
   2    2    2    2    1    1    0 

table(df$state_2004, useNA = "always")
  CT   MA   ME   NH   VT <NA> 
   2    3    2    2    1    0

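I'd like to create a new set of state variables that have a category for every state (the category for a state that no one lived in that year would simply be empty), perhaps by using some combination of mutate(), across(), starts_with(), and case_when(). I tried something like: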

df <- 
  df %>%
  mutate(across(starts_with("state_"), as.factor(case_when( 
    starts_with("state_")=="CT" ~ 1,
    starts_with("state_")=="MA" ~ 2,
    starts_with("state_")=="ME" ~ 3,
    starts_with("state_")== "NH" ~ 4,
    starts_with("state_")=="RI" ~ 5,
    starts_with("state_")=="VT" ~ 6,
    TRUE ~ NA_real_))))

However, this doesn't seem to work; I get the following error:

Error in `mutate()`:
! Problem while computing `..1 = across(...)`.
Caused by error:
! attempt to select less than one element in integerOneIndex

Does anyone know how to do this? Thanks so much!

Answer 1

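Use ~ and . to create a function. starts_with() is a selection helper, so it only works in the first argument of across(); inside the function, . stands for the column currently being processed:

library(dplyr)

df <- df %>%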
  mutate(across(starts_with("state_"), ~ as.factor(case_when( 
    . =="CT" ~ 1,
    . =="MA" ~ 2,
    . =="ME" ~ 3,
    . == "NH" ~ 4,
    . =="RI" ~ 5,
    . =="VT" ~ 6,
    TRUE ~ NA_real_)), 
    .names = "{.col}_num")) # Remove this argument if you don't want to retain the initial columns

df
   ID state_2000 state_2002 state_2004 state_2006 state_2000_num state_2002_num state_2004_num
1   1         MA         MA         MA         MA              2              2              2
2   2         MA         MA         MA         MA              2              2              2
3   3         RI         RI         MA         MA              5              5              2
4   4         VT         VT         VT         VT              6              6              6
5   5         NH         NH         NH         NH              4              4              4
6   6         NH         NH         NH         NH              4              4              4
7   7         ME         ME         ME         ME              3              3              3
8   8         CT         CT         CT         CT              1              1              1
9   9         CT         CT         CT         CT              1              1              1
10 10         ME         ME         ME         ME              3              3              3
   state_2006_num
1               2
2               2
3               2
4               6
5               4
6               4
7               3
8               1
9               1
10              3
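One thing to watch with the code above: as.factor() only creates levels for the codes that actually occur in a column, so state_2004_num would have no level for RI (code 5). A minimal sketch of one way to keep all six categories, assuming the codes 1 to 6 used above, is to call factor() with explicit levels. The name df_num is hypothetical, chosen only so the original df is left untouched.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID = 1:10,
  state_2000 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2002 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2004 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2006 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  stringsAsFactors = FALSE
)

# Same recoding as above, but with explicit levels so that every
# new column has the codes 1..6 as levels, even when a code is unused
df_num <- df %>%
  mutate(across(starts_with("state_"), ~ factor(case_when(
    .x == "CT" ~ 1,
    .x == "MA" ~ 2,
    .x == "ME" ~ 3,
    .x == "NH" ~ 4,
    .x == "RI" ~ 5,
    .x == "VT" ~ 6,
    TRUE ~ NA_real_), levels = 1:6),
    .names = "{.col}_num"))

levels(df_num$state_2004_num)  # all six codes are levels
table(df_num$state_2004_num)   # code 5 (RI) is listed with a count of 0
```

With explicit levels, table() reports a zero count for RI in 2004 and 2006 instead of omitting it.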

Answer 2

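You don't need case_when() here.

Create a vector, lev, of the unique values in your dataset, then use across() to create the new columns. Convert each column to a factor with the same levels (lev), then use as.numeric() to convert the factors to numbers: the columns will all share the same numbering. Note that the codes follow the order in which the states first appear in lev (MA = 1, RI = 2, and so on), not alphabetical order.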

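# All distinct state values across the state columns (every column except ID)
lev = unique(unlist(df[-1]))
df %>% 
  # A shared set of levels gives every column the same numeric coding
  mutate(across(starts_with("state_"), ~ as.numeric(factor(.x, levels = lev)),
         .names = "{.col}_new"))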

   ID state_2000 state_2002 state_2004 state_2006 state_2000_new state_2002_new state_2004_new state_2006_new
1   1         MA         MA         MA         MA              1              1              1              1
2   2         MA         MA         MA         MA              1              1              1              1
3   3         RI         RI         MA         MA              2              2              1              1
4   4         VT         VT         VT         VT              3              3              3              3
5   5         NH         NH         NH         NH              4              4              4              4
6   6         NH         NH         NH         NH              4              4              4              4
7   7         ME         ME         ME         ME              5              5              5              5
8   8         CT         CT         CT         CT              6              6              6              6
9   9         CT         CT         CT         CT              6              6              6              6
10 10         ME         ME         ME         ME              5              5              5              5
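If the goal is to keep a category for every state rather than a numeric code, the same idea works without as.numeric(). The following is a minimal sketch, not part of the original answer: it sorts the levels so the coding is alphabetical, and the names df_fct and the _fct suffix are hypothetical.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID = 1:10,
  state_2000 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2002 = c("MA", "MA", "RI", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2004 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  state_2006 = c("MA", "MA", "MA", "VT", "NH", "NH", "ME", "CT", "CT", "ME"),
  stringsAsFactors = FALSE
)

# Every state that appears in any state_* column, in alphabetical order
lev <- sort(unique(unlist(select(df, starts_with("state_")), use.names = FALSE)))

# Keep the new columns as factors that all share the levels in lev
df_fct <- df %>%
  mutate(across(starts_with("state_"), ~ factor(.x, levels = lev),
                .names = "{.col}_fct"))

table(df_fct$state_2004_fct)       # RI is still a level, with a count of 0
as.integer(df_fct$state_2000_fct)  # integer codes in alphabetical order (CT = 1, ..., VT = 6)
```

Because the levels are sorted, the integer codes here match the numbering used in Answer 1.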
