
Creating new variables from multiple variables using mutate() and across() in dplyr 1.0.0

I need to mutate several columns that share a prefix into new columns, applying the same transformation to each.

Here is some toy data

library(dplyr)

df <- data.frame(su_1 = round(rnorm(12),2),
                 su_2 = round(rnorm(12),2),
                 su_3 = round(rnorm(12),2))

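Now say I want to sort the continuous values of each variable into discrete bins. I can do this with three separate but nearly identical steps, one per column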

df %>% mutate(su_1_disc = ifelse(su_1 < 0, "less", 
                                 ifelse(su_1 > 0 & su_1 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_2_disc = ifelse(su_2 < 0, "less", 
                                 ifelse(su_2 > 0 & su_2 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_3_disc = ifelse(su_3 < 0, "less", 
                                 ifelse(su_3 > 0 & su_3 <= 0.5, "mid", "lots"))) -> df

df

# output
#     su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
# 1   1.99  0.77 -0.17      lots      lots      less
# 2   0.51 -0.76 -1.24      lots      less      less
# 3   1.50 -0.36  0.28      lots      less       mid
# 4   0.86  0.88 -0.52      lots      lots      less
# 5   0.08  0.63 -0.76       mid      lots      less
# 6  -0.51 -0.99  0.01      less      less       mid
# 7   0.35  1.59  0.19       mid      lots       mid
# 8   0.16  0.35  0.38       mid       mid       mid
# 9  -0.75 -0.45  1.75      less      less      lots
# 10  0.97  0.62 -0.05      lots      lots      less
# 11 -0.07  0.47 -0.24      less       mid      less
# 12  0.61 -0.27 -1.55      lots      less      less

But I would like to do this in a single step using the new dplyr 1.0.0 features.

I tried

df %>%
  mutate(across(starts_with("su_"),
                ifelse(.x < 0, "less", 
                       ifelse(.x > 0 & .x <= 0.5, "mid", "lots"))))

But it throws an error. I know .names needs to go in somewhere, but I'm a bit lost.

You can use:

library(dplyr)

df %>%
  mutate(across(starts_with("su_"),~ifelse(.x < 0, "less", 
         ifelse(.x > 0 & .x <= 0.5, "mid", "lots")), .names = '{col}_disc'))

#    su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
#1   0.40  0.57 -0.11       mid      lots      less
#2   1.82 -0.55  0.44      lots      less       mid
#3   0.44  1.47 -0.39       mid      lots      less
#4  -0.82  0.00 -0.12      less      lots      less
#5   0.17 -0.10 -1.55       mid      less      less
#6   0.20  0.98 -1.02       mid      lots      less
#7  -0.01  1.12 -0.30      less      lots      less
#8  -0.70  0.31  0.35      less       mid       mid
#9   0.46  1.18 -0.22       mid      lots      less
#10 -1.09  0.03 -0.85      less       mid      less
#11 -0.03  1.81  1.28      less      lots      lots
#12 -0.11  1.64 -0.51      less      lots      less
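The attempt in the question fails because the second argument of across() has to be a function or a purrr-style lambda (a formula starting with ~). A bare ifelse(.x < 0, ...) call is evaluated straight away, in a place where .x is not defined. As a minimal sketch of the same idea, the binning rules can also be wrapped in a named function and passed to across(). The helper name bin_su and the small fixed data frame toy are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical helper (assumption, not from the post):
# same rules as the nested ifelse in the question
bin_su <- function(x) {
  ifelse(x < 0, "less",
         ifelse(x > 0 & x <= 0.5, "mid", "lots"))
}

# Small fixed data frame so the result is reproducible
toy <- data.frame(su_1 = c(-0.5, 0.2, 1.3),
                  su_2 = c(0.4, -1.0, 0.7))

# Pass the function itself; .names builds the new column names
res <- toy %>%
  mutate(across(starts_with("su_"), bin_su, .names = "{.col}_disc"))

res
```

Because .names is supplied, the original su_ columns are kept and the binned versions are added alongside them.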

You can also replace ifelse with case_when or cut.
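The cut() variant is not spelled out in the answer, so here is a rough sketch of how it might look. The break points and the fixed data frame toy are assumptions chosen to mirror the bins in the question. Note that cut() returns a factor and, by default, uses right-closed intervals, so a value of exactly 0 lands in "less" here, which differs from the nested ifelse.

```r
library(dplyr)

# Small fixed data frame (made up for illustration)
toy <- data.frame(su_1 = c(-0.5, 0.2, 1.3),
                  su_2 = c(0.4, -1.0, 0.7))

# Bins: (-Inf, 0] -> "less", (0, 0.5] -> "mid", (0.5, Inf] -> "lots"
res <- toy %>%
  mutate(across(starts_with("su_"),
                ~ cut(.x,
                      breaks = c(-Inf, 0, 0.5, Inf),
                      labels = c("less", "mid", "lots")),
                .names = "{.col}_disc"))

res
```

If character columns are preferred over factors, wrapping the call in as.character() would be one option.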

Consider using case_when instead of nested ifelse

library(dplyr)

df %>%
  mutate(across(starts_with("su_"),
                ~ case_when(. < 0 ~ "less",
                            between(., 0, 0.5) ~ "mid",
                            TRUE ~ "lots"),
                .names = "{.col}_disc"))
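One detail worth checking before swapping approaches: the two versions do not treat a value of exactly 0 the same way. The short comparison below, on a hand-picked vector x (an assumption for illustration), shows where they differ.

```r
library(dplyr)

# Hand-picked values that sit on and around the bin boundaries
x <- c(-1, 0, 0.25, 0.5, 1)

# Nested ifelse from the question: 0 fails both tests and falls through to "lots"
nested <- ifelse(x < 0, "less",
                 ifelse(x > 0 & x <= 0.5, "mid", "lots"))

# case_when with between(): between() is inclusive at both ends, so 0 becomes "mid"
cw <- case_when(x < 0 ~ "less",
                between(x, 0, 0.5) ~ "mid",
                TRUE ~ "lots")

data.frame(x, nested, cw)
```

Which behavior is right depends on how zeros should be binned. With values rounded to two decimals, as in the toy data, exact zeros can occur.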

