Creating new variables from multiple variables using mutate() and across() in dplyr 1.0.0
I need to mutate multiple columns that share a prefix, all in the same way, into new columns.
Here is the toy data:
df <- data.frame(su_1 = round(rnorm(12),2),
su_2 = round(rnorm(12),2),
su_3 = round(rnorm(12),2))
Now say I want to sort the continuous values of each variable into discrete bins. I can do it with three separate, analogous steps, one per column, like so:
df %>% mutate(su_1_disc = ifelse(su_1 < 0, "less",
ifelse(su_1 > 0 & su_1 <= 0.5, "mid", "lots"))) -> df
df %>% mutate(su_2_disc = ifelse(su_2 < 0, "less",
ifelse(su_2 > 0 & su_2 <= 0.5, "mid", "lots"))) -> df
df %>% mutate(su_3_disc = ifelse(su_3 < 0, "less",
ifelse(su_3 > 0 & su_3 <= 0.5, "mid", "lots"))) -> df
df
# output
# su_1 su_2 su_3 su_1_disc su_2_disc su_3_disc
# 1 1.99 0.77 -0.17 lots lots less
# 2 0.51 -0.76 -1.24 lots less less
# 3 1.50 -0.36 0.28 lots less mid
# 4 0.86 0.88 -0.52 lots lots less
# 5 0.08 0.63 -0.76 mid lots less
# 6 -0.51 -0.99 0.01 less less mid
# 7 0.35 1.59 0.19 mid lots mid
# 8 0.16 0.35 0.38 mid mid mid
# 9 -0.75 -0.45 1.75 less less lots
# 10 0.97 0.62 -0.05 lots lots less
# 11 -0.07 0.47 -0.24 less mid less
# 12 0.61 -0.27 -1.55 lots less less
But I would like to do it in a single step using the new dplyr 1.0.0 functionality.
I tried:
df %>%
mutate(across(starts_with("su_"),
ifelse(.x < 0, "less",
ifelse(.x > 0 & .x <= 0.5, "mid", "lots"))))
But it threw an error. I know .names needs to come into it somewhere, but I'm a bit lost.
You can use:
library(dplyr)
df %>%
mutate(across(starts_with("su_"),~ifelse(.x < 0, "less",
ifelse(.x > 0 & .x <= 0.5, "mid", "lots")), .names = '{col}_disc'))
# su_1 su_2 su_3 su_1_disc su_2_disc su_3_disc
#1 0.40 0.57 -0.11 mid lots less
#2 1.82 -0.55 0.44 lots less mid
#3 0.44 1.47 -0.39 mid lots less
#4 -0.82 0.00 -0.12 less lots less
#5 0.17 -0.10 -1.55 mid less less
#6 0.20 0.98 -1.02 mid lots less
#7 -0.01 1.12 -0.30 less lots less
#8 -0.70 0.31 0.35 less mid mid
#9 0.46 1.18 -0.22 mid lots less
#10 -1.09 0.03 -0.85 less mid less
#11 -0.03 1.81 1.28 less lots lots
#12 -0.11 1.64 -0.51 less lots less
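The key difference from the attempt in the question is the `~`, which turns the expression into a lambda that across() applies to each selected column; without it, the expression is evaluated straight away and `.x` does not exist at that point. As a rough sketch of the same idea, the logic can also be pulled out into a named function and passed to across() directly. The helper name `discretise` and the small fixed data frame `toy` are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical helper (not from the original post): same binning logic
# as the lambda in the answer above.
discretise <- function(x) {
  ifelse(x < 0, "less",
         ifelse(x > 0 & x <= 0.5, "mid", "lots"))
}

# Small fixed input (illustrative) so the result is reproducible.
toy <- data.frame(su_1 = c(-0.5, 0.2, 0.9),
                  su_2 = c(0.3, -1.0, 0.7))

# across() applies discretise() to every column starting with "su_";
# .names appends "_disc" so the original columns are kept.
res <- toy %>%
  mutate(across(starts_with("su_"), discretise, .names = "{.col}_disc"))

res
```

Passing a named function keeps the mutate() call short and lets the binning rule be tested on its own.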
You can also replace ifelse with case_when or cut.
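For the cut option, a minimal sketch might look like the following. The break points and the fixed data frame `toy` are assumptions chosen to mirror the thresholds in the question. Two differences from the nested ifelse are worth noting: cut returns a factor rather than a character vector, and with the default right-closed intervals a value of exactly 0 lands in "less", whereas the nested ifelse in the question labels it "lots".

```r
library(dplyr)

# Small fixed input (illustrative) so the result is reproducible.
toy <- data.frame(su_1 = c(-0.5, 0.2, 0.9),
                  su_2 = c(0.3, -1.0, 0.7))

# Intervals are (-Inf, 0], (0, 0.5], (0.5, Inf) because right = TRUE
# is the default for cut().
res_cut <- toy %>%
  mutate(across(starts_with("su_"),
                ~ cut(.x,
                      breaks = c(-Inf, 0, 0.5, Inf),
                      labels = c("less", "mid", "lots")),
                .names = "{.col}_disc"))

res_cut
```

If character columns are preferred, wrapping the cut() call in as.character() inside the lambda converts the factor.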
Consider using case_when instead of nested ifelse:
library(dplyr)
df %>%
mutate(across(starts_with("su_"), ~ case_when(. < 0 ~ "less",
between(., 0, 0.5) ~ "mid", TRUE ~ "lots"),
.names = "{.col}_disc"))