
Creating new variables from multiple variables using mutate() and across() in dplyr 1.0.0

I need to mutate several columns that share the same prefix, all in the same way, into new columns.

Here is the toy data:

df <- data.frame(su_1 = round(rnorm(12),2),
                 su_2 = round(rnorm(12),2),
                 su_3 = round(rnorm(12),2))

Now say I want to sort the continuous values of each variable into discrete bins. I can do it with three separate, analogous steps, one per column, like so:

df %>% mutate(su_1_disc = ifelse(su_1 < 0, "less", 
                                 ifelse(su_1 > 0 & su_1 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_2_disc = ifelse(su_2 < 0, "less", 
                                 ifelse(su_2 > 0 & su_2 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_3_disc = ifelse(su_3 < 0, "less", 
                                 ifelse(su_3 > 0 & su_3 <= 0.5, "mid", "lots"))) -> df

df

# output
#     su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
# 1   1.99  0.77 -0.17      lots      lots      less
# 2   0.51 -0.76 -1.24      lots      less      less
# 3   1.50 -0.36  0.28      lots      less       mid
# 4   0.86  0.88 -0.52      lots      lots      less
# 5   0.08  0.63 -0.76       mid      lots      less
# 6  -0.51 -0.99  0.01      less      less       mid
# 7   0.35  1.59  0.19       mid      lots       mid
# 8   0.16  0.35  0.38       mid       mid       mid
# 9  -0.75 -0.45  1.75      less      less      lots
# 10  0.97  0.62 -0.05      lots      lots      less
# 11 -0.07  0.47 -0.24      less       mid      less
# 12  0.61 -0.27 -1.55      lots      less      less

But I would like to do it in a single step using the new dplyr 1.0.0 functionality.

I tried:

df %>%
  mutate(across(starts_with("su_"),
                ifelse(.x < 0, "less", 
                       ifelse(.x > 0 & .x <= 0.5, "mid", "lots"))))

But it threw an error. I know .names needs to come into it somewhere, but I'm a bit lost.

You can use:

library(dplyr)

df %>%
  mutate(across(starts_with("su_"),
                ~ ifelse(.x < 0, "less",
                         ifelse(.x > 0 & .x <= 0.5, "mid", "lots")),
                .names = '{col}_disc'))

#    su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
#1   0.40  0.57 -0.11       mid      lots      less
#2   1.82 -0.55  0.44      lots      less       mid
#3   0.44  1.47 -0.39       mid      lots      less
#4  -0.82  0.00 -0.12      less      lots      less
#5   0.17 -0.10 -1.55       mid      less      less
#6   0.20  0.98 -1.02       mid      lots      less
#7  -0.01  1.12 -0.30      less      lots      less
#8  -0.70  0.31  0.35      less       mid       mid
#9   0.46  1.18 -0.22       mid      lots      less
#10 -1.09  0.03 -0.85      less       mid      less
#11 -0.03  1.81  1.28      less      lots      lots
#12 -0.11  1.64 -0.51      less      lots      less
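The attempt in the question most likely failed because across() expects a function (or a formula-style lambda) as its second argument. Without the leading ~, R tries to evaluate .x < 0 straight away, and no object called .x exists at that point. As a minimal sketch of the same idea, the binning logic can also be moved into a named helper and passed by name. The helper name discretise and the small fixed data frame below are illustrative assumptions, not part of the original post.

```r
library(dplyr)

# Hypothetical helper: same binning rule as the nested ifelse() above
discretise <- function(x) {
  ifelse(x < 0, "less",
         ifelse(x > 0 & x <= 0.5, "mid", "lots"))
}

# Small fixed data frame (assumed values) so the result is reproducible
df_demo <- data.frame(su_1 = c(-0.5, 0.25, 1.2),
                      su_2 = c(0.1, -1, 2))

# Pass the function itself; .names controls the names of the new columns
out <- df_demo %>%
  mutate(across(starts_with("su_"), discretise, .names = "{.col}_disc"))

out
```

Passing a named function keeps the mutate() call short and lets the binning rule be tested on its own.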

You can also replace ifelse with case_when or cut.
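For the cut() route, a rough sketch might look like the following. The break points and the small fixed data frame are assumptions chosen to mirror the bins in the question. Note that with the default right = TRUE the intervals are closed on the right, so a value of exactly 0 falls into "less" here, whereas the nested ifelse() sends it to "lots". cut() also returns a factor, so as.character() is used to get plain strings.

```r
library(dplyr)

# Small fixed data frame (assumed values) so the result is reproducible
df_demo <- data.frame(su_1 = c(-0.5, 0.25, 0.5, 1.2),
                      su_2 = c(0.1, -1, 2, 0.3))

out <- df_demo %>%
  mutate(across(starts_with("su_"),
                # Intervals: (-Inf, 0], (0, 0.5], (0.5, Inf)
                ~ as.character(cut(.x,
                                   breaks = c(-Inf, 0, 0.5, Inf),
                                   labels = c("less", "mid", "lots"))),
                .names = "{.col}_disc"))

out
```

Dropping as.character() keeps the new columns as factors with the levels in bin order, which can be handy for plotting or tabulating.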

Consider using case_when instead of nested ifelse:

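library(dplyr)
df %>%
  mutate(across(starts_with("su_"),
                # between() is inclusive, so exactly 0 becomes "mid" here,
                # whereas the nested ifelse() version labels it "lots"
                ~ case_when(. < 0 ~ "less",
                            between(., 0, 0.5) ~ "mid",
                            TRUE ~ "lots"),
                .names = "{.col}_disc"))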


