
Mutate multiple variables to create multiple new variables

Let's say I have a tibble where I need to take multiple variables and mutate them into multiple new variables.

As an example, here is a simple tibble:

library(dplyr)  # mutate_at(), across(), %>%; dplyr also re-exports tribble() from tibble

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1,2,4,6,2,
  2,1,2,3,3,
  3,6,4,2,1
)

I want to subtract variable z from every variable with a name starting with "y", and add the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within a tidyverse / dplyr workflow.

In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure whether mutate can be used in this instance. I've tried mutate_if, but I don't think I'm using it right (and I get an error):

tb %>% mutate_if(starts_with("y"), funs(.-z))

#Error: No tidyselect variables were registered

Thanks in advance!

Because you are selecting columns by name, you need mutate_at() rather than mutate_if(), which selects columns by testing the values they contain.

tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1
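
For contrast, here is a minimal sketch of what mutate_if() is meant for: its first argument is a predicate, such as is.numeric, that is evaluated on each column's contents. The data frame df and its columns are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical data: two numeric columns and one character column
df <- tibble(
  a     = c(1, 2, 3),
  b     = c(10, 20, 30),
  label = c("p", "q", "r")
)

# mutate_if() picks columns by testing their values with a predicate,
# so only the numeric columns a and b are doubled; label is left alone
doubled <- df %>% mutate_if(is.numeric, ~ . * 2)
doubled
```

A name-based helper such as starts_with("y") is not a predicate on column values, which is why it has to go through vars() in mutate_at() instead.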

To create new columns instead of overwriting the existing ones, name the function inside funs(); the name is appended to each column name as a suffix.

# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

# remove suffix, add prefix
tb %>%
  mutate_at(vars(starts_with("y")),  funs(mod = . - z)) %>%
  rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

Edit: funs() is deprecated as of dplyr 0.8.0 (source1 & source2); use list() with formula (~) notation instead.

tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1

tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>%
  mutate_at(vars(starts_with("y")),  list(mod = ~ . - z)) %>%
  rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1
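
As a possible variation on the renaming step, the sketch below passes a bare formula to rename_at() and anchors the pattern with "_mod$", so only a trailing "_mod" is stripped. It assumes dplyr 0.8.0 or later; the object name renamed is introduced here for illustration.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

renamed <- tb %>%
  mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
  # "_mod$" matches the suffix only at the end of a name,
  # then "mod_" is pasted on as a prefix
  rename_at(vars(ends_with("_mod")), ~ paste0("mod_", sub("_mod$", "", .)))

names(renamed)
```

Anchoring the pattern only matters if a column name could contain "_mod" somewhere other than the end; for the names in this example both versions give the same result.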

Edit 2: dplyr 1.0.0+ has the across() function, which simplifies this task even further.

Basic usage

across() has two primary arguments:

  • The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.
  • The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)
# Control how the names are created with the `.names` argument which 
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>% 
  mutate(
    across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>% 
  mutate(
    across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

### Multiple functions
tb %>% 
  mutate(
    across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
    across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 10
#>       x    y1    y2    y3     z max_x max_z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2     3     3      0      2      4
#> 2     2     1     2     3     3     3     3     -2     -1      0
#> 3     3     6     4     2     1     3     3      5      3      1
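
The .fns argument described above also accepts a named list of functions, which the examples so far do not show. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the function names mod and ratio and the object res are chosen here for illustration. It uses the documented {.col} and {.fn} placeholders in .names.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# Every function in the named list is applied to every selected column;
# {.fn} is the name given in the list and {.col} is the column name
res <- tb %>%
  mutate(
    across(
      starts_with("y"),
      list(mod = ~ .x - z, ratio = ~ .x / z),
      .names = "{.fn}_{.col}"
    )
  )

names(res)
```

Because starts_with("y") selects by name, this works without knowing in advance how many "y" columns exist, and each one gains a mod_ and a ratio_ companion.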

Created on 2018-10-29 by the reprex package (v0.2.1)
