Mutate multiple variables to create multiple new variables
Let's say I have a tibble where I need to take multiple variables and mutate them into multiple new variables.
As an example, here is a simple tibble:
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)
I want to subtract variable z from every variable with a name starting with "y", and add the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within a tidyverse / dplyr workflow.
In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure if you can use mutate in this instance? I've tried mutate_if, but I don't think I'm using it right (and I get an error):
tb %>% mutate_if(starts_with("y"), funs(.-z))
#Error: No tidyselect variables were registered
Thanks in advance!
Because you are operating on column names, you need to use mutate_at() rather than mutate_if(). mutate_if() selects columns with a predicate function applied to the column values (for example is.numeric), whereas mutate_at() selects them by name through vars():
tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1
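To make the difference between the two selectors concrete, here is a minimal sketch. It assumes dplyr 0.8.0 or later, where a formula can be passed directly as the function; the object names doubled and shifted are only illustrative.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# mutate_if(): the first argument is a predicate that is applied to the
# values of each column (here every column is numeric, so all are doubled)
doubled <- tb %>% mutate_if(is.numeric, ~ . * 2)

# mutate_at(): the first argument is a name-based selection wrapped in vars()
# (only y1, y2 and y3 are modified; x and z are left untouched)
shifted <- tb %>% mutate_at(vars(starts_with("y")), ~ . - z)
```

Passing starts_with("y") straight to mutate_if(), as in the question, fails because it is a selection helper rather than a predicate on column values.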
To create new columns instead of overwriting the existing ones, we can give a name to the function inside funs(); the name is appended to each column name as a suffix:
# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
# remove suffix, add prefix
tb %>%
  mutate_at(vars(starts_with("y")), funs(mod = . - z)) %>%
  rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
Edit: In dplyr 0.8.0 and later, funs() is deprecated (source1 & source2); use list() with formula-style lambdas instead:
tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1
tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
tb %>%
  mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
  rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
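As a possible alternative for the renaming step, the sketch below uses rename_with(), which assumes dplyr 1.0.0 or later. The object name renamed is only illustrative, and the pattern is anchored with $ so that only a trailing "_mod" is removed.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

renamed <- tb %>%
  mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
  # strip the "_mod" suffix and prepend "mod_" for the selected columns only
  rename_with(~ paste0("mod_", sub("_mod$", "", .x)), ends_with("_mod"))

names(renamed)
```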
Edit 2: dplyr 1.0.0+ has the across() function, which simplifies this task even further.
### Basic usage
across() has two primary arguments:
- The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.
- The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)
# Control how the names are created with the `.names` argument which
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>%
  mutate(
    across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
tb %>%
  mutate(
    across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
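Since .fns also accepts a list of functions, several new columns per input column can be created in a single across() call. The sketch below assumes dplyr 1.0.0 or later; the function names mod and dbl and the object name result are only illustrative.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

result <- tb %>%
  mutate(
    across(
      starts_with("y"),
      # a named list: each name is available as {.fn} in the .names spec
      list(mod = ~ .x - z, dbl = ~ .x * 2),
      .names = "{.fn}_{.col}"
    )
  )

names(result)
```

Each "y" column yields one mod_ column and one dbl_ column, and the original columns are kept.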
### Multiple functions
tb %>%
  mutate(
    across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
    across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 10
#> x y1 y2 y3 z max_x max_z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 3 3 0 2 4
#> 2 2 1 2 3 3 3 3 -2 -1 0
#> 3 3 6 4 2 1 3 3 5 3 1
Created on 2018-10-29 by the reprex package (v0.2.1)