Using functions of multiple columns in a dplyr mutate_at call
I'd like to use dplyr's mutate_at function to apply a function to several columns in a dataframe, where the function takes as input the column to which it is directly applied as well as another column in the dataframe.
As a concrete example, I'd like to mutate the following dataframe
# Example input dataframe
df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)
with a mutate_at call that looks similar to this
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = ifelse(x, ., NA))
to return a dataframe that looks something like this
# Desired output dataframe
df2 <- data.frame(x = c(TRUE, TRUE, FALSE),
                  y_1 = c("Hello", "Hola", NA),
                  z_1 = c("World", "ao", NA))
The desired mutate_at call would be similar to the following call to mutate:
df %>%
  mutate(y_1 = ifelse(x, y, NA),
         z_1 = ifelse(x, z, NA))
I know that this can be done in base R in several ways, but I would specifically like to accomplish this goal using dplyr's mutate_at function for the sake of readability, interfacing with databases, etc.
Below are some similar questions asked on Stack Overflow which do not address the question I posed here:
adding multiple columns in a dplyr mutate call
dplyr::mutate to add multiple values
Use of column inside sum() function using dplyr's mutate() function
This was answered by @eipi10 in a comment on the question, but I'm writing it here for posterity.
The solution here is to use:
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = list(~ ifelse(x, ., NA)))
You can also use the new across() function with mutate(), like so:
df %>%
  mutate(across(c(y, z), ~ ifelse(x, ., NA)))
The use of the formula operator (as in ~ ifelse(...)) here indicates that ifelse(x, ., NA) is an anonymous function that is being defined within the call to mutate_at().
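To make the "anonymous function" reading concrete, here is a minimal sketch that writes the same operation twice, once with the formula shorthand and once as an ordinary function. It uses across() rather than mutate_at() and therefore assumes dplyr >= 1.0. The argument name col is purely illustrative, and stringsAsFactors = FALSE is added so the example behaves the same on R versions before 4.0.

```r
library(dplyr)

# Same example data as in the question; stringsAsFactors = FALSE is set
# explicitly so y and z are character columns on any R version.
df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE
)

# Formula shorthand: `.` stands for the column currently being processed.
with_lambda <- df %>%
  mutate(across(c(y, z), ~ ifelse(x, ., NA)))

# The same operation spelled out as an ordinary anonymous function.
# `col` is an illustrative argument name, not part of the dplyr API.
with_function <- df %>%
  mutate(across(c(y, z), function(col) ifelse(x, col, NA)))

# Compare the two results; they are expected to match.
identical(with_lambda, with_function)
```

In both versions x is looked up in the data frame being mutated, while the first argument of the function receives y and then z in turn.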
This works similarly to defining the function outside of the call to mutate_at(), like so:
temp_fn <- function(input) ifelse(test = df[["x"]],
                                  yes = input,
                                  no = NA)
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = temp_fn)
Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but the funs() function has been deprecated as of 0.8.0 and will soon be removed from dplyr.
To supplement the previous response, if you wanted mutate_at() to add new variables (instead of replacing), with names such as z_1 and y_1 as in the original question, you just need to:
dplyr >= 1 with across(): add .names="{.col}_1", or alternatively use list(`1`=~ifelse(x, ., NA)) (back ticks!)
dplyr 0.8 to <1 with mutate_at(): use list(`1`=~ifelse(x, ., NA))
dplyr before 0.8 with mutate_at(): use funs(`1`=ifelse(x, ., NA))
library(tidyverse)
df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)
## Version >=1
df %>%
  mutate(across(c(y, z),
                list(~ifelse(x, ., NA)),
                .names="{.col}_1"))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>
## 0.8 - <1
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = list(`1`=~ifelse(x, ., NA)))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>
## Before 0.8
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = funs(`1`=ifelse(x, ., NA)))
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas:
#>
#> # Simple named list:
#> list(mean = mean, median = median)
#>
#> # Auto named with `tibble::lst()`:
#> tibble::lst(mean, median)
#>
#> # Using lambdas
#> list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>
Created on 2020-10-03 by the reprex package (v0.3.0)
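The reprex above does not show the across() alternative that names the list element instead of passing .names. A minimal sketch of it follows, assuming dplyr >= 1.0, where across() given a named list builds output names from the column name and the list name. stringsAsFactors = FALSE is an addition so the columns stay character on older R versions.

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE
)

# Naming the list element `1` (backticks are needed because 1 is not a
# syntactic name) lets across() derive the suffix, so .names is not needed.
res <- df %>%
  mutate(across(c(y, z), list(`1` = ~ ifelse(x, ., NA))))

# The original y and z are kept; y_1 and z_1 are added as new columns.
names(res)
```

Passing .names explicitly, as in the reprex, is probably the clearer choice when the suffix is not a natural function name.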
For more details and tricks, see: Create new variables with mutate_at while keeping the original ones