I would like to correct a column in a data.frame by subtracting from it another column with nearly identical name, but this other column has a suffix. I would like to use the mutate_at
function for this.
Trying to figure this out, I have struggled to access the name of a column in the function part of mutate_at , to the use it to access the other column.
I show this in a small example below, but basically I would like to access the name of the column used at the moment .
and then select from the data in the pipe a column that has the same name as .
but with a suffix (below that would be "_new"
).
Thanks for your help!
Here is an example of how I would have liked to do it - but this does not work.
library(tidyverse)
data("mtcars")
new <- mtcars/4
names(new) <-paste0(names(new),"_new")
df <- bind_cols(mtcars,new)
df %>%
mutate_at(.vars = vars(carb,disp),
.funs = list(corrected = ~ . - df %>% pull(paste0(names(.),"_new"))))
df %>% pull(paste0("carb","_new"))
Instead of using mutate_at
why not use mutate
combined with across
and cur_column
ie:
df %>%
mutate( across( c(carb,disp), ~ . - pull(df, paste0(cur_column(), "_new") ), .names = "{.col}_corrected") )
We can't access the names
of the .
inside mutate_at
because it is a vector
and doesn't have the column name info. One option is map2
library(purrr)
library(dplyr)
library(stringr)
nm1 <- c('carb', 'disp')
map_dfc(nm1, ~ df %>%
transmute(!!str_c(.x, '_corrected') :=
!! rlang::sym(.x) - !! rlang::sym(str_c(.x, "_new"))))%>%
bind_cols(df, .) %>%
head
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.250 1.5 40.00 27.50 0.9750 0.65500 4.1150
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.250 1.5 40.00 27.50 0.9750 0.71875 4.2550
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5.700 1.0 27.00 23.25 0.9625 0.58000 4.6525
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5.350 1.5 64.50 27.50 0.7700 0.80375 4.8600
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 4.675 2.0 90.00 43.75 0.7875 0.86000 4.2550
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 4.525 1.5 56.25 26.25 0.6900 0.86500 5.0550
# vs_new am_new gear_new carb_new carb_corrected disp_corrected
#1 0.00 0.25 1.00 1.00 3.00 120.00
#2 0.00 0.25 1.00 1.00 3.00 120.00
#3 0.25 0.25 1.00 0.25 0.75 81.00
#4 0.25 0.00 0.75 0.25 0.75 193.50
#5 0.00 0.00 0.75 0.50 1.50 270.00
#6 0.25 0.00 0.75 0.25 0.75 168.75
Or we can update with a for
loop
for(nm in nm1) {
df <- df %>%
mutate(!! str_c(nm, '_corrected') :=
!! rlang::sym(nm) - !! rlang::sym(str_c(nm, '_new')))
}
Or using base R
with lapply
df[paste0(nm1, "_corrected")] <- lapply(nm1, function(nm)
df[nm] - df[paste0(nm, "_new")])
As the others already pointed out, variable names cannot be accessed in mutate_at
and this holds also true for the coming mutate(across())
. I addressed this issue as feature request for dplyr
here , but apparently, this kind of data-wrangling task is too specialized for dplyr
. Below I provide my favorite work-around for this kind of data-wrangling problems, which consists of two steps:
:: rlang::sym()
to generate variable based on a character vector of variable namespurrr::reduce
.library(tidyverse)
# your toy data
df <- mtcars %>%
as_tibble %>%
mutate_all(list(new =~ ./4))
# step 1: generate helper function, in this case a simple `mutate` call
gen_corrected <- function(df, x) {
mutate(df,
"{x}_corrected" := !! rlang::sym(x) - !! rlang::sym(str_c(x, "_new"))
)
}
# step 2:
# use purrr's `reduce` on the vector of vars you want to change
# the vector of variables can be defined in a separate step
# important: you need to set `.init = .`
df %>%
purrr::reduce(c('carb', 'disp'), gen_corrected, .init = .)
#> # A tibble: 32 x 24
#> mpg cyl disp hp drat wt qsec vs am gear carb mpg_new
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 5.25
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 5.25
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 5.7
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 5.35
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 4.68
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 4.53
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 3.58
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 6.1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 5.7
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 4.8
#> # … with 22 more rows, and 12 more variables: cyl_new <dbl>, disp_new <dbl>,
#> # hp_new <dbl>, drat_new <dbl>, wt_new <dbl>, qsec_new <dbl>, vs_new <dbl>,
#> # am_new <dbl>, gear_new <dbl>, carb_new <dbl>, carb_corrected <dbl>,
#> # disp_corrected <dbl>
Created on 2020-05-21 by the reprex package (v0.3.0)
In the github issue mention above @Romain Francois provides another work-around to this issue.
You can use Map
in base R or map2
from purrr
for this:
cols <- c('carb', 'disp')
df[paste0(cols, '_corrected')] <- Map(`-`, df[cols], df[paste0(cols, '_new')])
Using map2
library(purrr)
df[paste0(cols, '_corrected')] <- map2(df[cols], df[paste0(cols, '_new')], `-`)
Here is a base R solution. We can define a function, diff_col
, that creates the operation and then use a for loop to repeatedly apply this function to modify the data frame.
diff_col <- function(col, dat){
dat[[paste0(col, "_corrected")]] <- dat[[col]] - dat[[paste0(col, "_new")]]
return(dat)
}
for (name in c("carb", "disp")){
df <- diff_col(col = name, dat = df)
}
I came to this answer and thought to capture a condensed solution based on the tricks mentioned above.
the_clauses <- c("carb", "disp") %>%
set_names(., str_c(., "_new")) %>%
map(.f = ~quo(!!sym(.x)/4))
# Equivalent to:
# quos(carb_new = carb/4, disp_new = disp/4)
mtcars %>% mutate(!!!the_clauses)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.