I have a dataset with many variables, some of which are character variables that I would like to convert to factors. Since there are many variables to convert, I would like to do this using the new tidy eval functionality from dplyr_0.7. Here is a minimal example from my data:
data <- data.frame(factor1 = c("K", "V"),
factor2 = c("E", "K"),
other_var = 1:2,
stringsAsFactors = FALSE)
I have a named list containing a data.frame for each variable which I want to convert. These data.frames in the list all have the same structure, which can be seen in this example:
codelist_list <- list(factor1 = data.frame(Code = c("K", "V"),
Bezeichnung = c("Kauf", "Verkauf"),
stringsAsFactors = FALSE),
factor2 = data.frame(Code = c("E", "K"),
Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
stringsAsFactors = FALSE))
What I do not want to do is to define the factors like this for each variable:
mutate(df, factor1 = factor(factor1,
levels = codelist_list[["factor1"]][["Code"]],
labels = codelist_list[["factor1"]][["Bezeichnung"]]))
What I have tried so far is the following:
convert_factors <- function(variable, df) {
factor_variable <- enquo(variable)
df %>%
mutate(!!quo_name(factor_variable) := factor(!!quo_name(factor_variable),
levels = codelist_list[[variable]][["Code"]],
labels = codelist_list[[variable]][["Bezeichnung"]]))
}
As a first step, I want to check whether my function convert_factors() works properly by calling convert_factors("factor1", data), which returns
factor1 factor2 other_var
1 <NA> E 1
2 <NA> K 2
The variable does not show the value labels, but is replaced by NA instead.
The ultimate goal would be to map over all variables which I want to convert. Here, I tried map(c("factor1", "factor2"), convert_factors, df = data), which returned
Error in (function (x, strict = TRUE) : the argument has already been evaluated
I tried to follow the instructions from http://dplyr.tidyverse.org/articles/programming.html , but this is all I came up with.
Does anyone know where the problem is, and can you explain my error to me?
I think you mixed up quosures and strings:
In your function, variable is a string, not an expression. So you should convert it to a symbol with rlang::sym instead of capturing it with enquo.
quo_name is used to convert an expression to a string. As variable is already a string, you can use !!variable directly on the lhs (left-hand side) of := in mutate.
On the rhs (right-hand side), you need to unquote factor_variable with !! instead of trying to convert it to a string with quo_name.
After correcting the above errors, your function will work:
convert_factors <- function(variable, df) {
factor_variable <- rlang::sym(variable)
df %>%
mutate(!!variable := factor(
!!factor_variable,
levels = codelist_list[[variable]][["Code"]],
labels = codelist_list[[variable]][["Bezeichnung"]]))
}
# > convert_factors('factor1', data)
# factor1 factor2 other_var
# 1 Kauf E 1
# 2 Verkauf K 2
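To see why the version in the question returned NA: unquoting a string places the string itself, not the column, inside the factor() call. The base-R sketch below reproduces that effect outside of mutate; the names codes, labels, from_string and from_column are only illustrative and do not appear in the original code.

```r
# Levels and labels as in codelist_list[["factor1"]] from the question.
codes  <- c("K", "V")
labels <- c("Kauf", "Verkauf")

# Roughly what the original function computed: the literal string "factor1"
# is not one of the levels, so the result is NA.
from_string <- factor("factor1", levels = codes, labels = labels)

# What was intended: the values of the column itself.
from_column <- factor(c("K", "V"), levels = codes, labels = labels)

print(is.na(from_string))
print(as.character(from_column))
```

Inside mutate, that single NA is recycled to the length of the data frame, which matches the output shown in the question.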
Here is another version I tried, which builds the factor() call with do.call:
params <- lapply(codelist_list, setNames, nm = c('levels', 'labels'))
convert_factors <- function(variable, df) {
factor_variable <- rlang::sym(variable)
factor_param <- c(list(factor_variable), params[[variable]])
df %>% mutate(!!variable := do.call(factor, factor_param))
}
convert_factors('factor1', data)
# factor1 factor2 other_var
# 1 Kauf E 1
# 2 Verkauf K 2
Nice solution by mt1022 using tidy eval and dplyr. However, this task could be accomplished using only base R:
data[,names(codelist_list)] <- lapply(names(codelist_list), function(x)
data[,x] <- factor(data[,x],
levels = codelist_list[[x]][["Code"]],
labels = codelist_list[[x]][["Bezeichnung"]]))
You could approach this with mutate_at, using the . placeholder within funs to apply a function to multiple columns at once.
This approach still involves using tidyeval to pull the correct list from codelist_list while referring to the variable via the . placeholder.
mutate_at(data, c("factor1", "factor2"),
funs( factor(., levels = codelist_list[[quo_name(quo(.))]][["Code"]],
labels = codelist_list[[quo_name(quo(.))]][["Bezeichnung"]]) ) )
factor1 factor2 other_var
1 Kauf Eigengeschaeft 1
2 Verkauf Kundengeschaeft 2
If you wanted to make a function to pass to mutate_at, you can do so, with a few slight changes.
convert_factors = function(variable) {
var2 = enquo(variable)
factor(variable, levels = codelist_list[[quo_name(var2)]][["Code"]],
labels = codelist_list[[quo_name(var2)]][["Bezeichnung"]])
}
mutate_at(data, c("factor1", "factor2"), convert_factors)
factor1 factor2 other_var
1 Kauf Eigengeschaeft 1
2 Verkauf Kundengeschaeft 2
Since you're just using strings and SE functions (the factor constructor), you don't need expressions or quosures. Just use name-unquoting with :=
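convert_factors <- function(variable, df) {
  # variable is a string, so read the column values with df[[variable]];
  # passing the string itself to factor() would yield NA
  new_factor <- factor(df[[variable]],
    levels = codelist_list[[variable]][["Code"]],
    labels = codelist_list[[variable]][["Bezeichnung"]]
  )
  # name-unquoting: the string in variable becomes the column name
  mutate(df, !! variable := new_factor)
}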
map(c("factor1", "factor2"), convert_factors, df = data)
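Note that map returns a list with one data frame per variable, each with only that one column converted. If a single data frame with all columns converted is wanted, one option is to fold over the variable names instead. The following is a minimal sketch, assuming a convert_factors that reads the column values via df[[variable]]; the use of base R's Reduce and the names new_factor and converted are my own choices, not part of the answer above.

```r
library(dplyr)

data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE))

# Assumed variant of convert_factors: looks up the column by its name.
convert_factors <- function(variable, df) {
  new_factor <- factor(df[[variable]],
                       levels = codelist_list[[variable]][["Code"]],
                       labels = codelist_list[[variable]][["Bezeichnung"]])
  mutate(df, !!variable := new_factor)
}

# Fold over the names: each step receives the data frame produced by the
# previous step and converts one more column.
converted <- Reduce(function(df, variable) convert_factors(variable, df),
                    names(codelist_list),
                    data)

str(converted)
```

purrr::reduce with .init = data would express the same fold in tidyverse style.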