I'm a beginner R user, and I'm trying to clean up data from Excel spreadsheets. I've read about dplyr::across() and so I'm trying to use it in mutate pipelines.
I need to convert some columns which are being incorrectly imported as character. These are meant to be integers, but I reckon there may be the occasional typo, such as an extra space, which is confusing readxl::read_xlsx().
I'm trying to run the following code, which does work but generates a warning from dplyr:
library(dplyr, warn.conflicts = FALSE)
# Copy built-in DF
my_iris <- iris
# Make random character vectors
rand_string1 <- sample(LETTERS, size = nrow(iris), replace = TRUE)
rand_string2 <- as.character(
  sample(100, size = nrow(iris), replace = TRUE)
)
# Fill new character columns in the DF. The second one is supposed to be cast
# to int
my_iris$A_rand_char <- rand_string1
my_iris$B_rand_char <- rand_string2
# Mutate: select all char columns **except** the ones whose name matches the
# regex, and make them numeric. In the example, only my_iris$B_rand_char should
# be affected
mutated_iris <- my_iris %>%
  mutate(
    # Get all char variables except 'A_rand_char' (see below) and ID code
    across(
      is.character & !matches('A_rand'),
      as.numeric
    ),
  )
# Old data
class(my_iris$A_rand_char)
#> [1] "character"
class(my_iris$B_rand_char)
#> [1] "character"
# New data
# Old character column(s) still character:
class(mutated_iris$A_rand_char)
#> [1] "character"
# Column(s) converted to numeric:
class(mutated_iris$B_rand_char)
#> [1] "numeric"
This does the job of converting character columns except the ones I explicitly exclude via !matches(reg_exp_string), but dplyr warns that I should wrap my selection "predicate functions" in where().
My problem is, when I do that I get an error. For brevity, if I just wrap the is.character &… line above in a where() call, I get:
Error: Problem with `mutate()` input `..1`.
x operations are possible only for numeric, logical or complex types
ℹ Input `..1` is `across(where(is.character & !matches("A_rand")), as.numeric)`.
I guess this is wrong because I'm passing a function intersected with the return value of !matches("A_rand"). But again, when I use a purrr-style syntax, as per the last example in where()'s documentation:
where(~ is.character(.x) && !matches('A_rand'))
I get:
Error: Problem with `mutate()` input `..1`.
x `where()` must be used with functions that return `TRUE` or `FALSE`.
ℹ Input `..1` is `across(where(~is.character(.x) && !matches(.x, "A_rand")), as.numeric)`.
So now the problem seems to be that these two functions return something other than a boolean vector, and I'm stuck because I really thought that was what they were supposed to do, especially matches(), which is classified as a selection helper in the documentation.
Again, the first version of the code does work, but generates sort-of-deprecation warnings.
What is a more tidyverse-correct way to select all character columns except those whose name matches a regexp?
Thanks to anyone who can contribute…
You were very close. Here is the correct syntax:
mutated_iris <- my_iris %>%
  mutate(
    # Get all char variables except 'A_rand_char' (see below) and ID code
    across(
      where(is.character) & !matches('A_rand'),
      as.numeric
    )
  )
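As a side note on the first error in the question: it appears to come from base R rather than from dplyr. Inside where(), the expression is.character & !matches("A_rand") is ordinary R code, so & ends up being applied to a function object. A minimal sketch of the same failure outside any pipeline (nothing here is dplyr-specific, and `res` is just an illustrative name):

```r
# `is.character` without parentheses is a function object, not a logical
# value, so the logical operator `&` cannot be applied to it.
res <- try(is.character & TRUE, silent = TRUE)

# TRUE: the expression fails before any column is ever inspected
inherits(res, "try-error")

# The message refers to "numeric, logical or complex types",
# the same wording as in the error quoted in the question
conditionMessage(attr(res, "condition"))
```

Keeping the predicate alone inside where() and combining selections outside it avoids this, because & is then interpreted by the selection machinery instead of base R.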
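Wrap is.character in where()
This is because is.character is a predicate function, whereas where() is a selection helper. A predicate function inspects a column's contents and returns TRUE or FALSE, so it has to be wrapped in where() before it can be combined with other selections; matches() is already a selection helper and stays outside the where() call.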
This is the code you need:
mutated_iris <- my_iris %>%
  mutate(
    # Get all char variables except 'A_rand_char' (see below) and ID code
    across(
      where(is.character) & !matches('A_rand'),
      as.numeric
    ),
  )
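To check which columns a selection like this picks before converting anything, the same expression can be passed to select(). This is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame `df` and its column names are made up for illustration and are not from the question.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: two character columns and one numeric column
df <- data.frame(
  id_code   = c("a1", "b2"),
  count_chr = c("10", "20"),
  value     = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# where() selects on column contents, matches() selects on column names;
# the two selections are combined with & and negated with !
selected <- df %>% select(where(is.character) & !matches("^id_"))
names(selected)
#> [1] "count_chr"

# The same selection inside across() converts only that column
converted <- df %>%
  mutate(across(where(is.character) & !matches("^id_"), as.integer))
class(converted$count_chr)
#> [1] "integer"
class(converted$id_code)
#> [1] "character"
```

Previewing with select() first is a cheap way to confirm the regex excludes what it should before the conversion runs on real data.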
Selection helpers are strictly for use within dplyr verbs, as the errors below demonstrate.
require(dplyr)
base::is.character("hi")
#> [1] TRUE
try(tidyr::matches("hi"))
#> Error : `matches()` must be used within a *selecting* function.
#> i See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
try(where(is.character("hi")))
#> Error in where(is.character("hi")) : could not find function "where"
tibble(a = character()) %>%
  mutate(across(where(is.character), rev))
#> # A tibble: 0 x 1
#> # ... with 1 variable: a <chr>
Created on 2021-01-24 by the reprex package (v0.3.0)