
Warning about using `where()` in `dplyr::across()` and errors when using it

I'm a beginner R user, and I'm trying to clean up data from Excel spreadsheets. I've read about dplyr::across(), so I'm trying to use it in mutate() pipelines.

I need to convert some columns that are being incorrectly imported as character. They are meant to be integers, but I reckon there may be the occasional typo, such as an extra space, that is confusing readxl::read_xlsx().
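A minimal sketch for tracking down the offending cells, assuming the column has already been imported as character. The vector `ids` is made-up example data, not from the original spreadsheet. Values that are present but fail integer conversion are the likely typos. Note that as.integer() already tolerates leading and trailing spaces, so stray spaces on their own would not block the conversion.

```r
# Hypothetical example: integers read in as text, including one typo
ids <- c("1", " 2", "3 ", "4a", NA)

# as.integer() skips leading/trailing whitespace; anything else that does not
# parse becomes NA (the coercion warning is silenced here)
converted <- suppressWarnings(as.integer(ids))

# Entries that were present but could not be converted are likely typos
bad_values <- ids[!is.na(ids) & is.na(converted)]
bad_values
#> [1] "4a"
```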

I'm trying to run the following code, which does work but generates a warning from dplyr:

library(dplyr, warn.conflicts = FALSE)


# Copy built-in DF
my_iris <- iris

# Make random character vectors
rand_string1 <- sample(LETTERS, size = nrow(iris), replace = TRUE)
rand_string2 <- as.character(
    sample(100, size = nrow(iris), replace = TRUE)
)

# Fill new character columns in the DF. The second one is supposed to be
# converted to numeric
my_iris$A_rand_char <- rand_string1
my_iris$B_rand_char <- rand_string2

# Mutate: select all char columns **except** the ones whose name matches the
# regex, and make them numeric. In the example, only mutated_iris$B_rand_char
# should be affected
mutated_iris <- my_iris %>%
    mutate(
        # Get all char variables except 'A_rand_char'
        across(
            is.character & !matches('A_rand'),
            as.numeric
        )
    )

# Old data
class(my_iris$A_rand_char)
#> [1] "character"

class(my_iris$B_rand_char)
#> [1] "character"


# New data
# Old character column(s) still character:
class(mutated_iris$A_rand_char)
#> [1] "character"

# Column(s) converted to numeric:
class(mutated_iris$B_rand_char)
#> [1] "numeric"

This does the job of converting character columns except the ones I explicitly exclude via !matches(reg_exp_string), but dplyr warns that I should wrap my selection "predicate functions" in where().

My problem is that when I do that, I get an error. For brevity: if I just wrap the is.character &… line above in a where() call, I get:

Error: Problem with `mutate()` input `..1`.
x operations are possible only for numeric, logical or complex types
ℹ Input `..1` is `across(where(is.character & !matches("A_rand")), as.numeric)`.
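A way to see where this first error comes from, assuming the argument to where() is evaluated as ordinary R code before where() receives it: base R's `&` refuses a function as an operand and raises this message. The `TRUE` below is only a stand-in for whatever !matches("A_rand") returns.

```r
# `&` on a function fails before where() ever gets to see its argument
msg <- tryCatch(
  is.character & TRUE,  # TRUE stands in for !matches("A_rand")
  error = function(e) conditionMessage(e)
)
msg
#> [1] "operations are possible only for numeric, logical or complex types"
```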

I guess this is wrong because I'm passing a function intersected with the return value of !matches("A_rand"). However, when I use purrr-style syntax, as in the last example in where()'s documentation:

     where(~ is.character(.x) && !matches('A_rand'))

I get:

Error: Problem with `mutate()` input `..1`.
x `where()` must be used with functions that return `TRUE` or `FALSE`.
ℹ Input `..1` is `across(where(~is.character(.x) && !matches(.x, "A_rand")), as.numeric)`.

So now the problem seems to be that these two functions return something other than a logical value. I'm stuck because I really thought that was what they were supposed to return, especially matches(), which the documentation classifies as a selection helper.
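For context, a where() predicate is called once per column with that column's *values* and must return a single TRUE or FALSE. It can therefore test a column's type or contents, but it never sees the column's name, which is what matches() tests. The sketch below uses a small made-up tibble (`df` is hypothetical). It shows the working purrr-style form, plus a purely value-based selection offered as an alternative to the regex exclusion. That alternative is an assumption of this sketch, not the approach from the question.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: two character columns, one of which holds numbers
df <- tibble(A_rand_char = c("x", "y"), B_rand_char = c("1", "2"), n = 1:2)

# purrr-style lambda: .x is each column's contents, one column at a time
df %>% select(where(~ is.character(.x))) %>% names()
#> [1] "A_rand_char" "B_rand_char"

# Value-based alternative (assumption): keep character columns whose every
# entry parses as a number
numeric_like <- df %>%
  select(where(~ is.character(.x) && !anyNA(suppressWarnings(as.numeric(.x))))) %>%
  names()
numeric_like
#> [1] "B_rand_char"
```

One caveat: with the value-based test, a column containing genuine NAs would also be excluded. The sketch does not handle that case.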

Again, the first version of the code does work, but it generates deprecation-style warnings.

What is a more tidyverse-correct way to select all character columns except those whose names match a regexp?

Thanks to anyone who can contribute…

You were very close. Here is the correct syntax:

mutated_iris <- my_iris %>%
  mutate(
    # Get all char variables except 'A_rand_char'
    across(
      where(is.character) & !matches('A_rand'),
      as.numeric
    )
  )

You only need to wrap is.character in where().

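This is because is.character() is a predicate function, whereas where() and matches() are selection helpers. Selection helpers can be combined with &, | and ! inside across(). A bare predicate function like is.character is not a selection helper (using it directly is deprecated, hence the warning), so it has to be wrapped in where() first. Wrapping the whole expression in where() does not work either. The function passed to where() is called on each column's values and must return a single TRUE or FALSE, while matches() tests column names.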
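A small self-contained check of this on a made-up tibble (`df` is hypothetical). Previewing the selection with select() first shows which columns across() will touch. The class check afterwards confirms that only the intended column was converted.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(A_rand_char = c("x", "y"), B_rand_char = c("1", "2"), n = 1:2)

# where() wraps the predicate; matches() is already a selection helper, so the
# two combine with & and ! inside a selecting function
df %>% select(where(is.character) & !matches("A_rand")) %>% names()
#> [1] "B_rand_char"

# The same selection inside across() converts only B_rand_char
out <- df %>%
  mutate(across(where(is.character) & !matches("A_rand"), as.numeric))

# Class of every column after the mutate
vapply(out, class, character(1))
```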

This is the code you need:

mutated_iris <- my_iris %>%
  mutate(
    # Get all char variables except 'A_rand_char'
    across(
      where(is.character) & !matches('A_rand'),
      as.numeric
    )
  )

Selection helpers only work inside *selecting* functions, such as select() or across() within a dplyr verb, as the errors below demonstrate.

library(dplyr)
base::is.character("hi")
#> [1] TRUE

try(tidyr::matches("hi"))
#> Error : `matches()` must be used within a *selecting* function.
#> i See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.

try(where(is.character("hi")))
#> Error in where(is.character("hi")) : could not find function "where"

tibble(a = character()) %>%
  mutate(across(where(is.character), rev))
#> # A tibble: 0 x 1
#> # ... with 1 variable: a <chr>

Created on 2021-01-24 by the reprex package (v0.3.0)
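Outside a selecting function, the same column choice can be made in base R by building the logical mask yourself. Below is a minimal sketch assuming a plain data.frame `df` (hypothetical data). It uses no tidyselect helpers at all.

```r
# Hypothetical data; stringsAsFactors = FALSE keeps text columns as character
df <- data.frame(
  A_rand_char = c("x", "y"),
  B_rand_char = c("1", "2"),
  n = 1:2,
  stringsAsFactors = FALSE
)

# vapply() applies the predicate to each column's values; grepl() tests names
keep <- vapply(df, is.character, logical(1)) & !grepl("A_rand", names(df))
names(df)[keep]
#> [1] "B_rand_char"

# Convert only the selected columns
df[keep] <- lapply(df[keep], as.numeric)
```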
