Setting multiple values to NA with dplyr

Question

I have a data frame from a survey in which missing values are coded in several ways that vary between columns. Some questions use only "97", while others use "98", "99", "99999", etc. What I want is a fast and simple way to check whether each column contains one of these missing-value codes and to set all of them to NA. I found a solution on this website that works with simple columns, but there must be a more efficient way?

Here is an example of my data set containing two different missing-value codes (98 and 99):

  safety_ensured social_trust approval_gov empl_opp gap_rich_poor
           <dbl>        <dbl>        <dbl>    <dbl>         <dbl>
1              3           98           99       NA             2
2             99           98           99        3            98
3              2           98           99       98            98
4              3           98           99        3             3
5              3           98           99        1            98

I found a solution here that uses dplyr and a helper function, but when I apply it, my data frame is turned into a list.

is_na <- function(x){
  return(as.character(x) %in% c("96", "97", "98", "99", "99999")) 
}
dataset <- dataset %>%
  lapply(is_na)

Greetings

Answer 1

We can create a vector of the codes to treat as missing, then use mutate() with across() (available from dplyr 1.0.0) to replace, in every column (everything() selects all columns), each value that matches an element of 'vec' (tested with %in%) with NA.

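library(dplyr)
# Codes that should be treated as missing
vec <- c(96:99, 99999)
# Replace matching values with NA in every column; the result is still a data frame
dataset <- dataset %>%
   mutate(across(everything(), ~ replace(., . %in% vec, NA)))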
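
For a quick check, here is a minimal, self-contained sketch that rebuilds the example data from the question and applies the same idea. The names 'cleaned' and 'base_cleaned' are only illustrative, and the base R variant is an assumption about what the lapply() attempt in the question was aiming for: assigning into dataset[] keeps the data frame structure, whereas a bare lapply() returns a list.

```r
library(dplyr)

# Example data rebuilt from the question (98 and 99 are the missing-value codes)
dataset <- data.frame(
  safety_ensured = c(3, 99, 2, 3, 3),
  social_trust   = c(98, 98, 98, 98, 98),
  approval_gov   = c(99, 99, 99, 99, 99),
  empl_opp       = c(NA, 3, 98, 3, 1),
  gap_rich_poor  = c(2, 98, 98, 3, 98)
)

# Codes to treat as missing
vec <- c(96:99, 99999)

# dplyr: replace every matching value with NA, column by column
cleaned <- dataset %>%
  mutate(across(everything(), ~ replace(.x, .x %in% vec, NA)))

# Base R sketch: the empty brackets keep the data frame instead of
# overwriting it with the list that lapply() returns
base_cleaned <- dataset
base_cleaned[] <- lapply(dataset, function(x) replace(x, x %in% vec, NA))

is.data.frame(cleaned)       # the result is still a data frame
is.data.frame(base_cleaned)  # same for the base R variant
```

Existing NA values are left alone, because NA %in% vec is FALSE.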

solution1
3 ACCEPTED 2020-06-13 18:30:05
