Setting multiple values to NA with dplyr

I have a data frame from a survey that uses several missing-value codes, and the codes vary between columns. Some questions use only "97", while others use "98", "99", "99999", etc. What I want is a fast and simple way to check whether each column contains any of these codes and set all of them to NA. I found a solution on this website that works for single columns, but is there a more efficient way?

Here is an example of my data set, which contains two different missing-value codes (98 and 99):

  safety_ensured social_trust approval_gov empl_opp gap_rich_poor
           <dbl>        <dbl>        <dbl>    <dbl>         <dbl>
1              3           98           99       NA             2
2             99           98           99        3            98
3              2           98           99       98            98
4              3           98           99        3             3
5              3           98           99        1            98

I found a solution here that uses dplyr and a helper function, but when I apply it, it turns my data frame into a list.

is_na <- function(x){
  return(as.character(x) %in% c("96", "97", "98", "99", "99999")) 
}
dataset <- dataset %>%
  lapply(is_na)

Greetings

We can create a vector of the missing-value codes, then use mutate() with across() (available from dplyr 1.0.0) to replace the values in each column with NA wherever they match 'vec' (tested with %in%). everything() selects all columns.

library(dplyr)
vec <- c(96:99, 99999)
dataset %>%
   mutate(across(everything(), ~ replace(., . %in% vec, NA)))
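As a rough check, the approach can be run against the sample rows from the question. The sketch below rebuilds those five rows by hand as a plain data.frame (an assumption, since the real survey data is not shown) and stores the result under the hypothetical name `cleaned`. It also includes a base R variant that is not part of the answer above: lapply() always returns a list, which is why the data frame was lost in the question, and the helper there returns TRUE/FALSE flags rather than the replaced values. Assigning into `cleaned_base[]` is one common way to keep the data frame structure.

```r
library(dplyr)

# Sample rows rebuilt from the printout in the question (assumed values)
dataset <- data.frame(
  safety_ensured = c(3, 99, 2, 3, 3),
  social_trust   = c(98, 98, 98, 98, 98),
  approval_gov   = c(99, 99, 99, 99, 99),
  empl_opp       = c(NA, 3, 98, 3, 1),
  gap_rich_poor  = c(2, 98, 98, 3, 98)
)

# Codes that should be treated as missing
vec <- c(96:99, 99999)

# across() returns a data frame; assign the result to keep the change
cleaned <- dataset %>%
  mutate(across(everything(), ~ replace(.x, .x %in% vec, NA)))

# Base R variant (an addition, not from the answer above):
# assigning into cleaned_base[] keeps the data frame instead of a bare list
cleaned_base <- dataset
cleaned_base[] <- lapply(cleaned_base, function(x) replace(x, x %in% vec, NA))
```

Note that a single 'vec' is applied to every column here, so this only fits if a code such as 97 is never a legitimate answer in any of the selected columns. If it is, everything() can be swapped for a narrower column selection.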
