
How to replace empty strings in a data frame with NA (the missing value), not the string "NA"

I have a titanic xlsx file with a lot of blank or empty cells. I saved the file as csv, and all the blanks were saved as they were.

When I import the csv file I see a lot of empty strings/blanks in the dataset, one such column is boat

I could just use the readxl package's functions, such as read_xls or read_xlsx, which would replace the empty strings with NA.

But I would like to know whether there is a way to replace the empty strings after the data has been loaded into a data frame in R.

I tried the code below, but it throws an error that I don't fully understand. If I write 'NA' instead of NA in the code below, the replacement works, but the result is the string "NA" rather than the missing value NA, and the two are different.

titanic %>% mutate(boat = if_else(boat=="", NA ,boat))

Error in mutate_impl(.data, dots) : 
Evaluation error: `false` must be type logical, not character.

A bare NA is of type logical. According to ?NA, "NA is a logical constant of length 1 which contains a missing value."

The class of each constant can be checked:

class(NA)
#[1] "logical"
class(NA_character_) 
#[1] "character"

and both of them are identified as missing by standard functions such as is.na:

is.na(NA)
#[1] TRUE
is.na(NA_character_)
#[1] TRUE
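
To make the distinction raised in the question concrete, here is a small sketch with a made-up vector (x is hypothetical and not taken from the titanic data). The string "NA" is ordinary text, so is.na does not flag it, while a real missing value is flagged.

```r
# Hypothetical vector: a value, an empty string, the text "NA",
# and a real missing value (coerced to NA_character_ by c())
x <- c("A", "", "NA", NA)

class(x)
#[1] "character"

# Only the last element is a true missing value
is.na(x)
#[1] FALSE FALSE FALSE  TRUE
```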

if_else is type sensitive: its `true` and `false` arguments must have the same type. A bare NA is logical, so it clashes with the character column. Instead, use the typed constant that matches the type of the 'boat' column: NA_real_, NA_integer_ or NA_character_. Assuming that 'boat' is of class character, we need NA_character_:

titanic %>% 
       mutate(boat = if_else(boat=="", NA_character_ ,boat))
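
As a self-contained sketch of the fix, the example below builds a tiny stand-in for the titanic data (the names and boat values are made up for illustration) and applies the same if_else call. It also shows two alternatives that should give the same result under these assumptions: dplyr's na_if and plain base R subsetting.

```r
library(dplyr)

# Toy stand-in for the real titanic data (hypothetical values)
titanic <- data.frame(
  name = c("Allen", "Allison", "Anderson"),
  boat = c("2", "", "3"),
  stringsAsFactors = FALSE  # keep 'boat' as character, not factor
)

# Typed missing value keeps both branches of if_else as character
fixed <- titanic %>%
  mutate(boat = if_else(boat == "", NA_character_, boat))

# na_if() replaces values equal to "" with NA of the column's type
fixed_na_if <- titanic %>%
  mutate(boat = na_if(boat, ""))

# Base R: assign NA to the positions holding an empty string
fixed_base <- titanic
fixed_base$boat[fixed_base$boat == ""] <- NA

is.na(fixed$boat)
#[1] FALSE  TRUE FALSE
```

In the base R version a bare NA is fine, because assignment into a character vector coerces it to a character missing value.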

You can replace specified values with an NA using the naniar package - http://naniar.njtierney.com/


df <- data.frame(boat = c(1, 2, "", 3), category = c("a", "b", "c", "d"))


df
#>   boat category
#> 1    1        a
#> 2    2        b
#> 3             c
#> 4    3        d
library(naniar)

df %>% replace_with_na(replace = list(boat = ""))
#>   boat category
#> 1    1        a
#> 2    2        b
#> 3 <NA>        c
#> 4    3        d

# You can also do this for a specific column, using the development
# version - devtools::install_github('njtierney/naniar')
df %>% replace_with_na_at(.vars = "boat", ~.x == "")
#>   boat category
#> 1    2        a
#> 2    3        b
#> 3   NA        c
#> 4    4        d
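
If several columns contain empty strings, the same replacement can be applied to every character column at once. The sketch below is not part of naniar. It assumes dplyr 1.0 or later (for across) and uses a made-up data frame created with stringsAsFactors = FALSE so that the columns stay character rather than factor.

```r
library(dplyr)

# Hypothetical data with empty strings in two character columns
df_chr <- data.frame(
  boat = c("1", "2", "", "3"),
  category = c("a", "", "c", "d"),
  stringsAsFactors = FALSE
)

# Turn "" into NA in every character column
cleaned <- df_chr %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count the missing values per column to check the result
na_counts <- colSums(is.na(cleaned))
```

Here na_counts should report one missing value in each of the two columns.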

Let me know if you need any clarification!

