I have a data set where missing values have been coded with a dot. I would like to have missing values blank (NA).
Here is the data frame:
df <- data.frame(ITEM1 = c(6, 8, '.'),
ITEM2 = c(1, 6, 9),
ITEM3 = c(4, 2, 5),
ITEM4 = c('.', 3, 2),
ITEM5 = c(1, 6, 9)
)
df
ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1 6 1 4 . 1
2 8 6 2 3 6
3 . 9 5 2 9
The columns will be of class character because of the presence of ".". Create a logical matrix with == and assign NA to the matching elements, then convert the data.frame columns to their appropriate types with type.convert:
df[df == "." & !is.na(df)] <- NA
df <- type.convert(df, as.is = TRUE)
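As a quick check of the two steps above, the sketch below runs them on a smaller data frame and inspects the column classes before and after. The name demo is only illustrative and is not part of the original question.

```r
# Illustrative data frame (hypothetical name `demo`), with "." marking missing values
demo <- data.frame(ITEM1 = c(6, 8, "."),
                   ITEM2 = c(1, 6, 9),
                   ITEM4 = c(".", 3, 2))

# Columns containing "." are stored as character
classes_before <- sapply(demo, class)
print(classes_before)

# Replace every "." with NA, then let R pick a suitable type per column
demo[demo == "." & !is.na(demo)] <- NA
demo <- type.convert(demo, as.is = TRUE)

# The former character columns should now be numeric, with NA where "." was
classes_after <- sapply(demo, class)
print(classes_after)
print(colSums(is.na(demo)))
```

The !is.na(demo) guard matters when the data already contain NA values, because a comparison with NA yields NA rather than TRUE or FALSE.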
Or in a single step with replace (which does the assignment internally):
df <- type.convert(replace(df, df == "." & !is.na(df), NA), as.is = TRUE)
Another approach is to loop over the columns with lapply:
df[] <- lapply(df, function(x) replace(x, x %in% '.', NA))
df <- type.convert(df, as.is = TRUE)
Generally, this can be avoided altogether when reading the data, i.e. by specifying na.strings = "." in read.csv/read.table etc.
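A minimal sketch of the na.strings approach, assuming the data arrive as comma-separated text. The CSV content here is made up to mirror the example data frame and is passed inline through the text argument, so no file is needed; with a real file you would pass its path instead.

```r
# Hypothetical CSV content mirroring the example data, with "." for missing values
csv_text <- "ITEM1,ITEM2,ITEM3,ITEM4,ITEM5
6,1,4,.,1
8,6,2,3,6
.,9,5,2,9"

# na.strings = "." tells read.csv to treat a lone dot as NA while parsing,
# so the columns are never read as character in the first place
df_read <- read.csv(text = csv_text, na.strings = ".")

str(df_read)
print(colSums(is.na(df_read)))
```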
You could use the na_if function from dplyr. Note that the dot changes the type of your columns to character, which might not be what you want afterwards! The following code finds all character columns, replaces "." with NA, and converts each column to numeric:
library(dplyr)
df <- df %>%
  mutate(across(where(is.character), ~as.numeric(na_if(., "."))))
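To see what happens inside the lambda, here is a small sketch that applies the same two calls to a single character vector. It assumes dplyr is installed; the vector x is made up for illustration.

```r
library(dplyr)

# A character vector like one of the affected columns (illustrative values)
x <- c("6", "8", ".")

# na_if() swaps every element equal to "." for NA and leaves the rest unchanged
x_na <- na_if(x, ".")

# With the dot gone, the remaining strings convert cleanly to numbers
x_num <- as.numeric(x_na)
print(x_num)
```

Calling as.numeric directly on x would also produce NA for the dot, but with a coercion warning; going through na_if first makes the intent explicit and avoids the warning.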
Here is an alternative with set_na from the sjlabelled package. Note that the columns will remain of type character.
library(sjlabelled)
set_na(df, na = ".", as.tag = FALSE)
Output:
ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1 6 1 4 <NA> 1
2 8 6 2 3 6
3 <NA> 9 5 2 9