How to recode dot to NA in R?

I have a data set where missing values have been coded with a dot. I would like the missing values to be NA instead.

Here is the data frame:

df <- data.frame(ITEM1 = c(6, 8, '.'),
                   ITEM2 = c(1, 6, 9),
                   ITEM3 = c(4, 2, 5),
                   ITEM4 = c('.', 3, 2),
                   ITEM5 = c(1, 6, 9)
)

df

ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4     .     1
2     8     6     2     3     6
3     .     9     5     2     9

The columns that contain "." will be of class character because of that value. Create a logical matrix with == and assign those elements to NA, then convert the data.frame columns to their appropriate type with type.convert:

df[df == "." & !is.na(df)] <- NA
df <- type.convert(df, as.is = TRUE)
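
To check that the recoding did what was intended, something like the following could be run. It is only a sketch that rebuilds the example data from the question and inspects the column classes and NA counts before and after; the exact classes reported may depend on your R version.

```r
# Rebuild the example data; columns containing "." are not numeric at first.
df <- data.frame(ITEM1 = c(6, 8, '.'),
                 ITEM2 = c(1, 6, 9),
                 ITEM3 = c(4, 2, 5),
                 ITEM4 = c('.', 3, 2),
                 ITEM5 = c(1, 6, 9))

sapply(df, class)    # class of each column before recoding

# Replace every "." with NA, then let type.convert pick suitable column types.
df[df == "." & !is.na(df)] <- NA
df <- type.convert(df, as.is = TRUE)

sapply(df, class)    # class of each column after recoding
colSums(is.na(df))   # number of NA values per column
```

If the recoding worked, ITEM1 and ITEM4 should each show one NA and every column should be numeric.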

Or in a single step with replace (which does the assignment internally):

df <- type.convert(replace(df, df == "." & !is.na(df), NA), as.is = TRUE)

Another approach is to work column by column:

df[] <- lapply(df, function(x) replace(x, x %in% '.', NA))
df <- type.convert(df, as.is = TRUE)

Generally, this can be avoided altogether when reading the data, i.e. by specifying na.strings = "." in read.csv/read.table etc.
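
As a rough illustration of the na.strings route, here is a minimal sketch. The csv_text string and the df2 name are made up for the example and simply stand in for a real file on disk; with an actual file you would pass its path instead of text =.

```r
# Hypothetical CSV content standing in for a file; "." marks missing values.
csv_text <- "ITEM1,ITEM2,ITEM3,ITEM4,ITEM5
6,1,4,.,1
8,6,2,3,6
.,9,5,2,9"

# na.strings = "." makes read.csv treat a lone dot as NA while parsing,
# so the affected columns are read as numbers rather than as character.
df2 <- read.csv(text = csv_text, na.strings = ".")

str(df2)             # inspect the resulting column types
colSums(is.na(df2))  # number of NA values per column
```

Note that setting na.strings = "." replaces the default of "NA". If the data can also contain the literal string NA, passing na.strings = c("NA", ".") should cover both.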

You could use the na_if function from dplyr. Note that the dot changes the type of the affected columns to character, which might not be what you want afterwards! The following code finds all character columns, replaces . with NA and converts each of those columns to numeric:

library(dplyr)
df <- df %>%
    mutate(across(where(is.character), ~as.numeric(na_if(., "."))))

Here is an alternative with set_na from the sjlabelled package. Note that the columns will remain character type.

library(sjlabelled)
set_na(df, na = ".", as.tag = FALSE)

Output:

ITEM1 ITEM2 ITEM3 ITEM4 ITEM5
1     6     1     4  <NA>     1
2     8     6     2     3     6
3  <NA>     9     5     2     9
