Replacing NA with a value - R
Is there a way that I can replace missing values in a vector by randomly sampling from the rest of the data?
e.g.
age<-c(4.2,5.6,NA,8.4,9.8,NA,10.4,15.3)
age[is.na(age)]<-sample(age,length(age[is.na(age)]),replace=TRUE) ## trying to replace NA values with a random value from age.
I don't understand why this does not work. Ideally I would like each NA value to be replaced by a different value.
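The original call samples from the whole of age, NAs included, so a replacement can itself be NA, and replace=TRUE allows the same value to be drawn more than once. Sample only from the non-missing values, without replacement:
age[is.na(age)] <- sample(age[!is.na(age)], sum(is.na(age)), replace=FALSE)
Using sum(is.na(age)) for the count was suggested by @Ananda Mahto.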
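The one-liner above can be wrapped in a small helper that also covers two edge cases. This is only a sketch: `impute_random` is a hypothetical name, and the fallback to sampling with replacement when there are more NAs than observed values is an assumption about what you would want in that case. Note that sampling without replacement picks distinct positions, so the filled values can still coincide if the observed data contains duplicates.

```r
# Hypothetical helper: fill NAs by sampling from the observed values of x.
impute_random <- function(x) {
  missing   <- is.na(x)
  observed  <- x[!missing]
  n_missing <- sum(missing)
  if (n_missing == 0L) return(x)
  if (length(observed) == 0L) stop("no observed values to sample from")
  # Sampling without replacement needs at least as many observed values
  # as NAs; otherwise fall back to sampling with replacement (assumption).
  with_replacement <- n_missing > length(observed)
  # Sample indices, not values: sample(x, ...) treats a single number
  # x >= 1 as 1:x, which would give wrong draws for a length-1 vector.
  idx <- sample.int(length(observed), n_missing, replace = with_replacement)
  x[missing] <- observed[idx]
  x
}

set.seed(42)  # only to make the example reproducible
age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)
age_filled <- impute_random(age)
anyNA(age_filled)  # FALSE
```

The non-missing entries of `age_filled` are left exactly as they were in `age`; only positions 3 and 6 change.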
In cases where random sampling is undesirable you might replace them with means, medians, or even a model fit:
library(e1071)
?impute
impute(as.matrix(age),what="mean") # replaces with mean 8.95
or
library(randomForest)
?na.roughfix
na.roughfix(age) # replaces with median 9.1
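If you would rather not load a package for this, the same mean and median replacements can be done in base R. A minimal sketch, using the `age` vector from the question (the names `age_mean` and `age_median` are just for illustration):

```r
age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)

# Mean imputation: na.rm = TRUE computes the mean of the observed values only.
age_mean <- age
age_mean[is.na(age_mean)] <- mean(age, na.rm = TRUE)    # fills with 8.95

# Median imputation, written with replace() instead of indexed assignment.
age_median <- replace(age, is.na(age), median(age, na.rm = TRUE))  # fills with 9.1
```

Both forms leave the original `age` untouched, which makes it easy to compare imputation strategies side by side.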
If age is a predictor and you have responses, you could use a random forest to impute:
library(randomForest)
?rfImpute
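As a rough illustration of how rfImpute might be called, here is a sketch on made-up data. The data frame `df` and its `height` and `response` columns are invented for this example and are not part of the question; the call uses the formula interface described in ?rfImpute, and the `requireNamespace` guard simply skips the step if randomForest is not installed.

```r
# Hypothetical data: `height` and `response` are invented for illustration.
set.seed(1)
n <- 60
df <- data.frame(age = round(runif(n, 4, 16), 1))
df$height   <- 80 + 6 * df$age + rnorm(n, sd = 3)
df$response <- 2 * df$age + rnorm(n)
df$age[sample(n, 8)] <- NA   # knock out some ages

if (requireNamespace("randomForest", quietly = TRUE)) {
  # The response must be complete; NAs are allowed only in the predictors.
  imputed <- randomForest::rfImpute(response ~ age + height, data = df)
  anyNA(imputed$age)
}
```

The result is a data frame with the response in the first column followed by the predictors with their missing entries filled in. See ?rfImpute for the `iter` and `ntree` arguments that control how much work the imputation does.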