Replacing NA with values - R

Question

Is there a way I can replace missing values in a vector by randomly sampling from the rest of the data?

For example:

age<-c(4.2,5.6,NA,8.4,9.8,NA,10.4,15.3)

age[is.na(age)]<-sample(age,length(age[is.na(age)]),replace=TRUE)  ## trying to replace NA values with a random value from age.

I don't understand why this does not work. Ideally, I would like each NA value to be replaced by a different value.
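One likely reason the one-liner misbehaves: sample(age, ...) draws from the whole vector, so the NA entries are themselves candidates and a gap can be "filled" with another NA. With replace=TRUE, two gaps can also receive the same value. The sketch below re-runs the question's line many times to estimate how often an NA survives; the seed and the number of trials are arbitrary choices, not from the original.

```r
age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)

# Re-run the question's assignment many times and record
# whether any NA is still present afterwards
set.seed(1)
still_missing <- replicate(1000, {
  x <- age
  x[is.na(x)] <- sample(x, length(x[is.na(x)]), replace = TRUE)
  anyNA(x)
})

# Fraction of runs that still contain at least one NA.
# With 2 NAs in a pool of 8, theory gives 1 - (6/8)^2 = 0.4375
mean(still_missing)
```

The fix in the answers below is to restrict the pool to the observed values, age[!is.na(age)].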

Answer 1

age[is.na(age)] <- sample(age[!is.na(age)], sum(is.na(age)), replace = FALSE)

Using sum(is.na(age)) to count the NAs, as suggested by @Ananda Mahto.
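A minimal sketch that wraps Answer 1 into a reusable function. impute_random is a hypothetical name, not part of any package. It makes two assumptions beyond the answer: it falls back to sampling with replacement when there are more NAs than observed values (where replace = FALSE would raise an error), and it indexes through sample.int() because sample() treats a single numeric value x >= 1 as the range 1:x.

```r
# Hypothetical helper: fill NAs by drawing from the observed values of x
impute_random <- function(x) {
  missing <- is.na(x)
  pool <- x[!missing]
  n_missing <- sum(missing)
  if (n_missing == 0) return(x)
  if (length(pool) == 0) stop("no observed values to sample from")
  # Draw distinct positions from the pool when it is large enough;
  # otherwise fall back to sampling with replacement
  idx <- sample.int(length(pool), n_missing,
                    replace = n_missing > length(pool))
  # Indexing via sample.int avoids sample()'s special case for a
  # length-one numeric pool, e.g. sample(5.6, 1) draws from 1:5
  x[missing] <- pool[idx]
  x
}

age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)
set.seed(42)
filled <- impute_random(age)
filled
```

Because the draws are without replacement, each NA gets a different observed position; the filled values are distinct as long as the observed values themselves are distinct, as they are here.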

Answer 2

In cases where random sampling is undesirable, you might replace them with the mean, the median, or even a model fit:

library(e1071)
?impute
impute(as.matrix(age),what="mean") # replaces with mean 8.95

or

library(randomForest)
?na.roughfix
na.roughfix(age) # replaces with median 9.1
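If you would rather avoid the package dependencies, the mean and median fills above can be sketched in base R. impute_summary is a hypothetical helper, not the e1071 or randomForest API; like those calls, it fills every NA with the same summary of the observed values.

```r
# Hypothetical helper: replace NAs with the mean or median of the observed values
impute_summary <- function(x, what = c("mean", "median")) {
  what <- match.arg(what)
  fill <- switch(what,
                 mean   = mean(x, na.rm = TRUE),
                 median = median(x, na.rm = TRUE))
  x[is.na(x)] <- fill
  x
}

age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)
impute_summary(age, "mean")    # NAs become 8.95
impute_summary(age, "median")  # NAs become 9.1
```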

If age is a predictor and you have a response variable, you could use a random forest to impute the missing values:

library(randomForest)
?rfImpute


2 answers

Answer 1: 3 votes, accepted, 2014-08-10 15:24:51

Answer 2: 0 votes, 2014-08-10 16:12:08
