I have a 70-row by 4-column data frame (data) in which 10% of the values are NA, with no more than one NA per row. From this data set, I would like to produce 10 data frames with 60% NAs, but without any entirely empty (all-NA) rows. To do this, I nested a while loop inside a for loop. The code works, but it takes a very long time to run. Since I need to run this loop for many data sets, I would like to know whether there is an easy way to speed it up.
My data frame looks like this:
library(missForest)
data<-iris[1:70,1:4]
for(i in 1:28){
data[i,]<-prodNA(data[i,],noNA =0.25)
}
And here is my loop:
missing.data<-list()
for(j in 1:10){
missing.data[[j]]<-prodNA(data, noNA = 0.6)
while(sum(rowSums(is.na(missing.data[[j]]))==4)!=0) {
missing.data[[j]]<-prodNA(data, noNA = 0.6)
}
}
EDIT: The loop becomes very slow for noNA > 0.55, but unfortunately I need to introduce 60% NAs. Also, the NAs in the loop are introduced completely at random, so they can "replace" NAs that are already present in the original data frame (data).
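A rough way to see why the retry loop slows down so sharply is to estimate how often a single prodNA() draw passes the "no all-NA row" check. The sketch below is only an approximation: it assumes prodNA() sets floor(n * p * noNA) cells to NA, sampled without replacement, and it treats rows as independent (they are not exactly). The function name accept_prob and its arguments are made up for this illustration.

```r
# Rough estimate of the chance that one prodNA() draw leaves no all-NA row.
# Assumptions: floor(N * noNA) cells are sampled without replacement, and
# rows are treated as independent (a Poisson-style approximation).
accept_prob <- function(n_rows = 70, p = 4, n_pre = 28, noNA = 0.6) {
  N <- n_rows * p
  k <- floor(N * noNA)  # number of cells set to NA in one draw

  # P(m given cells are all among the k sampled cells)
  p_all <- function(m) prod((k - 0:(m - 1)) / (N - 0:(m - 1)))

  # Rows without a pre-existing NA need all p cells hit;
  # rows that already hold one NA only need the other p - 1 cells hit.
  expected_empty <- (n_rows - n_pre) * p_all(p) + n_pre * p_all(p - 1)

  exp(-expected_empty)  # approximate P(no all-NA row at all)
}

accept_prob(noNA = 0.55)
accept_prob(noNA = 0.60)
```

Under these assumptions the estimate drops quickly as noNA grows, which suggests that at 60% only a very small fraction of draws pass the check, so the while loop has to repeat many times for each of the 10 data frames.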
I am not sure if this is what you are looking for:
library(missForest)
data1<-iris[1:70,1:4]
for(i in 1:28){
data1[i,]<-prodNA(data1[i,],noNA =0.25)  # one NA per 4-cell row (10% overall)
}
table(is.na(data1))
n<-10
data2<-do.call("rbind", replicate(n, data1, simplify=FALSE))
table(is.na(data2))
data3<-prodNA(data2,noNA=0.55)
table(is.na(data3))
# FALSE  TRUE
#  1133  1667
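If the goal is still 10 separate data frames with no all-NA rows, one possible way to avoid retrying altogether is to protect one observed cell in each row and sample the NA positions only from the remaining cells. The following is a minimal sketch, not part of missForest: prodNA_keep_rows is a hypothetical helper, it assumes the same cell count as prodNA() (floor(n * p * noNA)), and it lets new NAs overlap existing ones as in the question. Note that the resulting patterns are not distributed exactly like those from the rejection loop.

```r
# Hypothetical helper (not part of missForest): set a fixed share of cells
# to NA while keeping at least one observed value in every row.
prodNA_keep_rows <- function(x, noNA = 0.6) {
  n <- nrow(x)
  p <- ncol(x)
  n_new <- floor(n * p * noNA)  # assumed to match the count prodNA() uses
  obs <- !is.na(x)              # logical matrix of observed cells

  # Every row needs an observed cell, and enough unprotected cells must exist.
  stopifnot(all(rowSums(obs) > 0), n_new <= n * (p - 1))

  # Pick one observed column per row to protect from deletion.
  keep_col <- apply(obs, 1, function(r) {
    idx <- which(r)
    idx[sample.int(length(idx), 1)]
  })

  # Candidate cells are all cells except the protected ones.
  candidate <- matrix(TRUE, n, p)
  candidate[cbind(seq_len(n), keep_col)] <- FALSE
  cand_idx <- which(candidate)

  # Sample the NA positions among the candidates only.
  na_loc <- matrix(FALSE, n, p)
  na_loc[cand_idx[sample.int(length(cand_idx), n_new)]] <- TRUE
  x[na_loc] <- NA
  x
}

# Usage: rebuild a data frame like the one in the question without missForest.
set.seed(1)
data <- iris[1:70, 1:4]
for (i in 1:28) data[i, sample.int(4, 1)] <- NA  # one NA in each of the first 28 rows

missing.data <- replicate(10, prodNA_keep_rows(data, noNA = 0.6), simplify = FALSE)
```

Each call does a single pass over the data, so the run time no longer depends on how unlikely a valid pattern is. If the patterns must follow exactly the same distribution as the original loop, this shortcut is not a drop-in replacement.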