
R: Optimise a while loop nested in a for loop to introduce missing values in a dataframe

I have a 70 row x 4 column dataframe (data) in which 10% of the cells are NA, with no more than one NA per row. From this dataset I would like to produce 10 dataframes with 60% NAs, but without any entirely empty (all-NA) rows. So I wrote a while loop nested inside a for loop. The code works, but it takes a very long time to run. As I need to run this loop on many datasets, I would like to know if there is an easy way to speed it up.

My dataframe is built like this:

library(missForest)
data <- iris[1:70, 1:4]
# put one NA into each of the first 28 rows (28 of 280 cells, i.e. 10% NAs)
for (i in 1:28) {
  data[i, ] <- prodNA(data[i, ], noNA = 0.25)
}
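
As a quick sanity check, this setup leaves one NA in each of the first 28 rows, i.e. 28 of the 280 cells:

table(is.na(data))   # expect 252 FALSE and 28 TRUE
mean(is.na(data))    # 0.1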

And here is my loop:

missing.data <- list()

for (j in 1:10) {
  missing.data[[j]] <- prodNA(data, noNA = 0.6)
  # redraw the whole frame until it contains no all-NA row
  while (sum(rowSums(is.na(missing.data[[j]])) == 4) != 0) {
    missing.data[[j]] <- prodNA(data, noNA = 0.6)
  }
}
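
The reason this takes so long: at noNA = 0.6, 168 of the 280 cells are blanked, so a random draw almost always contains at least one all-NA row and the while loop rejects nearly every call to prodNA(). A rough check of the rejection rate (a sketch; the seed is only there for reproducibility):

set.seed(1)
rejected <- replicate(1000, {
  d <- prodNA(data, noNA = 0.6)
  any(rowSums(is.na(d)) == 4)     # TRUE means this draw would be thrown away
})
mean(rejected)                    # very close to 1 at noNA = 0.6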

EDIT: The loop becomes very slow for noNA > 0.55, but unfortunately I need to introduce 60% of NAs. Also, the NAs introduced in the loop are placed completely at random, so they can "replace" the NAs that are already in the original dataframe (data).
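
Given those requirements, one way to avoid the redraw loop entirely is to build the NA pattern directly: protect one non-NA cell in every row, then blank the required number of the remaining cells at random. This is a different construction than the rejection loop (the accepted patterns are not distributed in exactly the same way), but every frame gets the requested number of blanked cells and can never contain an all-NA row. A minimal sketch, with make_missing as a hypothetical helper name:

make_missing <- function(df, noNA = 0.6) {
  n <- nrow(df)
  p <- ncol(df)
  n_na <- round(noNA * n * p)                 # number of cells to blank
  stopifnot(n_na <= n * (p - 1))              # otherwise an all-NA row is unavoidable
  # pick one currently non-NA column per row and protect it
  protected_col <- apply(df, 1, function(r) {
    ok <- which(!is.na(r))
    if (length(ok) == 1) ok else sample(ok, 1)
  })
  protected <- (protected_col - 1) * n + seq_len(n)   # column-major cell indices
  candidates <- setdiff(seq_len(n * p), protected)
  na_cells <- sample(candidates, n_na)        # may land on cells that are already NA,
                                              # just like prodNA() does
  out <- as.matrix(df)                        # all four columns are numeric here
  out[na_cells] <- NA
  as.data.frame(out)
}

missing.data <- lapply(1:10, function(j) make_missing(data, noNA = 0.6))

Each call is a single pass over the frame, so generating the 10 dataframes takes almost no time.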

I am not sure if this is what you are looking for:

library(missForest)
data1 <- iris[1:70, 1:4]
# recreate the question's setup: one NA in each of the first 28 rows (~10% of cells)
for (i in 1:28) {
  data1[i, ] <- prodNA(data1[i, ], noNA = 0.25)
}
table(is.na(data1))

# stack 10 copies of data1 on top of each other
n <- 10
data2 <- do.call("rbind", replicate(n, data1, simplify = FALSE))
table(is.na(data2))

# introduce the remaining NAs in a single call over the whole stacked frame
data3 <- prodNA(data2, noNA = 0.55)
> table(is.na(data3))

FALSE  TRUE 
 1133  1667 
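
One caveat with blanking the stacked frame in a single prodNA() call: nothing prevents an individual row of data3 from ending up entirely NA, which the question wants to exclude. If you go this route, you can split data3 back into the 10 blocks and check (a sketch, assuming the blocks keep their original order after rbind):

blocks <- split(data3, rep(1:n, each = nrow(data1)))            # back to 10 data frames of 70 rows
sapply(blocks, function(b) sum(rowSums(is.na(b)) == ncol(b)))   # number of all-NA rows in each block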
