I am trying to use the missForest package to impute missing values in a fairly large dataset.
missForest takes data in the form of a data matrix with missing values, where the columns correspond to the variables and the rows to the observations. I therefore converted my data frame to a matrix, which inadvertently turned all of my categorical variables into numeric ones.
Does anyone know how to make a column of a matrix a factor?
Thank you so much!!!
OK, let me add more details.
I have a data frame with the following columns:
homt_sub <- homt[c("CASEID", "REASON", "PSYPROB", "SUB2.2", "FREQ1", "FRSTUSE1", "FREQ2", "AGEcont", "GENDER", "RACE2", "ARRESTS")]
The only continuous variable is AGEcont. The rest are factors. I had to make a matrix to use the missForest function.
homt_matrix <- data.matrix(homt_sub, rownames.force = NA)
homt_sub.imp <- missForest(homt_matrix, verbose = TRUE, maxiter = 3, ntree = 20)
I can extract the imputed values from here, but the categorical variables come back with decimal values because they were treated as continuous.
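To make the conversion problem concrete, here is a minimal base R sketch of what data.matrix does to a factor column. The toy data frame and its values are invented for illustration and only stand in for homt_sub.

```r
# Toy stand-in for homt_sub: one factor column and one continuous column.
# The values are made up for illustration.
toy <- data.frame(
  GENDER  = factor(c("F", "M", NA, "F")),
  AGEcont = c(34, NA, 51, 27)
)

# data.matrix() replaces each factor by its internal integer codes,
# so the result is a purely numeric matrix.
toy_matrix <- data.matrix(toy)

is.numeric(toy_matrix)     # the level labels are gone
toy_matrix[, "GENDER"]     # codes 1, 2, NA, 1 instead of "F" / "M"

# The original data frame still carries the factor.
is.factor(toy$GENDER)
```

Once the column is just numbers, an imputation routine has no way to know it was categorical, which would explain fractional imputed values such as 1.4.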
model.matrix could be a solution, but it seems burdensome to create so many extra dummy variables, impute them, and then collapse them back into one column per variable. I know there is a way to run randomForest with factor variables, but it is unclear to me how to do that here.
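One alternative worth trying is to skip the matrix conversion entirely. The example in the missForest help page passes a data frame that contains a factor column, so the sketch below does the same with the built-in iris data rather than homt_sub. It assumes the missForest package is installed; the object names mixed_mis and imp are illustrative only.

```r
# Sketch only: runs if the missForest package is available.
if (requireNamespace("missForest", quietly = TRUE)) {
  library(missForest)

  set.seed(1)

  # iris has four numeric columns and one factor (Species).
  # prodNA() removes 10% of the values at random.
  mixed_mis <- prodNA(iris, noNA = 0.1)

  # Pass the data frame directly, without data.matrix().
  imp <- missForest(mixed_mis, maxiter = 3, ntree = 20)

  # The imputed data are in imp$ximp; Species should still be a factor.
  str(imp$ximp)
  is.factor(imp$ximp$Species)
}
```

With the real data, an identifier column such as CASEID would presumably need to be left out first, since randomForest rejects categorical predictors with more than 53 levels.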
Thank you so much