Copying factors from previous data frames in R

I'd like to copy the factor levels from a pre-existing data frame into a newly created data frame, rather than assigning the levels by hand.

In order to use the 'predict' function, R requires that the new data be in a data frame whose factors have the same levels as those of the model-training data. I'd like to copy the factors from the training data to the new data frame rather than retype them. I have gotten this to work, as shown in the code below, albeit clumsily.

# Build the model
naive_model <- NaiveBayes(outcome ~ purpose_ + home_ + emp_len_, data = loan_data, na.action = na.omit)

# Create new data point to be tested
new_loan_frame <- data.frame(purpose_ = "small_business", home_ = "MORTGAGE", emp_len_ = "> 1 Year")

# Add the necessary factors to match the training data
new_loan_frame$purpose_ <- factor(new_loan_frame$purpose_, levels = c("credit_card","debt_consolidation", "home_improvement", "major_purchase", "medical","other","small_business"))
new_loan_frame$home_ <- factor(new_loan_frame$home_, levels = c("MORTGAGE", "OWN", "RENT"))
new_loan_frame$emp_len_ <- factor(new_loan_frame$emp_len_, levels = c("< 1 Year", "> 1 Year"))

# Run the prediction using the model and the new data
predict(naive_model, new_loan_frame)

Writing out the factor levels by hand for each input seems more onerous than it should be. What would be the best way to clean this up?

You can automate all of it.

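for (cn in colnames(new_loan_frame)) {
  # Only re-level columns that are factors in the training data
  if (is.factor(loan_data[[cn]])) {
    new_loan_frame[[cn]] <- factor(new_loan_frame[[cn]], levels = levels(loan_data[[cn]]))
  }
}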
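
The same idea can be tried end to end on toy data. The sketch below is only an illustration: the `train` data frame, its values, and the name `new_row` are made up, and `Map` is used in place of the loop so that only columns present in both data frames are touched.

```r
# Toy training data (made up for illustration; not the original loan_data)
train <- data.frame(
  purpose_ = factor(c("credit_card", "medical", "small_business", "other")),
  home_    = factor(c("MORTGAGE", "OWN", "RENT", "OWN")),
  outcome  = factor(c("paid", "default", "paid", "paid"))
)

# New observation built from plain strings
new_row <- data.frame(purpose_ = "small_business", home_ = "MORTGAGE",
                      stringsAsFactors = FALSE)

# Columns that exist in both data frames (skips 'outcome')
shared <- intersect(names(new_row), names(train))

# Re-level each shared column using the matching training column
new_row[shared] <- Map(
  function(x, ref) if (is.factor(ref)) factor(x, levels = levels(ref)) else x,
  new_row[shared], train[shared]
)

str(new_row)
```

One thing to watch for: a value that does not appear among the training levels becomes `NA` after this conversion, so a quick `anyNA(new_row)` check before calling `predict` can catch typos in the new data.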

Hi, and welcome to Stack Overflow. You are right that, in order to predict, the new data has to be organized in a data frame that matches the training data. Please try this:

new_loan_frame <- data.frame(purpose_ = rep(levels(loan_data$purpose_), times = 6), home_ = rep(levels(loan_data$home_), each = 7, times = 2), emp_len_ = rep(levels(loan_data$emp_len_), each = 21))

Preds1 <- predict(naive_model, newdata = new_loan_frame, level = 0)
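
A grid with every combination of levels can also be built with `expand.grid`, which avoids hard-coding the repeat counts. This is a minimal sketch on made-up toy data (`train`, `predictors`, and `grid` are illustrative names, not part of the original question). By default `expand.grid` converts character vectors to factors and keeps the levels in the order they are supplied, so the resulting columns carry the same levels as the training data.

```r
# Toy training data (made up for illustration; not the original loan_data)
train <- data.frame(
  purpose_ = factor(c("credit_card", "medical", "small_business")),
  home_    = factor(c("RENT", "OWN", "MORTGAGE")),
  emp_len_ = factor(c("< 1 Year", "> 1 Year", "> 1 Year"))
)

# Predictor columns used by the model formula
predictors <- c("purpose_", "home_", "emp_len_")

# One row per combination of levels; columns come back as factors
grid <- expand.grid(lapply(train[predictors], levels))

nrow(grid)  # 3 * 3 * 2 = 18 combinations
str(grid)
```

The resulting `grid` could then be passed as `newdata` to `predict` in the same way as a hand-built data frame.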

Additionally, always try to avoid "_" in the level names. Instead, you can simply use: , sep="_")

Good luck!
