
How to identify the variables used in a statistical model in R

I'm predicting data with a model generated by random forest. How can I identify the predictor variables used in the model? I can't get rid of this error: "Error in eval(predvars, data, env) : object 'ENERGY' not found".

ENERGY is the column I'm trying to predict; it was not used as a predictor when the model was generated, and it does not appear in varImp(DATA)$importance.

predict(model_RF2, newdata = predData)

Error in eval(predvars, data, env) : object 'ENERGY' not found

varImp(DATA)$importance

This code returns the names of 60 columns, and ENERGY is not among them.

Is there any other way to identify the columns used in the model?

The important part of prediction is verifying that the new data frame (predData in your case) has the same column names as the original data. So if your original data had an ENERGY column, make sure that predData has one as well.
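A minimal sketch of that column check, using base R only. The `lm` model on the built-in `mtcars` data is a stand-in, not the asker's `model_RF2`, and the names `train`, `fit`, `new_data`, `missing_cols` and `msg` are made up for illustration. It compares column names with `setdiff()` and captures the message `predict()` raises when a needed column is absent.

```r
# Stand-in training data and model (assumption: lm replaces the random forest)
train <- mtcars[, c("mpg", "wt", "hp")]
fit <- lm(mpg ~ wt + hp, data = train)

# Hypothetical new data with the 'hp' column left out
new_data <- data.frame(wt = c(2.5, 3.0))

# Every training column that is absent from new_data
missing_cols <- setdiff(names(train), names(new_data))
print(missing_cols)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    predict(fit, newdata = new_data)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running the same `setdiff()` on the real training data and `predData` should show which columns differ between the two.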

See documentation here: https://www.rdocumentation.org/packages/rpart/versions/4.1-13/topics/predict.rpart

Quote: "newdata: data frame containing the values at which predictions are required. The predictors referred to in the right side of formula(object) must be present by name in newdata."
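As a rough illustration of "the predictors referred to in the right side of formula(object)", the sketch below pulls variable names out of a fitted model's formula with base R. It again uses `lm` on `mtcars` as a stand-in. Whether `formula()` and `terms()` work on a given random forest object depends on how it was fitted (for example, through a formula interface), so treat that as an assumption to verify on `model_RF2`.

```r
# Stand-in model fitted through a formula interface (not the asker's model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# All variable names in the model formula, response included
all_vars <- all.vars(formula(fit))
print(all_vars)

# Predictor names only: drop the response from the terms object first
pred_vars <- all.vars(delete.response(terms(fit)))
print(pred_vars)

# Hypothetical new data; list the predictors it fails to provide
new_data <- data.frame(wt = c(2.5, 3.0))
missing_vars <- setdiff(pred_vars, names(new_data))
print(missing_vars)
```

If `missing_vars` is empty for the real model and `predData`, every right-hand-side variable is present by name.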

Random forest reports importance only for the predictor variables, not for the variable being predicted. Since you are predicting ENERGY, it is normal that you don't see its name in the list. Also,

importance(model_RF2)

and

varImpPlot(model_RF2)

can show you the columns (variables) that are used in the model. varImpPlot also draws a plot of the importances.
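A small sketch of reading predictor names from the importance table. It assumes the `randomForest` package is installed and fits a toy regression forest on `mtcars`; the model `rf` and the chosen predictors are illustrative, not the asker's setup.

```r
library(randomForest)

set.seed(1)  # make the toy forest reproducible

# Toy regression forest: mpg is the response, three columns are predictors
rf <- randomForest(mpg ~ wt + hp + disp, data = mtcars, ntree = 50)

# importance() returns a matrix with one row per predictor
imp <- importance(rf)
print(imp)

# The row names are the predictors; the response (mpg) is not among them
used_vars <- rownames(imp)
print(used_vars)
```

Calling `varImpPlot(rf)` on the same object plots these values.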
