
Understanding per-class variable importance in the 'randomForest' R package

I'm having trouble understanding the per-class columns returned by the importance function in randomForest.

My data set has two classes, "Current" and "Departed". To predict those classes, I first create a random forest model:

fit <- randomForest(IsDeparted ~ ..., df_train)

Then I run the importance function:

importance(fit)   

This gives me importance measures in four columns: "Current", "Departed", "MDA" and "GiniDecrease".

Could someone explain how to interpret the first two class columns? Is it the mean decrease in accuracy of predicting one particular class after permuting values of that particular variable? And if so, does that mean I should focus on those columns rather than the MDA column when doing feature selection if I am more interested in the model's performance for one particular class?

Yes, the first two columns are class-specific. Each one is the mean decrease in accuracy for that class, scaled by (divided by) its own standard error. If you are interested in the accuracy for one class, you can look at that class's column.

Let's use an example. By default, importance() returns the scaled importance:

library(randomForest)
set.seed(111)
fit = randomForest(Species ~ .,data=iris,importance=TRUE)
importance(fit)

                setosa versicolor virginica MeanDecreaseAccuracy
Sepal.Length  6.716993  7.4654657  7.697842            10.869088
Sepal.Width   4.581990 -0.5208697  4.224459             3.772957
Petal.Length 22.155981 33.0549839 27.892363            33.272150
Petal.Width  22.497643 31.4966353 31.589361            33.123064
             MeanDecreaseGini
Sepal.Length         9.333510
Sepal.Width          2.425592
Petal.Length        43.324744
Petal.Width         44.146107
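To make the scaling concrete, here is a minimal sketch that rebuilds the scaled columns by hand. It relies on the `importance` and `importanceSD` components of the fitted object, which hold the unscaled measures and their standard errors, and it assumes none of the standard errors are close to zero (importance() leaves such entries unscaled). The names `raw`, `se`, `acc_cols`, `manual` and `scaled` are only illustrative.

```r
library(randomForest)

set.seed(111)
fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Unscaled importances (the last column is MeanDecreaseGini)
raw <- fit$importance
# Standard errors of the per-class and overall accuracy decreases
se <- fit$importanceSD

# Divide each accuracy-based column by its standard error
acc_cols <- colnames(se)
manual <- raw[, acc_cols] / se

# Compare with the default output of importance() (scale = TRUE)
scaled <- importance(fit)
all.equal(manual, scaled[, acc_cols])
```

MeanDecreaseGini is left out of the comparison because it has no standard error and is not scaled.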

If you look at the unscaled importance, i.e. importance(fit, scale = FALSE), you can see that the MDA column is roughly the average of the 3 class columns. That holds here because the 3 classes are balanced; with imbalanced classes it will be different:

                  setosa   versicolor   virginica MeanDecreaseAccuracy
Sepal.Length 0.034156211  0.021093423 0.036147901          0.030810465
Sepal.Width  0.006522917 -0.001117593 0.006937731          0.004273138
Petal.Length 0.329299111  0.301621639 0.296869242          0.305569113
Petal.Width  0.335363736  0.298729184 0.279526019          0.302855284
             MeanDecreaseGini
Sepal.Length         9.333510
Sepal.Width          2.425592
Petal.Length        43.324744
Petal.Width         44.146107
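As a rough sketch of how to check this yourself and how to pull out a single class, the snippet below puts the plain average of the class columns next to the overall MDA column, then ranks variables for one class using the `type` and `class` arguments of importance(). The choice of "versicolor" and the names `unscaled`, `class_cols` and `versicolor_imp` are only for illustration.

```r
library(randomForest)

set.seed(111)
fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Unscaled importances: raw mean decrease in accuracy per class
unscaled <- importance(fit, scale = FALSE)

# Average of the three class columns next to the overall MDA column
class_cols <- levels(iris$Species)
cbind(ClassAverage = rowMeans(unscaled[, class_cols]),
      MeanDecreaseAccuracy = unscaled[, "MeanDecreaseAccuracy"])

# Scaled importance for one class only (type = 1 is the accuracy-based measure)
versicolor_imp <- importance(fit, type = 1, class = "versicolor")

# Rank the variables by their importance for that class
versicolor_imp[order(versicolor_imp[, 1], decreasing = TRUE), , drop = FALSE]
```

If one class matters more to you, ranking on its column this way is one reasonable starting point for feature selection, though it is still worth checking the effect on the other class.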
