I wanted to use the Random Forest to find the most important features for a classification problem (I have two classes: 0 and 1).
I created the model:
rf = randomForest(y ~ ., data = df, sampsize=100000,ntree=100, importance = TRUE, keep.forest = FALSE)
And then I used the following to check the importance:
importance(rf, type = 1, class = 1)
I read that the class parameter can be used for a classification problem. My question is whether I have to sort the results by their absolute value in Mean Decrease Accuracy. When I use varImpPlot, it seems that I should also consider the negative values. And what exactly does the parameter class = 1 do?
We can use the iris dataset, it has 3 species in it:
data(iris)
table(iris$Species)
setosa versicolor virginica
50 50 50
We fit a random forest:
library(randomForest)
mdl = randomForest(Species~.,data=iris,importance=TRUE)
# let's do it without options
importance(mdl)
                setosa versicolor virginica MeanDecreaseAccuracy MeanDecreaseGini
Sepal.Length  6.364533  6.2112640  7.632076            10.365371         9.223319
Sepal.Width   4.790211  0.4339124  5.500338             5.153676         2.189763
Petal.Length 22.027701 34.5777755 29.080648            35.215194        44.703684
Petal.Width  22.500729 31.1403378 30.714576            33.335003        43.163546
The table above contains every importance measure. If you call importance(mdl,type=1) you get only the MeanDecreaseAccuracy column, that is, the mean decrease in accuracy across all classes for each variable. The table also has a separate column for each class you can predict (setosa, versicolor, virginica), so if you do:
importance(mdl,type=1,class="setosa")
setosa
Sepal.Length 6.364533
Sepal.Width 4.790211
Petal.Length 22.027701
Petal.Width 22.500729
You get the change in accuracy associated with this class only.
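As a quick check of the point above, the following sketch compares the output of the type and class arguments with the columns of the full table. It assumes the randomForest package is installed. The seed is an arbitrary choice added here for reproducibility and is not part of the original answer, so the numbers will differ from the table shown earlier.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so that reruns give the same forest
mdl <- randomForest(Species ~ ., data = iris, importance = TRUE)

full    <- importance(mdl)                              # all columns
overall <- importance(mdl, type = 1)                    # MeanDecreaseAccuracy only
setosa  <- importance(mdl, type = 1, class = "setosa")  # setosa column only

# Both subsets should match the corresponding columns of the full table
all.equal(overall[, 1], full[, "MeanDecreaseAccuracy"])
all.equal(setosa[, 1], full[, "setosa"])
```

Both comparisons are made on the same fitted object, so they do not depend on the seed.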
So in your code, when you call importance(rf, type = 1, class = 1) on a model fitted with randomForest(y ~ ., data = df... ), you get the importance of each variable for the class that has the label 1 in y.
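To make the two-class case concrete, here is a minimal sketch that builds a 0/1 response from iris. The data frame df, the response y and the choice of virginica as class 1 are stand-ins invented for this illustration, not your data. It assumes the randomForest package is installed.

```r
library(randomForest)

set.seed(42)  # arbitrary seed for reproducibility
# Hypothetical binary problem: y is 1 for virginica and 0 otherwise
df <- iris[, 1:4]
df$y <- factor(as.integer(iris$Species == "virginica"))

rf <- randomForest(y ~ ., data = df, ntree = 100, importance = TRUE)

# The per-class columns are named after the factor levels of y
colnames(importance(rf))

# class = 1 and class = "1" should both select the column named "1"
imp_1     <- importance(rf, type = 1, class = 1)
imp_1_chr <- importance(rf, type = 1, class = "1")
all.equal(imp_1, imp_1_chr)
```

With class = 0 you would get the column for the other label instead.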
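Lastly, you can sort them by their signed value rather than their absolute value. A negative mean decrease in accuracy means that permuting the variable did not reduce accuracy, so such a variable belongs at the bottom of the ranking: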
res = importance(mdl,type=1,class="setosa")
res = res[order(res[,1],decreasing=TRUE),,drop=FALSE]
res
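On the question of negative values, the small sketch below shows how ordering by signed value differs from ordering by absolute value. The matrix is made up for illustration (hypothetical variables x1 to x4, not output from any model) and only mimics the shape of what importance(..., type = 1) returns.

```r
# Hypothetical importance values, NOT real model output
imp <- matrix(c(12.4, -3.1, 0.8, 7.5), ncol = 1,
              dimnames = list(c("x1", "x2", "x3", "x4"),
                              "MeanDecreaseAccuracy"))

# Signed ordering: the negative value ends up last
by_signed <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]

# Absolute ordering: the negative value is ranked above a small positive one
by_abs <- imp[order(abs(imp[, 1]), decreasing = TRUE), , drop = FALSE]

rownames(by_signed)  # "x1" "x4" "x3" "x2"
rownames(by_abs)     # "x1" "x4" "x2" "x3"
```

Under the usual reading of permutation importance, the signed ordering is the one to use, because a negative value suggests the variable adds noise rather than signal.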