I do not understand the difference between the importance() function (randomForest package) and the importance value stored in a Random Forest model.
I fitted a simple RF classification model and tried to get the variable importance with the following code:
rf_model$importance
0 1 MeanDecreaseAccuracy MeanDecreaseGini
X1 0.096886458 0.032546101 0.055488009 2472.172207
X2 0.030985037 0.025615202 0.027530078 1338.378297
X3 0.124302743 0.012551971 0.052402188 3091.891586
importance(rf_model)
0 1 MeanDecreaseAccuracy MeanDecreaseGini
X1 159.9149603 175.6265625 242.424683 2472.172207
X2 104.8273654 97.09338154 129.5084398 1338.378297
X3 157.0207876 86.93847182 216.6374153 3091.891586
Why do the first three columns of the output differ while MeanDecreaseGini is the same?
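When calling importance(rf_model), the permutation-based measures are by default (scale = TRUE) divided by their “standard errors”, whereas rf_model$importance stores the raw, unscaled values. MeanDecreaseGini is not a permutation-based measure, so it is never scaled and is identical in both outputs. Consider this example: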
library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
keep.forest=FALSE, importance=TRUE)
mtcars.rf$importance
#output
%IncMSE IncNodePurity
cyl 7.3939431 162.38777
disp 10.0468306 257.46627
hp 7.6801388 200.22729
drat 1.0921653 65.96165
wt 9.7998328 250.94940
qsec 0.6066792 38.52055
vs 0.7048540 24.75183
am 0.6201962 17.27180
gear 0.4110634 16.33811
carb 1.0549523 27.47096
Calling importance() with scale = FALSE returns the same values as above:
importance(mtcars.rf, scale = FALSE)
%IncMSE IncNodePurity
cyl 7.3939431 162.38777
disp 10.0468306 257.46627
hp 7.6801388 200.22729
drat 1.0921653 65.96165
wt 9.7998328 250.94940
qsec 0.6066792 38.52055
vs 0.7048540 24.75183
am 0.6201962 17.27180
gear 0.4110634 16.33811
carb 1.0549523 27.47096
With the default (scale = TRUE), the first column changes:
importance(mtcars.rf)
%IncMSE IncNodePurity
cyl 15.767986 162.38777
disp 19.885128 257.46627
hp 18.177916 200.22729
drat 7.002942 65.96165
wt 18.479239 250.94940
qsec 5.022593 38.52055
vs 4.427525 24.75183
am 6.435329 17.27180
gear 3.968845 16.33811
carb 8.207903 27.47096
Finally, dividing the unscaled %IncMSE values by their standard errors:
importance(mtcars.rf, scale = FALSE)[,1]/mtcars.rf$importanceSD
cyl disp hp drat wt qsec vs am gear carb
15.767986 19.885128 18.177916 7.002942 18.479239 5.022593 4.427525 6.435329 3.968845 8.207903
gives the same result as importance(mtcars.rf)[,1]:
all.equal(importance(mtcars.rf, scale = FALSE)[,1]/mtcars.rf$importanceSD,
importance(mtcars.rf)[,1])
#output
TRUE
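The same check can be sketched for a classification forest, which is the case in the question. The rf_model from the question is not available, so this example assumes the built-in iris data as a stand-in; the names iris.rf, sd_safe and perm_cols are illustrative only. For classification, importanceSD is a matrix with one column per class plus MeanDecreaseAccuracy, so the division is done column by column on those permutation-based columns.

```r
library(randomForest)

set.seed(4543)
# iris is a stand-in data set (assumption); the question's data is not available
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

raw    <- iris.rf$importance    # unscaled values stored in the fitted object
scaled <- importance(iris.rf)   # default: scale = TRUE
sd_mat <- iris.rf$importanceSD  # one column per class plus MeanDecreaseAccuracy

# Guard against a zero standard error so the division is always defined
sd_safe <- ifelse(sd_mat < .Machine$double.eps, 1, sd_mat)

# Names of the permutation-based columns (everything except MeanDecreaseGini)
perm_cols <- colnames(sd_mat)

# scale = FALSE should reproduce the stored importance matrix
same_unscaled <- isTRUE(all.equal(importance(iris.rf, scale = FALSE), raw))
# raw / SD should reproduce the scaled permutation-based columns
same_scaled <- isTRUE(all.equal(raw[, perm_cols] / sd_safe, scaled[, perm_cols]))
# the Gini column should be unaffected by scaling
same_gini <- isTRUE(all.equal(raw[, "MeanDecreaseGini"], scaled[, "MeanDecreaseGini"]))

print(c(same_unscaled = same_unscaled, same_scaled = same_scaled, same_gini = same_gini))
```

If the scaling works as described above, all three comparisons should report TRUE, which matches what the question observed: the per-class and MeanDecreaseAccuracy columns differ between the two outputs, while MeanDecreaseGini does not.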