Function importance() in the randomForest package

Question

I want to use a random forest to find the most important features for a classification problem (I have two classes: 0 and 1).

I created the model:

rf = randomForest(y ~ ., data = df, sampsize = 100000, ntree = 100, importance = TRUE, keep.forest = FALSE)

Then I used the following to check the importance:

importance(rf, type = 1, class = 1)

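I read that the class argument can be used for classification problems. My question is whether I have to sort the results by the absolute value of the mean decrease in accuracy. When I use varImpPlot, it seems that I should also take negative values into account. What exactly does the argument class = 1 do?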

Answer 1

We can use the iris dataset, which has 3 species:

data(iris)
table(iris$Species)

setosa versicolor  virginica 
    50         50         50

We fit a random forest:

library(randomForest)
mdl = randomForest(Species~.,data=iris,importance=TRUE)
# let's do it without options
importance(mdl)
                setosa versicolor virginica MeanDecreaseAccuracy
Sepal.Length  6.364533  6.2112640  7.632076            10.365371
Sepal.Width   4.790211  0.4339124  5.500338             5.153676
Petal.Length 22.027701 34.5777755 29.080648            35.215194
Petal.Width  22.500729 31.1403378 30.714576            33.335003
             MeanDecreaseGini
Sepal.Length         9.223319
Sepal.Width          2.189763
Petal.Length        44.703684
Petal.Width         43.163546

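The table above contains all of the results. If you run importance(mdl, type = 1), you get only the mean decrease in accuracy of each variable over all classes. The class-specific values are in three separate columns, one for each class that can be predicted (setosa, versicolor, virginica), so if you do: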

importance(mdl,type=1,class="setosa")
                setosa
Sepal.Length  6.364533
Sepal.Width   4.790211
Petal.Length 22.027701
Petal.Width  22.500729

then you get only the decrease in accuracy associated with this class.
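A minimal check of how these calls relate, assuming the randomForest package is installed. The seed is an arbitrary choice added here, so the numbers will differ from the table above. `type = 1` without `class` keeps only the `MeanDecreaseAccuracy` column of the full matrix, and `class = "setosa"` keeps only the `setosa` column.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only to make this run repeatable
mdl <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Full matrix: one column per class, then MeanDecreaseAccuracy and MeanDecreaseGini
full <- importance(mdl)
print(colnames(full))

# type = 1 alone: mean decrease in accuracy over all classes
overall <- importance(mdl, type = 1)
print(overall)

# type = 1 plus class: decrease in accuracy for one class only
setosa_only <- importance(mdl, type = 1, class = "setosa")
print(setosa_only)

# Both results are single columns of the full matrix
stopifnot(isTRUE(all.equal(overall[, 1], full[, "MeanDecreaseAccuracy"])))
stopifnot(isTRUE(all.equal(setosa_only[, 1], full[, "setosa"])))
```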

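So in your code, when you run importance(rf, type = 1, class = 1) on a model fitted with randomForest(y ~ ., data = df, ...), you are asking for the importance of each variable for predicting the observations labelled 1 in y.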
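To see what `class = 1` refers to in a two-class problem, here is a sketch on simulated data. The data frame `df2` and the predictors `x1`, `x2` and `noise` are hypothetical. `y` is stored as a factor with levels "0" and "1"; if `y` were numeric, randomForest would fit a regression forest instead of a classifier. As far as I can tell from the package source, `class` is matched against the column names of the importance matrix, so `class = 1` selects the column named "1" rather than simply the first column.

```r
library(randomForest)

set.seed(42)  # arbitrary seed for a repeatable run
n <- 500
# Hypothetical data: x1 and x2 drive the label, noise is pure noise
df2 <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
df2$y <- factor(ifelse(df2$x1 + 0.5 * df2$x2 + rnorm(n, sd = 0.5) > 0, 1, 0))

rf2 <- randomForest(y ~ ., data = df2, ntree = 200, importance = TRUE)

# Columns: one per class label ("0", "1"), then the two overall measures
imp_all <- importance(rf2)
print(colnames(imp_all))

# class = 1 is matched to the column named "1": importance for predicting label 1
imp_1 <- importance(rf2, type = 1, class = 1)
print(imp_1)
stopifnot(isTRUE(all.equal(imp_1[, 1], imp_all[, "1"])))
```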

Finally, you can sort them:

res = importance(mdl, type = 1, class = "setosa")
res = res[order(res[, 1], decreasing = TRUE), , drop = FALSE]
res
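On the question of absolute values: the mean decrease in accuracy measures how much the out-of-bag accuracy drops when a variable's values are permuted. A negative score means accuracy was, if anything, slightly better with the variable scrambled, which usually reflects noise and indicates the variable is not useful. Sorting by the signed value, as above, keeps such variables at the bottom; sorting by the absolute value would wrongly push them up. A small sketch of that, where the helper name `rank_importance` is hypothetical and the seed is arbitrary:

```r
library(randomForest)

set.seed(1)  # arbitrary seed
mdl <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Hypothetical helper: rank variables by the signed permutation importance, largest first
rank_importance <- function(model, class = NULL) {
  imp <- importance(model, type = 1, class = class)
  imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
}

res <- rank_importance(mdl, class = "setosa")
print(res)

# Variables whose permutation did not reduce accuracy (score <= 0) add nothing for this class
print(rownames(res)[res[, 1] <= 0])
```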

Answer 1 · 0 · 2020-01-26 22:25:53