Variable importance in multiclass

I have a dataset like iris, and my y is a multi-class factor variable. Is there any way to get the same kind of varImp results as in the code below for method = rf, method = treebag, and method = boost? Many thanks in advance.

library(caret)  # provides createDataPartition, trainControl, train, varImp

data(iris); head(iris)
iris$Species <- factor(iris$Species)

set.seed(87)
inTrainingSet <- createDataPartition(iris$Species, p = .80, list = FALSE)
# named trainSet/testSet so the data does not share a name with caret's train()
trainSet <- iris[inTrainingSet, ]
testSet  <- iris[-inTrainingSet, ]
ctrl <- trainControl(method = "cv", number = 2, verboseIter = TRUE)

pls <- train(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             method = "pls", data = iris,
             trControl = ctrl)
attributes(varImp(pls))
varImp(pls)$importance

There are a few points to your question. If the model has a built-in method for estimating importance, varImp uses it by default (useModel = TRUE). With useModel = FALSE, varImp ignores the model and falls back on a model-free, ROC-based measure instead.

For random forest, add importance = TRUE when fitting:

rf <- train(Species ~ Sepal.Length+Sepal.Width+Petal.Length+Petal.Width , 
              method = "rf", data = iris,
              trControl = ctrl,importance=TRUE)
varImp(rf)

rf variable importance

  variables are sorted by maximum importance across the classes
             setosa versicolor virginica
Petal.Length  66.94     100.00     85.40
Petal.Width   63.86      92.22     89.87
Sepal.Length  16.75      24.05     24.90
Sepal.Width   12.75       0.00     17.49
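
To work with these per-class scores rather than just print them, the table can be treated as an ordinary data frame. The sketch below is only an illustration: it re-enters the values printed above by hand so that it runs without refitting anything, and the names imp, max_over_classes, ranked and top_per_class are made up for the example. With a fitted model you would presumably start from varImp(rf)$importance instead, as the question does for pls.

```r
# Per-class scores copied by hand from the printed rf output above.
imp <- data.frame(
  setosa     = c(66.94, 63.86, 16.75, 12.75),
  versicolor = c(100.00, 92.22, 24.05, 0.00),
  virginica  = c(85.40, 89.87, 24.90, 17.49),
  row.names  = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
)

# Most important variable for each class separately.
top_per_class <- sapply(imp, function(col) rownames(imp)[which.max(col)])
print(top_per_class)

# One summary score per variable: its largest importance across the classes.
imp$max_over_classes <- apply(imp, 1, max)

# Rank variables by that summary, largest first.
ranked <- imp[order(imp$max_over_classes, decreasing = TRUE), ]
print(ranked)
```

Ranking by the row-wise maximum mirrors the "sorted by maximum importance across the classes" note in the printout; a mean or a per-class ranking would be equally easy to substitute.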

If the model has no built-in per-class importance for multiclass outcomes, pairwise ROC curves can be used to derive the importances instead (useModel = FALSE). See the caret documentation on variable importance for the specifics:

tb <- train(Species ~ Sepal.Length+Sepal.Width+Petal.Length+Petal.Width , 
                  method = "treebag", data = iris,
                  trControl = ctrl,importance=TRUE)

varImp(tb,useModel=TRUE)
treebag variable importance

             Overall
Petal.Length  100.00
Petal.Width    99.17
Sepal.Length   32.23
Sepal.Width     0.00

varImp(tb,useModel=FALSE)
ROC curve variable importance

  variables are sorted by maximum importance across the classes
             setosa versicolor virginica
Petal.Width  100.00     100.00     100.0
Petal.Length 100.00     100.00     100.0
Sepal.Length  90.70      59.30      90.7
Sepal.Width   54.59      54.59       0.0
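
To make the pairwise ROC idea concrete, here is a rough base-R sketch of how such per-class scores can be computed by hand. It is my own approximation and not caret's code: the helper names auc_two_class and pairwise_roc_importance are hypothetical, the AUC is taken from the rank-sum formula, and I am assuming that each class gets the largest AUC among the class pairs it takes part in and that the final 0 to 100 rescaling is done over the whole table. I have not checked that it reproduces the numbers above exactly.

```r
# Rank-based AUC of a numeric predictor x for a two-class outcome.
# is_pos marks the rows that belong to the "positive" class.
auc_two_class <- function(x, is_pos) {
  r  <- rank(x)                     # midranks, so ties are handled
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  auc <- (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  max(auc, 1 - auc)                 # ignore the direction of the effect
}

# x: data frame of numeric predictors, y: factor with three or more classes.
pairwise_roc_importance <- function(x, y) {
  y       <- factor(y)
  classes <- levels(y)
  pairs   <- combn(classes, 2)      # one column per class pair

  # AUC of every predictor (rows) for every class pair (columns).
  auc <- sapply(seq_len(ncol(pairs)), function(j) {
    keep <- y %in% pairs[, j]
    sapply(x, function(col) auc_two_class(col[keep], y[keep] == pairs[1, j]))
  })

  # For each class, keep the largest AUC over the pairs that involve it.
  sapply(classes, function(cl) {
    involved <- apply(pairs, 2, function(p) cl %in% p)
    apply(auc[, involved, drop = FALSE], 1, max)
  })
}

imp <- pairwise_roc_importance(iris[, 1:4], iris$Species)

# Rescale to 0-100 over the whole table (assumed to mirror scale = TRUE).
scaled <- (imp - min(imp)) / (max(imp) - min(imp)) * 100
print(round(scaled, 2))
```

Because every score is an AUC folded to lie between 0.5 and 1, a predictor that separates one pair of classes perfectly gets the top score for both classes in that pair, which is why the petal measurements sit at 100 in every column of the ROC table.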

You did not specify which boosted tree method you are using, but I expect you can use one of the options above in the same way.
