
xgboost feature importance of categorical variable

I am using XGBClassifier to train a model in Python, and my training dataset contains a handful of categorical variables. Originally, I planned to convert each of them into dummy variables before feeding in my data, but then feature importance will be calculated for each dummy, not for the original categorical columns. Since I also need to rank all of my original variables (numerical and categorical) by importance, I am wondering how to get the importance of the original variables. Is it simply a matter of adding up the dummies' importances?

You could probably get by with summing the individual dummy columns' importances back into their original, parent categorical feature. But unless these features are high-cardinality, my two cents would be to report them individually. I tend to err on the side of being more explicit when reporting model performance and importance measures.

Note: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.
