
How can a decision tree classifier work with global constraints?

I generated a decision tree classifier with sklearn in Python, which works well in terms of accuracy. I train the classifier on the optimal solution of a linear program, which returns an optimal assignment of items to classes while respecting a global cost constraint (i.e. assigning item 1 to class A comes at a cost of x; the total resulting cost over all items and classes must be smaller than a value y).
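For illustration, a tiny version of that setup (the scores, costs, and budget below are hypothetical stand-ins, and exhaustive search stands in for the linear program) might look like:

```python
from itertools import product

# Hypothetical data: 3 items, 2 classes.
# score[i][c]: how well class c fits item i (to be maximized)
# cost[i][c]:  cost of assigning item i to class c
score = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.5]]
cost = [[3.0, 1.0], [2.0, 4.0], [1.0, 2.0]]
budget = 6.5  # global constraint: total cost must not exceed this

best_assignment, best_score = None, float("-inf")
# Brute-force search over all assignments (a stand-in for the LP solver)
for assignment in product(range(2), repeat=3):
    total_cost = sum(cost[i][c] for i, c in enumerate(assignment))
    total_score = sum(score[i][c] for i, c in enumerate(assignment))
    if total_cost <= budget and total_score > best_score:
        best_assignment, best_score = assignment, total_score

print(best_assignment, best_score)  # the cost-feasible assignment with best fit
```

The key point is that feasibility is a property of the whole assignment, not of any single item — which is exactly what a per-item classifier cannot see.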

After reclassifying all items with the classifier, the accuracy is acceptable, but the global cost constraint is violated in most classification runs. Naturally so, since the standard decision tree in sklearn does not consider the constraint.

Is there a way to incorporate global constraints that must be upheld after classification? Is there a way to force the tree to consider all already-classified items when making the next assignment choice? I assume this would require establishing some sort of cost or penalty function to be checked by the tree during classification.

Decision trees as implemented in sklearn are built using only a splitting criterion based on the Gini coefficient, entropy, or information gain. Custom loss functions are not possible.

However, gradient boosted trees, such as XGBoost, LightGBM, and CatBoost, allow you to specify your own loss function. A tutorial can be found here: https://towardsdatascience.com/custom-loss-functions-for-gradient-boosting-f79c1b40466d

You would then incorporate a penalty term for violating your constraint into the loss function.
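A minimal sketch of such a penalized objective, in pure Python (the per-item `costs`, the `budget`, the penalty weight `lam`, and the quadratic penalty shape are all hypothetical choices for this problem, not library parameters; the Hessian term for the penalty is a crude approximation to keep the boosting step stable):

```python
import math

def penalized_logloss(preds, labels, costs, budget, lam=10.0):
    """Per-example gradient and Hessian of binary log loss plus a global
    cost penalty, in the (grad, hess) shape that gradient-boosting
    libraries expect from a custom objective."""
    probs = [1.0 / (1.0 + math.exp(-f)) for f in preds]  # predicted P(class 1)
    # Expected total cost if each item joins class 1 with its predicted prob
    total_cost = sum(p * c for p, c in zip(probs, costs))
    overshoot = max(0.0, total_cost - budget)  # amount of constraint violation

    grad, hess = [], []
    for p, y, c in zip(probs, labels, costs):
        dp = p * (1.0 - p)  # derivative of the sigmoid w.r.t. the raw score
        # Standard log-loss gradient plus gradient of lam * overshoot**2
        grad.append((p - y) + 2.0 * lam * overshoot * c * dp)
        # Log-loss Hessian plus a crude, always-positive penalty curvature
        hess.append(dp + 2.0 * lam * overshoot * c * dp)
    return grad, hess
```

With XGBoost, for example, you would wrap this in a closure binding `costs` and `budget` and adapt it to the `(preds, dtrain)` signature expected by the `obj` argument of `xgboost.train`. Note the penalty only discourages violations on the training set; it does not guarantee feasibility on new data.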
