XGBoost decision tree selection

I have a question about which decision tree I should choose from an XGBoost model.

I will use the following code as an example.

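# import packages
import xgboost as xgb
import matplotlib.pyplot as plt

# create the DMatrix (X and y are the feature matrix and target, defined elsewhere)
df_dmatrix = xgb.DMatrix(data=X, label=y)

# set up the parameter dictionary
# ("reg:squarederror" is the current name of the deprecated "reg:linear" objective)
params = {"objective": "reg:squarederror", "max_depth": 2}

# train the model with 10 boosting rounds, i.e. 10 trees
xg_reg = xgb.train(params=params, dtrain=df_dmatrix, num_boost_round=10)

# plot the tree with index n (0 to 9)
xgb.plot_tree(xg_reg, num_trees=n)  # my question relates to this line
plt.show()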

I create 10 trees in the xg_reg model, and I can plot any one of them by setting n in the last line of the code to the index of the tree.

My question is: how can I know which tree best explains the dataset? Is it always the last one? Or should I determine which features I want to include in the tree, and then choose the tree which contains those features?

My question is: how can I know which tree best explains the dataset?

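XGBoost is an implementation of Gradient Boosted Decision Trees (GBDT). Roughly speaking, GBDT is a sequence of trees, each one improving on the prediction of the trees before it by fitting their residuals. So the prediction that explains the data best is the one made once the last tree (index n - 1, where n is the number of boosting rounds) has been added. Note that this prediction is the combined output of all trees up to that index; a single tree plotted on its own shows only one correction step.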

You can read more about GBDT here

Or should I determine which features I want to include in the tree, and then choose the tree which contains the features?

All the trees are trained on the same base features; what changes at each boosting iteration is the target, which is the residual left by the previous trees. So you cannot determine the best tree this way. In this video there is an intuitive explanation of residuals.
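The residual idea can be illustrated without XGBoost at all. The sketch below is a deliberately simplified toy: the fit_stump helper, the data, and the learning rate are all made up for illustration, and real XGBoost uses gradient and Hessian statistics plus regularization rather than plain least-squares stumps. It only aims to show that every tree sees the same feature, each tree is fitted to the current residuals, and the last tree on its own predicts a small correction rather than the target.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree (a stump) on one feature by least squares."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data: one feature, quadratic target (hypothetical example values).
xs = [0.5 * i for i in range(20)]
ys = [x ** 2 for x in xs]

learning_rate = 0.5
prediction = [sum(ys) / len(ys)] * len(xs)  # start from the mean of the target
trees = []
mse_history = []

for _ in range(10):
    # Same feature every round; only the target (the residual) changes.
    residuals = [y - p for y, p in zip(ys, prediction)]
    tree = fit_stump(xs, residuals)
    trees.append(tree)
    prediction = [p + learning_rate * tree(x) for x, p in zip(xs, prediction)]
    mse_history.append(mse(ys, prediction))

# The last tree alone predicts a residual correction, not the target itself.
last_tree_mse = mse(ys, [trees[-1](x) for x in xs])

print("ensemble MSE after each round:", [round(m, 2) for m in mse_history])
print("MSE of the last tree used alone:", round(last_tree_mse, 2))
```

In this toy setup the ensemble error shrinks with every round, while the last tree used alone fits the target far worse than the ensemble does, which is why picking a tree by the features it contains does not identify a "best" tree.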
