
weight calculation of individual tree in XGBoost when using "binary:logistic"

Taking a cue from How to access weighting of individual decision trees in xgboost?: how does one calculate the weights when objective = "binary:logistic" and eta = 0.1?

My tree dump is:

booster[0]
0:[WEIGHT<3267.5] yes=1,no=2,missing=1,gain=133.327,cover=58.75
    1:[CYLINDERS<5.5] yes=3,no=4,missing=3,gain=9.61229,cover=33.25
        3:leaf=0.872727,cover=26.5
        4:leaf=0.0967742,cover=6.75
    2:[WEIGHT<3431] yes=5,no=6,missing=5,gain=4.82912,cover=25.5
        5:leaf=-0.0526316,cover=3.75
        6:leaf=-0.846154,cover=21.75
booster[1]
0:[DISPLACEMENT<231.5] yes=1,no=2,missing=1,gain=60.9437,cover=52.0159
    1:[WEIGHT<2974.5] yes=3,no=4,missing=3,gain=6.59775,cover=31.3195
        3:leaf=0.582471,cover=25.5236
        4:leaf=-0,cover=5.79593
    2:[MODELYEAR<78.5] yes=5,no=6,missing=5,gain=1.96045,cover=20.6964
        5:leaf=-0.643141,cover=19.3965
        6:leaf=-0,cover=1.2999

Actually, this is something practical that I had overlooked earlier.

Using the above tree structure, one can find the probability for each training example.
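To make that concrete, here is a minimal sketch (in Python, with the split thresholds and leaf values hand-copied from the `booster[0]` dump above) of how an instance is routed to a leaf; the function name and feature arguments are hypothetical, chosen to match the feature names in the dump:

```python
def booster0_leaf(weight, cylinders):
    """Hand-coded traversal of booster[0] from the dump above.

    Returns the leaf value (raw margin contribution) for an instance,
    assuming neither feature is missing.
    """
    if weight < 3267.5:            # node 0: WEIGHT<3267.5 -> yes=1
        if cylinders < 5.5:        # node 1: CYLINDERS<5.5 -> yes=3
            return 0.872727        # leaf 3
        return 0.0967742           # leaf 4
    if weight < 3431:              # node 2: WEIGHT<3431 -> yes=5
        return -0.0526316          # leaf 5
    return -0.846154               # leaf 6

# A light car with 4 cylinders lands in leaf 3:
print(booster0_leaf(3000, 4))  # -> 0.872727
```

In a real deployment one would parse the dump text rather than hand-code each tree, but the routing logic per node is exactly this yes/no/missing comparison.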

The parameter list was:

param <- list("objective" = "binary:logistic",
              "eval_metric" = "logloss",
              "eta" = 0.5,
              "max_depth" = 2, 
              "colsample_bytree" = .8,
              "subsample" = 0.8,
              "alpha" = 1)

For an instance that falls into booster[0], leaf 0-3, the probability will be exp(0.872727)/(1+exp(0.872727)).

And for booster[0], leaf 0-3 plus booster[1], leaf 0-3, the probability will be exp(0.872727 + 0.582471)/(1+exp(0.872727 + 0.582471)).

And so on as one increases the number of iterations.
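The two probabilities above can be checked with a short sketch: sum the leaf values an instance receives from each tree, then apply the logistic (sigmoid) link that `binary:logistic` uses. The leaf values are copied from the dump above:

```python
import math

def sigmoid(x):
    # logistic link used by the binary:logistic objective
    return math.exp(x) / (1.0 + math.exp(x))

# leaf values for one instance: booster[0] leaf 3 and booster[1] leaf 3
leaf0, leaf1 = 0.872727, 0.582471

p_after_1_tree = sigmoid(leaf0)            # ~0.705
p_after_2_trees = sigmoid(leaf0 + leaf1)   # ~0.811

print(p_after_1_tree, p_after_2_trees)
```

Note this sketch assumes the default base score of 0.5 (a margin offset of 0), which is what makes the plain sum of leaf values match R's predictions here; a non-default `base_score` would add a constant to the margin before the sigmoid.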

I matched these values against R's predicted probabilities; they differ by about 10^(-7), probably because the leaf quality scores in the dump are truncated floating-point values.

This might not be the answer to finding the weights, but it can serve as a production-level solution when boosted trees trained in R are used for prediction in a different environment.

Any comment on this will be highly appreciated.
