
Does LightGBM binary classifier with (AUC-based) early-stopping rounds take log loss as objective function, or use AUC in optimization algorithm?

I have a LightGBM gradient boosting model for a binary classification task. The LightGBM parameter documentation states that the objective function for binary classification is the log loss (cross-entropy). Hence, I understand that this is the objective function the model uses in its optimization.

However, I have set up the model to stop training if the AUC on the validation data doesn't improve after 10 rounds. Now, has the objective function of the algorithm changed to one based on the AUC, or does it remain as before, with the AUC merely being calculated on the validation set in parallel to check whether the score has stopped improving after 10 rounds?

For reference, my code is below:

import numpy as np
import lightgbm
from scipy.stats import randint as sp_randint, uniform as sp_uniform
from sklearn.model_selection import RandomizedSearchCV

model = lightgbm.LGBMClassifier(n_estimators=500, learning_rate=0.01)
fit_params={"early_stopping_rounds":30,
                "eval_metric" : 'auc',
                "eval_set" : [(X_test,y_test)],
                'eval_names': ['valid'],
                #'callbacks': [lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_099)],
                'verbose': 100,
                'categorical_feature': 'auto'}

params = { 'boosting': ['gbdt'],
        'objective': ['binary'],
        'num_leaves': sp_randint(20, 63), # adjust to inc AUC
        'max_depth': sp_randint(3, 8),
        'min_child_samples': sp_randint(100, 500),
        'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
        'subsample': sp_uniform(loc=0.2, scale=0.8),
        'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
        'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
        'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100],
        'is_unbalance': ['true'],
        'feature_fraction': np.arange(0.3, 0.6, 0.05), # model trained faster AND prevents overfitting
        'bagging_fraction': np.arange(0.3, 0.6, 0.05), # similar benefits as above
        'bagging_freq': np.arange(10, 30, 5),  # every bagging_freq iterations, lgb resamples bagging_fraction of the observations and uses them for the next bagging_freq iterations
        }
RSCV = RandomizedSearchCV(model, params, scoring= 'roc_auc', cv=3, n_iter=10, refit=True, random_state=42, verbose=True)
RSCV.fit(X_train, y_train, **fit_params)

The objective remains log-loss.
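
As a quick check, here is a minimal sketch on synthetic data (the dataset and variable names are illustrative, and it assumes a LightGBM version that supports the early_stopping callback): passing eval_metric='auc' only changes what is monitored on the validation set, while the fitted classifier still reports the binary log-loss objective.

import lightgbm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# synthetic binary classification data, just for illustration
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

clf = lightgbm.LGBMClassifier(n_estimators=500, learning_rate=0.01)
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric='auc',                                    # metric used only for monitoring / early stopping
    callbacks=[lightgbm.early_stopping(stopping_rounds=10)],
)

print(clf.objective_)       # 'binary' -> trees are still fit to log-loss gradients
print(clf.best_iteration_)  # iteration picked because validation AUC stopped improving
print(clf.best_score_)      # validation scores recorded per metric during training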

I don't have a good way to demonstrate this. But AUC cannot easily be used as an objective function, since it only cares about the ordering of the predictions; it wouldn't by itself be able to push predictions toward particular values.
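
To see why, here is a small sketch (using scikit-learn's roc_auc_score rather than LightGBM, with made-up numbers): AUC is unchanged by any monotone rescaling of the scores, so it provides no signal about the actual predicted values, only their order.

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 0, 1, 1])
scores = np.array([0.10, 0.30, 0.35, 0.80, 0.70])

print(roc_auc_score(y_true, scores))           # ~0.833
print(roc_auc_score(y_true, scores * 10 - 3))  # identical AUC: only the ordering of scores matters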
