Is there a correlation between the F1 score and the confusion matrix results for a gradient-boosted decision tree model (XGBoost)?

Question

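I am building a decision tree model on data from the "Give Me Some Credit" Kaggle competition ( https://www.kaggle.com/competitions/GiveMeSomeCredit/overview ). I am trying to train this model on the competition's training dataset and then apply it to my own dataset for research.

The problem I am facing is that the F1 score my model gets does not seem to correlate with the results shown in the confusion matrix: the higher the F1 score, the worse the label prediction becomes. My current best parameters for maximizing F1 are as follows (including the way I measure the score):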

from sklearn.model_selection import RandomizedSearchCV
import xgboost

classifier=xgboost.XGBClassifier(tree_method='gpu_hist', booster='gbtree', importance_type='gain')

params={
    "colsample_bytree":[0.3], 
    "gamma":[0.3],
    "learning_rate":[0.1], 
    "max_delta_step":[1], 
    "max_depth":[4],
    "min_child_weight":[9],
    "n_estimators":[150], 
    "num_parallel_tree":[1], 
    "random_state":[0],
    "reg_alpha":[0], 
    "reg_lambda":[0], 
    "scale_pos_weight":[4],
    "validate_parameters":[1],
    "n_jobs":[-1],
    "subsample":[1],
    }

clf=RandomizedSearchCV(classifier,param_distributions=params,n_iter=100,scoring='f1',cv=10,verbose=3)
clf.fit(X,y)

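These parameters give me an F1 score of about 0.46. However, when this model's output is plotted as a confusion matrix, the label prediction accuracy for label "1" is only 50% (image below).

When I tune the parameters for better label prediction, I can raise the label prediction accuracy to 97% for both labels, but that lowers the F1 score to about 0.3. Here is the code I use to create the confusion matrix (the parameters included are the ones that give an F1 score of 0.3):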

import matplotlib.pyplot as plt
import numpy as np
from numpy import nan
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier
final_model = XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
              colsample_bylevel=1, colsample_bynode=1, colsample_bytree=0.7,
              early_stopping_rounds=None, enable_categorical=False,
              eval_metric=None, gamma=0.2, gpu_id=0, grow_policy='depthwise',
              importance_type='gain', interaction_constraints='',
              learning_rate=1.5, max_bin=256, max_cat_to_onehot=4,
              max_delta_step=0, max_depth=5, max_leaves=0, min_child_weight=9,
              missing=nan, monotone_constraints='()', n_estimators=800,
              n_jobs=-1, num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=5)

final_model.fit(X,y)

pred_xgboost = final_model.predict(X)

cm = confusion_matrix(y, pred_xgboost)
cm_norm = cm/cm.sum(axis=1)[:, np.newaxis]
plt.figure()
fig, ax = plt.subplots(figsize=(10, 10))
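# plot_confusion_matrix is assumed to be a user-defined helper (not shown in this post)
# that takes a matrix and the class labels; rf was undefined, so use final_model
plot_confusion_matrix(cm_norm, classes=final_model.classes_)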

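Here is the confusion matrix for these parameters:

I do not understand why there seems to be no correlation between these two metrics (F1 score and confusion matrix accuracy). Perhaps a different scoring system would be more useful?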

Answer 1

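Can you show the absolute values? Technically, cm_norm = cm/cm.sum(axis=1)[:, np.newaxis] gives recall, not accuracy. You can easily get a matrix with good recall but poor precision for the positive class (e.g. [[9000, 300], [1, 30]]). You can check your precision by normalizing over columns instead, i.e. cm/cm.sum(axis=0). (F1 is the harmonic mean of the positive class's recall and precision.)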
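To make the example matrix above concrete, here is a small sketch that computes recall, precision, and F1 from [[9000, 300], [1, 30]]. It assumes scikit-learn's layout (rows are true labels, columns are predicted labels); the variable names are only illustrative.

```python
import numpy as np

# Rows = true labels, columns = predicted labels (scikit-learn's convention).
cm = np.array([[9000, 300],
               [1, 30]])

# Dividing the diagonal by the row sums gives per-class recall.
recall = np.diag(cm) / cm.sum(axis=1)
# Dividing the diagonal by the column sums gives per-class precision.
precision = np.diag(cm) / cm.sum(axis=0)
# F1 is the harmonic mean of precision and recall, per class.
f1 = 2 * precision * recall / (precision + recall)

print("recall   :", recall.round(3))
print("precision:", precision.round(3))
print("f1       :", f1.round(3))
```

Both classes show roughly 97% on the row-normalized diagonal, yet the positive class has a precision of about 9% and an F1 of about 0.17. A row-normalized confusion matrix can therefore look excellent while F1 is low.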

If you want to optimize for F1, you should also look for the best classification threshold using sklearn.metrics.precision_recall_curve().
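A minimal sketch of that threshold search follows. The labels and scores are synthetic stand-ins (an assumption for illustration); in practice y_score would be something like model.predict_proba(X_valid)[:, 1] on held-out data, not on the training set.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

# Synthetic stand-ins (assumption): imbalanced labels and noisy class-1 scores.
rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.07).astype(int)
y_score = np.clip(0.25 * y_true + rng.normal(0.3, 0.15, size=2000), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# precision and recall have one more entry than thresholds; drop the final
# (precision=1, recall=0) point so the three arrays line up.
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / np.maximum(p + r, 1e-12)

# Pick the threshold with the highest F1 and predict with it.
best = int(np.argmax(f1))
best_threshold = thresholds[best]
y_pred = (y_score >= best_threshold).astype(int)

print(f"best threshold       : {best_threshold:.3f}")
print(f"F1 at best threshold : {f1_score(y_true, y_pred):.3f}")
print(f"F1 at 0.5            : {f1_score(y_true, (y_score >= 0.5).astype(int)):.3f}")
```

The default predict() of a classifier corresponds to a 0.5 cut-off, which is rarely the F1-optimal one on imbalanced data.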

Answer 2

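There is a relationship, although it is not obvious. Generating a classification report will help you understand it better.

Also, a higher max_rate can change the values of recall and specificity, which affects one of the per-class F1 scores in the classification report but not the F1 score derived from f1_score(y_valid, predictions). Oversampling also affects recall.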

from sklearn.metrics import classification_report
ClassificationReport = classification_report(y_valid,predictions.round(),output_dict=True)
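One way to read the remark about f1_score(y_valid, predictions) is that, for binary labels, f1_score defaults to average="binary" with pos_label=1, so it returns only the class 1 row of the report. The sketch below illustrates this with small made-up labels (an assumption, not data from this answer).

```python
from sklearn.metrics import classification_report, f1_score

# Made-up labels for illustration only.
y_valid = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

report = classification_report(y_valid, predictions, output_dict=True)

# Default call: F1 of the positive class (label 1) only.
f1_default = f1_score(y_valid, predictions)
# F1 of class 0 must be requested explicitly.
f1_class0 = f1_score(y_valid, predictions, pos_label=0)
# Unweighted mean of the two per-class F1 scores.
f1_macro = f1_score(y_valid, predictions, average="macro")

print("f1_score default :", round(f1_default, 4))
print("report class 1   :", round(report["1"]["f1-score"], 4))
print("report class 0   :", round(report["0"]["f1-score"], 4))
print("macro average    :", round(f1_macro, 4))
```

A change that only moves the class 0 row of the report will not show up in the default f1_score value.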

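The F1 score is a balance between precision and recall. The confusion matrix shows the precision values for both classes. With the classification report, I can see the relationship, as in the example below.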

Classification Report
    precision   recall      f1-score    support
0   0.722292    0.922951    0.810385    23167.0
1   0.982273    0.923263    0.951854    107132.0


Confusion Matrix using Validation Data (y_valid)

True Negative  : CHGOFF (0) was predicted 21382 times correctly (72.23 %)
False Negative : CHGOFF (0) was predicted 8221 times incorrectly (27.77 %)
True Positive  : P I F (1) was predicted 98911 times correctly (98.23 %)
False Positive : P I F (1) was predicted 1785 times incorrectly (1.77 %)
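The link between the two outputs above can be checked by hand. The sketch below rebuilds the classification report from the four confusion matrix counts, assuming rows are true classes and columns are predicted classes; the percentages printed with the counts (72.23% and 98.23%) then match the per-class precision.

```python
import numpy as np

# Counts from the validation output above.
# Rows = true class (0, 1), columns = predicted class (0, 1).
cm = np.array([[21382, 1785],
               [8221, 98911]])

support = cm.sum(axis=1)                   # number of true samples per class
recall = np.diag(cm) / support             # correct / all true members of the class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predictions of the class
f1 = 2 * precision * recall / (precision + recall)

for label in (0, 1):
    print(f"{label}   {precision[label]:.6f}    {recall[label]:.6f}    "
          f"{f1[label]:.6f}    {support[label]}")
```

The printed rows reproduce the precision, recall, f1-score, and support columns of the report, so the F1 score is fully determined by the absolute counts in the confusion matrix.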


2 answers

Answer 1
1 2022-06-10 14:53:22

Answer 2
1 2022-06-11 20:10:33
