How to do parameter tuning in LogisticRegression using StratifiedKFold in Python?

Question

I need to supply, say, 6 values of C and see the mean roc_auc_score over the 10 folds for each value of C.

My attempt so far:


lr = LogisticRegression(C = 1,
                          penalty='l1', 
                          solver='liblinear',  
                          tol=0.0001, 
                          max_iter=3000, 
                          intercept_scaling=1.0, 
                          multi_class='auto', 
                          random_state=42)


C = [0.01,0.05,0.1,1,10,12]

final_scores = []
mean_scores = {}
# Stratified KFold
skf = StratifiedKFold(n_splits=10, random_state=42, shuffle=False)

for c in C:
    for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
        print("Fold:" , fold +1)
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        lr.fit(X_train,y_train)
        predictions = lr.predict_proba(X_train)[:,1]
        final_score.append(roc_auc_score(y_train, predictions))
        print("AUC SCORE:" + str(roc_auc_score(y_train, predictions)))
        mean_scores[c] = np.mean(final_scores)
        print("---")

print(mean_scores)

I need a result dictionary whose keys are the C values and whose values are the mean score over the 10 folds for each C.

Edit:


roc_dict = dict()

C = [0.01,0.05,0.1,1,10,12]
for c in C:
    final_scores = []
    mean_scores = {}
    for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
        print("Fold:" , fold +1)
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        lr.fit(X_train,y_train)
        predictions = lr.predict_proba(X_train)[:,1]
        final_scores.append(roc_auc_score(y_train, predictions))
        print("AUC SCORE:" + str(roc_auc_score(y_train, predictions)))
    roc_dict[c] = np.mean(final_scores)

Answer 1

You're almost there. You can define an empty dict before the loop:

roc_dict = dict()

Run your loop, but put your list and dict inside it, so that they are reset (or created anew) on each iteration:

for c in C:
    final_scores = []
    mean_scores = {}
    # no change here, paste your original code
    roc_dict[c] = final_scores # add this

This will result in:

Out[90]: 
{0.01: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249],
 0.05: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249],
 0.1: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249], # ... etc. But with 10 folds instead.
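
Note that the output above shows identical scores for every C. That is consistent with the posted loop never passing c to the model (lr is built once with C = 1). The loop also scores on the training fold rather than the held-out fold, and recent scikit-learn versions raise a ValueError when random_state is set while shuffle=False. Below is a minimal sketch of one way to address these points. It is not the asker's exact setup: the data comes from make_classification as a stand-in, the helper name mean_auc_per_C is made up for illustration, and NumPy arrays are used (with a pandas DataFrame you would index with .iloc as in the question).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Assumption: synthetic stand-in data; replace with your own X and y.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

C_values = [0.01, 0.05, 0.1, 1, 10, 12]

# random_state only has an effect when shuffle=True.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)


def mean_auc_per_C(X, y, C_values, cv):
    """Return {C: mean held-out ROC AUC across the folds of cv}."""
    roc_dict = {}
    for c in C_values:
        fold_scores = []  # reset for every C
        for train_index, test_index in cv.split(X, y):
            X_train, X_test = X[train_index], X[test_index]
            y_train, y_test = y[train_index], y[test_index]
            # Build the model with the current C so the loop variable is used.
            lr = LogisticRegression(C=c, penalty='l1', solver='liblinear',
                                    max_iter=3000, random_state=42)
            lr.fit(X_train, y_train)
            # Score on the held-out fold, not on the training fold.
            proba = lr.predict_proba(X_test)[:, 1]
            fold_scores.append(roc_auc_score(y_test, proba))
        roc_dict[c] = float(np.mean(fold_scores))
    return roc_dict


roc_dict = mean_auc_per_C(X, y, C_values, skf)
print(roc_dict)
```

If the manual loop is not required, cross_val_score(estimator, X, y, cv=skf, scoring='roc_auc') from sklearn.model_selection should give the per-fold scores for one C in a single call, and GridSearchCV can search over all C values at once.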

Solution 1
1 2019-12-16 15:12:47
