
How to do parameter tuning in LogisticRegression using StratifiedKFold in Python?

I need to try, for example, 6 values of C and see the mean roc_auc_score over the 10 folds for each value of C.

My attempt so far:


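import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

lr = LogisticRegression(C = 1,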
                          penalty='l1', 
                          solver='liblinear',  
                          tol=0.0001, 
                          max_iter=3000, 
                          intercept_scaling=1.0, 
                          multi_class='auto', 
                          random_state=42)


C = [0.01,0.05,0.1,1,10,12]

final_scores = []
mean_scores = {}
# Stratified KFold (random_state only applies when shuffle=True, so it is omitted here)
skf = StratifiedKFold(n_splits=10, shuffle=False)

for c in C:
    for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
        print("Fold:" , fold +1)
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        lr.fit(X_train,y_train)
        predictions = lr.predict_proba(X_train)[:,1]
        final_scores.append(roc_auc_score(y_train, predictions))
        print("AUC SCORE:" + str(roc_auc_score(y_train, predictions)))
        mean_scores[c] = np.mean(final_scores)
        print("---")

print(mean_scores)

I need a resulting dictionary whose keys are the C values and whose values are the mean score over the 10 folds for each C.

Edit:


roc_dict = dict()

C = [0.01,0.05,0.1,1,10,12]
for c in C:
    final_scores = []
    mean_scores = {}
    for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
        print("Fold:" , fold +1)
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        lr.fit(X_train,y_train)
        predictions = lr.predict_proba(X_train)[:,1]
        final_scores.append(roc_auc_score(y_train, predictions))
        print("AUC SCORE:" + str(roc_auc_score(y_train, predictions)))
    roc_dict[c] = np.mean(final_scores)

You're almost there. You can define an empty dict before your loop:

roc_dict = dict()

Run your loop, but create your list and dict inside it so that they reset on every iteration:

for c in C:
    final_scores = []
    mean_scores = {}
    # no change here, paste your original code
    roc_dict[c] = final_scores # add this

It will result in this:

Out[90]: 
{0.01: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249],
 0.05: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249],
 0.1: [0.7194940476190477,
  0.7681686046511628,
  0.653343023255814,
  0.6596194503171249], # ... etc. But with 10 folds instead.
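
The scores above are identical for every C because the snippets in this thread fit the same lr (created with C = 1) on each pass, and they score on the training rows rather than on the held-out fold. A minimal sketch of the loop with both points addressed follows. It assumes the intent is to refit with each C and to evaluate on the test fold. The data comes from make_classification purely as a stand-in for the asker's X and y, it uses NumPy arrays instead of the .iloc indexing in the question, and fold_scores is an illustrative name. It stores the mean per C, as the question asks, rather than the list of fold scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the asker's data (assumption: binary target).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

C = [0.01, 0.05, 0.1, 1, 10, 12]
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

roc_dict = {}
for c in C:
    fold_scores = []  # reset for every value of C
    for train_index, test_index in skf.split(X, y):
        # Build the model with the current c so the value is actually used.
        lr = LogisticRegression(C=c, penalty="l1", solver="liblinear",
                                max_iter=3000, random_state=42)
        lr.fit(X[train_index], y[train_index])
        # Score on the held-out fold, not on the rows used for fitting.
        proba = lr.predict_proba(X[test_index])[:, 1]
        fold_scores.append(roc_auc_score(y[test_index], proba))
    # One number per C: the mean AUC over the 10 folds.
    roc_dict[c] = float(np.mean(fold_scores))

print(roc_dict)
```

With pandas inputs, the indexing becomes X.iloc[train_index] and y.iloc[train_index] as in the question. scikit-learn's cross_val_score with scoring="roc_auc" and cv=skf computes the same kind of per-fold held-out AUC in a single call, if the explicit loop is not needed.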

