Python: Logistic regression max_iter parameter is reducing the accuracy

I am doing multiclass/multilabel text classification. I am trying to get rid of the "ConvergenceWarning".

When I tuned max_iter from the default to 4000, the warning disappeared. However, my model accuracy dropped from 78 to 75.

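import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# X_train, X_test: raw text documents; y_train, y_test: labels (defined elsewhere)
logreg = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    # max_iter is not set here, so LogisticRegression's default applies
    ('clf', LogisticRegression(n_jobs=1, C=1e5, solver='lbfgs', multi_class='ovr',
                               random_state=0, class_weight='balanced')),
])
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)
print('Logistic Regression Accuracy %s' % accuracy_score(y_test, y_pred))

# The pipeline vectorizes the raw text itself, so pass X_train, not a precomputed TF-IDF matrix
cv_score = cross_val_score(logreg, X_train, y_train, cv=10, scoring='accuracy')
print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))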

Why is my accuracy reduced when max_iter=4000? Is there any other way to fix "ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations."?

The data used in the question is missing, so it's not possible to reproduce the problem, only to guess.

Some things to check:

1) Many estimators, such as LogisticRegression, prefer (if not require) scaled data. Depending on your data, you may want to scale with MaxAbsScaler, MinMaxScaler, StandardScaler or RobustScaler. The optimal choice depends on the kind of problem you are trying to solve, data properties such as sparsity, whether the downstream estimator accepts negative values, etc. Scaling data usually speeds up convergence, so you may not even need to increase max_iter.

2) In my experience, solvers other than "liblinear" need more iterations (max_iter) to converge on the same input data.

3) I didn't see max_iter set in your code snippet. It currently defaults to 100 (sklearn 0.22).

4) I see you set the regularization parameter C=100000. This drastically reduces regularization, since C is the inverse of the regularization strength. Expect it to need more iterations, and it may lead the model to overfit.

5) I didn't expect a higher max_iter to give you lower accuracy. The solver may be diverging rather than converging: the data may not be scaled, the random state may not be fixed, or the tolerance tol (default 1e-4) may be set too high. Another possibility, given point 4: with C=1e5 the model is barely regularized, so letting the optimizer run longer fits the training data more closely, which can lower test accuracy.

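6) Check the cross-validation parameter cv of cross_val_score. With an integer cv, scikit-learn uses StratifiedKFold (for binary or multiclass classifiers) or KFold without shuffling, so the folds depend on the row order of your data; if that order changes between runs (for example through a train_test_split without a fixed random_state), the mean accuracy will vary. Passing a splitter with shuffle=True and a fixed random_state makes runs comparable.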
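Points 1 to 4 can be checked directly instead of by watching the console. The sketch below is only an illustration on synthetic binary data from make_classification (an assumption, since the question's data isn't available); the features are multiplied by 1000 to mimic unscaled input, and fit_and_report is a hypothetical helper that records whether a ConvergenceWarning was raised and reads the fitted n_iter_. For the question's TF-IDF features, which TfidfTransformer already L2-normalizes per row by default, MaxAbsScaler is the scaler from point 1 that keeps the matrix sparse.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler


def fit_and_report(model, X, y):
    """Hypothetical helper: fit `model`, then report convergence and iteration count."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every ConvergenceWarning, even if the same one was emitted before.
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    # For a Pipeline, the fitted LogisticRegression is the last step.
    clf = model[-1] if hasattr(model, "steps") else model
    return {"converged": converged, "n_iter": int(np.max(clf.n_iter_))}


# Synthetic binary data (assumption); * 1000 mimics features on a large, unscaled range.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_unscaled = X * 1000.0

configs = {
    # Roughly the question's settings: lbfgs, almost no regularization, default max_iter.
    "lbfgs, unscaled, C=1e5": LogisticRegression(C=1e5, solver="lbfgs", max_iter=100),
    # Point 1: scale first; point 4: keep C moderate.
    "lbfgs, MaxAbsScaler, C=1": make_pipeline(
        MaxAbsScaler(), LogisticRegression(C=1.0, solver="lbfgs", max_iter=100)
    ),
    # Point 2: compare against the liblinear solver on the same input.
    "liblinear, MaxAbsScaler, C=1": make_pipeline(
        MaxAbsScaler(), LogisticRegression(C=1.0, solver="liblinear", max_iter=100)
    ),
}

results = {name: fit_and_report(model, X_unscaled, y) for name, model in configs.items()}
for name, info in results.items():
    print(f"{name}: converged={info['converged']}, n_iter={info['n_iter']}")
```

On real data, a configuration that reports converged=True with n_iter well below max_iter is a sign that scaling or a more moderate C, rather than a larger max_iter, is the better fix.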
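To tell the overfitting explanation from point 4 apart from the divergence explanation in point 5, one option is to compare training and test accuracy as max_iter grows. This is a sketch on synthetic data (an assumption), so its printed numbers say nothing about the question's 78 vs 75. With the real data, a training accuracy that rises while test accuracy falls at C=1e5 would point to overfitting rather than a solver failure.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): many features, few of them informative.
X, y = make_classification(n_samples=400, n_features=50, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rows = []
for C in (1e5, 1.0):  # almost no regularization vs. the default strength
    for max_iter in (100, 4000):
        clf = LogisticRegression(C=C, solver="lbfgs", max_iter=max_iter)
        with warnings.catch_warnings():
            # Only the scores matter here, so hide the convergence warning.
            warnings.simplefilter("ignore", ConvergenceWarning)
            clf.fit(X_train, y_train)
        rows.append((C, max_iter, clf.score(X_train, y_train), clf.score(X_test, y_test)))

for C, max_iter, train_acc, test_acc in rows:
    print(f"C={C:g}, max_iter={max_iter}: train acc={train_acc:.3f}, test acc={test_acc:.3f}")
```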
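For point 6, one option is to pass an explicit splitter instead of cv=10, so every run uses the same folds. This sketch assumes a binary or multiclass target and synthetic data; StratifiedKFold with shuffle=True and an integer random_state yields identical folds each time it is used, so two calls to cross_val_score agree for a deterministic solver such as lbfgs. For multilabel targets, which StratifiedKFold does not support, KFold with the same arguments is the analogue.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Shuffled, stratified folds with a fixed seed: the same splits on every call.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_score = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
cv_score_again = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
```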

In my case, I increased max_iter in small increments (from the default 100 to 400 first, then in steps of 400) until the warning went away. Interestingly, this also improved the model's performance metrics (accuracy, precision, recall, F1 score). Intuitively, that makes sense: the optimizer now converges and reaches the optimal solution, whereas before it did not.
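The incremental approach in this answer can be automated. A minimal sketch, assuming synthetic data and a hypothetical helper fit_until_converged: it refits LogisticRegression with max_iter following the schedule described above (100, then 400, 800, 1200, ...) and stops at the first value that raises no ConvergenceWarning, or at a cap (4000 here, an arbitrary choice).

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression


def fit_until_converged(X, y, cap=4000, **lr_kwargs):
    """Hypothetical helper: raise max_iter step by step until the solver stops warning."""
    # Schedule from the answer: the default 100, then 400, 800, 1200, ... up to `cap`.
    schedule = [100] + list(range(400, cap + 1, 400))
    for max_iter in schedule:
        clf = LogisticRegression(max_iter=max_iter, **lr_kwargs)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ConvergenceWarning)
            clf.fit(X, y)
        if not any(issubclass(w.category, ConvergenceWarning) for w in caught):
            return clf, max_iter, True
    # Still warning at the cap: return the last fit and flag it as not converged.
    return clf, max_iter, False


# Synthetic stand-in data (assumption), scaled up so lbfgs needs more iterations.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf, used_max_iter, converged = fit_until_converged(X * 100.0, y, C=1e5, solver="lbfgs")
print(f"max_iter={used_max_iter}, converged={converged}, "
      f"iterations used={int(np.max(clf.n_iter_))}")
```

Refitting from scratch at each step keeps every fit independent; evaluating accuracy on held-out data after each step would show whether convergence actually helps, as it did in this answer.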