Python: logistic regression max_iter parameter reduces accuracy

Question

I am doing multiclass/multilabel text classification. I am trying to get rid of the "ConvergenceWarning".

When I increased max_iter from the default to 4000, the warning disappeared. However, my model accuracy dropped from 78 to 75.

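import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# X_train / X_test hold the raw text documents, y_train / y_test the labels.
logreg = Pipeline([('vect', CountVectorizer()),
                   ('tfidf', TfidfTransformer()),
                   ('clf', LogisticRegression(n_jobs=1, C=1e5, solver='lbfgs', multi_class='ovr',
                                              random_state=0, class_weight='balanced')),
                  ])
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)

# accuracy_score expects (y_true, y_pred).
print('Logistic Regression Accuracy %s' % accuracy_score(y_test, y_pred))

# The pipeline vectorizes internally, so it is given the raw X_train here
# (the original snippet passed an undefined train_tfidf).
cv_score = cross_val_score(logreg, X_train, y_train, cv=10, scoring='accuracy')
print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))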

Why does my accuracy drop when max_iter=4000? Is there any other way to fix "ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations."?

Answer 1

The question doesn't include the data, so the problem can't be reproduced; I can only guess.

Some things to check:

1) Many estimators, such as LogisticRegression, prefer (not to say require) scaled data. Depending on your data, you may want to scale with MaxAbsScaler, MinMaxScaler, StandardScaler or RobustScaler. The best choice depends on the kind of problem you are trying to solve, data properties such as sparsity, whether the downstream estimator accepts negative values, etc. Scaling usually speeds up convergence, so you may not even need to increase max_iter.

2) In my experience, solvers other than "liblinear" need more iterations (a higher max_iter) to converge on the same input data.

3) I didn't see max_iter set in your code snippet. It currently defaults to 100 (sklearn 0.22).

4) I saw you set the regularization parameter C=100000. This drastically reduces the regularization, since C is the inverse of the regularization strength. Expect it to take more iterations, and it may overfit the model.

5) I wouldn't expect a higher max_iter to give you lower accuracy. The solver seems to be diverging rather than converging. The data may not be scaled, the random state may not be fixed, or the tolerance tol (default 1e-4) may be too high.

6) Check your cross_val_score cross-validation parameter cv. If I'm not mistaken, the default behavior doesn't set the random state, which results in a variable mean accuracy.
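The checks above can be sketched roughly as follows. This is only an illustration, not the asker's setup: the toy corpus, the helper names build_model and fit_and_report, and the particular C and max_iter values are all assumptions made for the example. It scales the sparse TF-IDF features with MaxAbsScaler, compares the iteration count the solver actually used (n_iter_) against max_iter, records whether a ConvergenceWarning was raised, and passes an explicit StratifiedKFold with a fixed random_state so the cross-validation folds are repeatable.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# Toy corpus, made up for illustration; substitute your own documents and labels.
rng = np.random.default_rng(0)
topic_words = {
    "sports": ["goal", "team", "match", "coach", "league"],
    "tech": ["python", "server", "code", "kernel", "compiler"],
    "food": ["recipe", "oven", "spice", "salad", "butter"],
}
filler = ["today", "report", "new", "people", "time"]
texts, labels = [], []
for label, words in topic_words.items():
    for _ in range(40):
        doc = list(rng.choice(words, size=4)) + list(rng.choice(filler, size=3))
        texts.append(" ".join(doc))
        labels.append(label)


def build_model(C, max_iter):
    # MaxAbsScaler rescales each feature while keeping the matrix sparse.
    return make_pipeline(
        TfidfVectorizer(),
        MaxAbsScaler(),
        LogisticRegression(C=C, solver="lbfgs", max_iter=max_iter, random_state=0),
    )


def fit_and_report(C, max_iter):
    """Fit once; return (iterations used, True if a ConvergenceWarning was raised)."""
    model = build_model(C, max_iter)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(texts, labels)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return int(model[-1].n_iter_.max()), warned


# A fixed, shuffled splitter makes the CV folds identical from run to run.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for C, max_iter in [(1e5, 5), (1e5, 4000), (1.0, 4000)]:
    n_iter, warned = fit_and_report(C, max_iter)
    with warnings.catch_warnings():
        # Convergence is already recorded above; keep the CV output quiet.
        warnings.simplefilter("ignore", ConvergenceWarning)
        scores = cross_val_score(build_model(C, max_iter), texts, labels,
                                 cv=cv, scoring="accuracy")
    results[(C, max_iter)] = (n_iter, warned, scores.mean())
    print(f"C={C:g} max_iter={max_iter}: n_iter_={n_iter}, "
          f"warned={warned}, cv accuracy={scores.mean():.3f}")
```

If n_iter_ equals max_iter and a warning was raised, the solver stopped at the cap. If n_iter_ is well below max_iter, raising max_iter further should not change the fitted model.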

Answer 2

In my case, I increased max_iter in small increments (from the default 100 to 400 first, then in steps of 400) until the warning went away. Interestingly, this improved the model's performance metrics (accuracy, precision, recall, F1 score). Intuitively that makes sense: the solver now converges and reaches the optimal solution, whereas before it didn't.
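A minimal sketch of that incremental search might look like the following. The synthetic data from make_classification, the cap of 4000 on the schedule, and the variable names are assumptions for illustration; the schedule itself (100, then 400, then steps of 400) follows the description above. It refits with a larger max_iter each time, stops at the first value that raises no ConvergenceWarning, and records the held-out accuracy at each step so the trend can be inspected.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for real features (an assumption for this sketch).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Default 100 first, then 400, then steps of 400 (capped at 4000 here).
schedule = [100] + list(range(400, 4001, 400))

history = []  # (max_iter, warned, test accuracy)
for max_iter in schedule:
    clf = LogisticRegression(C=1e5, solver="lbfgs", max_iter=max_iter, random_state=0)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf.fit(X_train, y_train)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    history.append((max_iter, warned, clf.score(X_test, y_test)))
    if not warned:
        break  # first value in the schedule with no ConvergenceWarning

for max_iter, warned, acc in history:
    print(f"max_iter={max_iter}: warned={warned}, test accuracy={acc:.3f}")
```

Whether accuracy rises or falls along the way depends on the data and on C; the loop only makes the relationship visible rather than guaranteeing an improvement.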

Answer 1
2 2020-01-05 19:36:59

Answer 2
0 2021-10-30 03:45:01
