ConvergenceWarning: Liblinear failed to converge, increase the number of iterations

I am running the linear binary pattern code from Adrian. The program runs but gives the following warning:

C:\Python27\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
 "the number of iterations.", ConvergenceWarning

I am running Python 2.7 with OpenCV 3.7. What should I do?

When an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to poor scaling of the decision variables. There are a few things you can try.

  1. Normalize your training data so that the problem hopefully becomes better conditioned, which in turn can speed up convergence. One possibility is to scale your data to zero mean and unit standard deviation, for example using Scikit-Learn's StandardScaler. Note that you have to apply the StandardScaler fitted on the training data to the test data. Also, if you have discrete features, make sure they are transformed in a way that makes scaling them meaningful. (See the sketch after this list.)
  2. Related to 1), make sure the other arguments, such as the regularization weight C, are set appropriately. C has to be > 0. Typically one would try various values of C on a logarithmic scale (1e-5, 1e-4, 1e-3, ..., 1, 10, 100, ...) before fine-tuning it at finer granularity within a particular interval. These days, it probably makes more sense to tune parameters using, e.g., Bayesian optimization with a package such as Scikit-Optimize.
  3. Set max_iter to a larger value. The default is 1000. This should be your last resort. If the optimization process does not converge within the first 1000 iterations, making it converge by setting a larger max_iter typically masks other problems such as those described in 1) and 2). It might even indicate that you have some inappropriate features or strong correlations among the features. Debug those first before taking this easy way out.
  4. Set dual = True if the number of features > the number of examples, and vice versa. This solves the SVM optimization problem using the dual formulation. Thanks @Nino van Hooff for pointing this out, and @JamesKo for spotting my mistake.
  5. Use a different solver, e.g., the L-BFGS solver if you are using Logistic Regression. See @5ervant's answer.
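
Here is a minimal sketch combining points 1, 2 and 4: scaling inside a pipeline, a coarse logarithmic search over C, and dual=False for the common case of more samples than features. X_train and y_train are hypothetical placeholders for your own data.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Scaling is part of the pipeline, so the scaler fitted on the training
# folds is automatically reused on the validation folds.
pipe = make_pipeline(
    StandardScaler(),
    LinearSVC(dual=False),  # dual=False when n_samples > n_features
)

# Coarse logarithmic grid for C before refining within a narrower interval.
param_grid = {"linearsvc__C": np.logspace(-5, 2, 8)}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # X_train, y_train: your own (hypothetical) data
print(search.best_params_)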

Note: One should not ignore this warning.

This warning came about because:

  1. Solving the linear SVM is just solving a quadratic optimization problem. The solver is typically an iterative algorithm that keeps a running estimate of the solution (i.e., the weights and bias of the SVM). It stops running when the solution corresponds to an objective value that is optimal for this convex optimization problem, or when it hits the maximum number of iterations set.

  2. If the algorithm does not converge, then the current estimate of the SVM's parameters is not guaranteed to be any good, and hence the predictions can also be complete garbage.

Edit

In addition, consider the comments by @Nino van Hooff and @5ervant about using the dual formulation of the SVM. This is especially important if the number of features you have, D, is larger than the number of training examples N. This is what the dual formulation of the SVM is specifically designed for, and it helps with the conditioning of the optimization problem. Credit to @5ervant for noticing and pointing this out.

Furthermore, @5ervant also pointed out the possibility of changing the solver, in particular the use of the L-BFGS solver. Credit to him (i.e., upvote his answer, not mine).

I would like to provide a quick, rough explanation for those who are interested (I am :)) of why this matters in this case. Second-order methods, and in particular approximate second-order methods like the L-BFGS solver, help with ill-conditioned problems because they approximate the Hessian at each iteration and use it to scale the gradient direction. This gives them a better convergence rate, but possibly at a higher compute cost per iteration. That is, it takes fewer iterations to finish, but each iteration is slower than in a typical first-order method like gradient descent or its variants.

For example, a typical first-order method might update the solution at each iteration like

x(k + 1) = x(k) - alpha(k) * gradient(f(x(k)))

where alpha(k), the step size at iteration k, depends on the particular choice of algorithm or learning-rate schedule.

A second-order method, for example Newton's method, will have an update equation

x(k + 1) = x(k) - alpha(k) * Hessian(x(k))^(-1) * gradient(f(x(k)))

That is, it uses the information about the local curvature encoded in the Hessian to scale the gradient accordingly. If the problem is ill-conditioned, the gradient will be pointing in less than ideal directions, and the inverse Hessian scaling will help correct this.

In particular, L-BFGS, mentioned in @5ervant's answer, is a way to approximate the inverse of the Hessian, as computing it exactly can be an expensive operation.

However, second-order methods might converge much faster (i.e., require fewer iterations) than first-order methods like the usual gradient-descent based solvers, which, as you know by now, sometimes fail to even converge. This can compensate for the time spent at each iteration.
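
Here is a rough numerical illustration of this point on a deliberately ill-conditioned quadratic f(x) = 0.5 * x' A x; the matrix A and the step size are just illustrative choices.

import numpy as np

A = np.diag([1.0, 100.0])   # Hessian of f; condition number 100, i.e. ill-conditioned
grad = lambda x: A @ x      # gradient of f(x) = 0.5 * x' A x

# First-order: 100 plain gradient-descent steps with a fixed step size
x_gd = np.array([1.0, 1.0])
for _ in range(100):
    x_gd = x_gd - 0.01 * grad(x_gd)

# Second-order: a single Newton step, which rescales the gradient by the inverse Hessian
x0 = np.array([1.0, 1.0])
x_newton = x0 - np.linalg.solve(A, grad(x0))

print(x_gd)      # ~[0.37, 0.0]: still far from the minimum along the flat direction
print(x_newton)  # [0.0, 0.0]: the exact minimizer in one step (for a quadratic)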

In summary, if you have a well-conditioned problem, or if you can make it well-conditioned through other means such as using regularization and/or feature scaling and/or making sure you have more examples than features, you probably don't have to use a second-order method. But these days, with many models optimizing non-convex problems (e.g., those in DL models), second-order methods such as L-BFGS play a different role there, and there is evidence to suggest they can sometimes find better solutions compared to first-order methods. But that is another story.

I reached the point where I had set max_iter=1200000 on my LinearSVC classifier, but the "ConvergenceWarning" was still present. I fixed the issue by simply setting dual=False and leaving max_iter at its default.
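
A minimal sketch of that fix (X_train and y_train stand in for your own data):

from sklearn.svm import LinearSVC

# dual=False solves the primal problem, usually the better choice when
# n_samples > n_features; max_iter is left at its default of 1000.
clf = LinearSVC(dual=False)
clf.fit(X_train, y_train)  # X_train, y_train: hypothetical placeholders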

With the LogisticRegression(solver='lbfgs') classifier, you should increase max_iter. Mine reached max_iter=7600 before the "ConvergenceWarning" disappeared when training with a large dataset's features.

Explicitly specifying max_iter resolves the warning, as the default max_iter is 100 [for Logistic Regression]:

 logreg = LogisticRegression(max_iter=1000)

Please increase max_iter to 10000, as the default value is 100. Increasing the number of iterations will likely help the algorithm converge. For me it converged, and the solver was 'lbfgs':

log_reg = LogisticRegression(solver='lbfgs',class_weight='balanced', max_iter=10000)

Here is an example of how to increase the maximum number of iterations and change the solver in the LogisticRegression object:

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
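
# X and y below are assumed to be your own feature matrix and label vector
# (they are not defined in this snippet)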

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create the LogisticRegression object and set the solver and maximum number of iterations
logreg = LogisticRegression(solver='lbfgs', max_iter=10000)

# Fit the model to the data
logreg.fit(X_scaled, y)

An additional note:

If you get this error when using a GridSearchCV object, the following example code increases the maximum number of iterations and changes the solver:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

lr = LogisticRegression(solver='lbfgs',max_iter=10000)
parameters = {
    'C': [0.001,0.01,0.1,1,10,100,1000],
}

cv = GridSearchCV(lr, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())
