Python: Logistic regression max_iter parameter is reducing the accuracy
I am doing multiclass/multilabel text classification. I am trying to get rid of the "ConvergenceWarning".
When I raised max_iter from the default to 4000, the warning disappeared. However, my model accuracy dropped from 78 to 75.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Bag-of-words counts -> TF-IDF weights -> one-vs-rest logistic regression
logreg = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', LogisticRegression(n_jobs=1, C=1e5, solver='lbfgs', multi_class='ovr',
                               random_state=0, class_weight='balanced')),
])
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('Logistic Regression Accuracy %s' % accuracy_score(y_test, y_pred))
# Note: train_tfidf is not defined anywhere in this snippet
cv_score = cross_val_score(logreg, train_tfidf, y_train, cv=10, scoring='accuracy')
print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g" % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
Why is my accuracy dropping when max_iter=4000? Is there any other way to fix "ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations."?
The data used in the question is missing, so it is not possible to reproduce the problem; I can only guess.
Some things to check:
1) Many estimators, such as LogisticRegression, like (not to say require) scaled data. Depending on your data, you may want to scale with MaxAbsScaler, MinMaxScaler, StandardScaler or RobustScaler. The optimal choice depends on the kind of problem you are trying to solve, data properties like sparsity, whether negative values are welcomed by the downstream estimator, etc. Scaling data usually speeds up convergence, so you may not even need to increase max_iter.
2) In my experience, solvers other than "liblinear" need more iterations (a higher max_iter) to converge on the same input data.
3) I didn't see any max_iter set in your code snippet. It currently defaults to 100 (sklearn 0.22).
4) I saw you set the regularization parameter C=100000. This drastically reduces the regularization, as C is the inverse of regularization strength. It is expected to consume more iterations and may lead to overfitting the model.
5) I didn't expect that a higher max_iter would get you lower accuracy. This suggests the solver is diverging rather than converging. The data may not be scaled, the random state may not be fixed, or the tolerance tol (default 1e-4) may be too high.
6) Check the cv parameter you pass to cross_val_score. If I'm not wrong, the default behavior doesn't set the random state, which results in a variable mean accuracy.
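Several of the points above (scaling, a more moderate C, an explicit max_iter, and a fixed cross-validation split) can be combined in one rough sketch. Everything here is an assumption for illustration, not the asker's setup: the toy corpus is generated on the spot, C=1.0 and max_iter=1000 are arbitrary starting values, MaxAbsScaler is picked only because it accepts sparse TF-IDF matrices, and TfidfVectorizer stands in for the CountVectorizer + TfidfTransformer pair.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical toy corpus: three topics with disjoint vocabularies.
topics = {
    "sports": ["football", "goal", "match", "team", "score"],
    "tech": ["python", "code", "server", "bug", "compile"],
    "food": ["recipe", "oven", "sauce", "bake", "flavor"],
}
rng = np.random.default_rng(0)
texts, labels = [], []
for label, words in topics.items():
    for _ in range(10):
        texts.append(" ".join(rng.choice(words, size=6)))
        labels.append(label)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("scale", MaxAbsScaler()),  # works on sparse input, keeps zeros as zeros
    ("clf", LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000,
                               class_weight="balanced", random_state=0)),
])

# Fix the split explicitly so repeated runs score the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Record warnings instead of printing them, then count convergence failures.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)

# n_iter_ reports how many iterations the solver actually used.
pipe.fit(texts, labels)
iterations_used = int(np.max(pipe.named_steps["clf"].n_iter_))

print("CV accuracy: mean %.3f, std %.3f" % (scores.mean(), scores.std()))
print("ConvergenceWarnings during CV:", n_convergence_warnings)
print("Iterations used on the full fit:", iterations_used)
```

If iterations_used stays well below max_iter, the solver stopped because it met the tolerance, not because it ran out of iterations, so raising max_iter further should not change the fitted model.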
In my case, I increased max_iter in small increments (from the default 100 to 400 first, and then in steps of 400) until I got rid of the warning. Interestingly, this also improved the model performance metrics (accuracy, precision, recall, F1 score). Intuitively that makes sense: the solver now converges and reaches the optimal solution, whereas in the earlier case it did not.
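A minimal sketch of that incremental approach, assuming synthetic data from make_classification in place of real text features. The helper name fit_until_converged and the 4000 upper bound are hypothetical choices for this example; the schedule (100, 400, then steps of 400) follows the description above.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the data from the question).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)


def fit_until_converged(X, y, schedule):
    """Refit with each max_iter in `schedule` until no ConvergenceWarning.

    Returns (fitted classifier, max_iter used, whether it converged).
    """
    clf, max_iter, converged = None, None, False
    for max_iter in schedule:
        clf = LogisticRegression(solver="lbfgs", max_iter=max_iter,
                                 random_state=0)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", category=ConvergenceWarning)
            clf.fit(X, y)
        converged = not any(
            issubclass(w.category, ConvergenceWarning) for w in caught
        )
        if converged:
            break
    return clf, max_iter, converged


# Default 100 first, then 400, then steps of 400 up to an arbitrary cap.
schedule = [100] + list(range(400, 4001, 400))
clf, used_max_iter, converged = fit_until_converged(X_train, y_train, schedule)
acc = accuracy_score(y_test, clf.predict(X_test))

print("converged:", converged, "| max_iter:", used_max_iter)
print("test accuracy: %.3f" % acc)
```

Comparing the test accuracy at each step of the schedule is one way to check whether the metric change comes from convergence itself or from something else, such as weak regularization.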