What is the C parameter in sklearn Logistic Regression?

Original post: 2021-05-13 02:08:07 | Tags: python, machine-learning, scikit-learn, logistic-regression, overfitting-underfitting

What is the meaning of the C parameter in sklearn.linear_model.LogisticRegression? How does it affect the decision boundary? Do high values of C make the decision boundary non-linear? What does overfitting look like for logistic regression if we visualize the decision boundary?

1 Answer

From the documentation:

    C: float, default=1.0
    Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
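To make "inverse of regularization strength" concrete, here is a sketch of the usual L2-penalized logistic regression objective. The notation is my own assumption, not part of the quoted documentation: labels are coded as y_i in {-1, +1}, w is the coefficient vector, b the intercept, and x_i the i-th training row. The library's exact scaling may differ, but the role of C should be the same.

```latex
\min_{w,\,b} \;\; \frac{1}{2}\,\lVert w \rVert_2^2
  \;+\; C \sum_{i=1}^{n} \log\!\Bigl(1 + \exp\bigl(-y_i\,(x_i^{\top} w + b)\bigr)\Bigr),
\qquad y_i \in \{-1, +1\}
```

C multiplies the data-fit term, so a large C makes the penalty on w comparatively unimportant, and a small C makes it dominate. Writing the penalty strength as lambda = 1/C gives the same minimizer, which is where "inverse" comes from.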

If you don't understand that, Cross Validated may be a better place to ask than here.

While CS people will often refer to all the arguments to a function as "parameters", in machine learning, C is referred to as a "hyperparameter". The parameters are numbers that tell the model what to do with the features, while hyperparameters tell the model how to choose parameters.
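A minimal sketch of that distinction in scikit-learn terms. The synthetic dataset and the value C=0.5 are arbitrary choices for illustration; C is set by you before fitting, while `coef_` and `intercept_` only exist after `fit` has learned them from the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with 4 features (illustrative assumption only).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Hyperparameter: chosen by the user, fixed before training.
model = LogisticRegression(C=0.5)
print("hyperparameter C:", model.get_params()["C"])

# Parameters: learned from the data during fit().
model.fit(X, y)
print("coef_ shape:", model.coef_.shape)            # one weight per feature
print("intercept_ shape:", model.intercept_.shape)  # one bias term
```

In practice C is usually picked by cross-validation rather than by hand, since it cannot be learned from the training loss alone.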

Regularization generally refers to the idea that there should be a complexity penalty for more extreme (larger-magnitude) parameters. Looking only at the training data, without paying attention to how extreme one's parameters are, leads to overfitting. A high value of C tells the model to give high weight to the training data and a lower weight to the complexity penalty. A low value tells the model to give more weight to the complexity penalty at the expense of fitting the training data. Basically, a high C means "Trust this training data a lot", while a low value says "This data may not be fully representative of the real-world data, so if it's telling you to make a parameter really large, don't listen to it".
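The trade-off can be checked empirically. The sketch below fits the same noisy synthetic data (an assumption for illustration, not data from the question) with three values of C and records the size of the learned coefficient vector. Note that the model is linear in the features for any C, so the decision boundary stays a straight line (a hyperplane in higher dimensions); C changes where that line sits and how large the weights are, not its shape.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Noisy 2-feature data; flip_y randomly flips 10% of the labels.
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    flip_y=0.1, random_state=0,
)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # The boundary is always w . x + b = 0; C only changes w and b.
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: ||w|| = {coef_norms[C]:.3f}, "
          f"train accuracy = {model.score(X, y):.3f}")
```

With the default L2 penalty, the coefficient norm should grow as C grows, which is the "trust the training data more" behaviour described above. A curved boundary only appears if you feed the model non-linear features (for example polynomial terms), in which case a high C lets that boundary bend to fit noise.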

https://en.wikipedia.org/wiki/Regularization_(mathematics)