
Logistic Regression with sklearn

Originally posted 2015-09-22 18:51:49. Tags: python, scikit-learn, classification, logistic-regression

Not sure if this is a great place for this question, but I was told CrossValidated was not. So, all these questions refer to sklearn, but if you have insights into logistic regression in general, I'd love to hear them as well.

1) Does data have to be standardized (mean 0, stdev 1)?
2) In sklearn, how do I specify what kind of regularization I want (L1 vs L2)? Note that this is different from penalty; penalty refers to classification error, not penalty on coefficients.
3) How can I use it to also do variable selection? I.e., analogously to lasso for linear regression.
4) When using regularization, how do I optimize for C, the regularization strength? Is there something built-in, or do I have to take care of this myself?

Probably an example would be most helpful, but I'd appreciate any insights on any of these questions.

This has been my starting point: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Thank you very much in advance!

1 Answer

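1) For logistic regression, no, it is not strictly required: you are not computing distances between instances. That said, scikit-learn's LogisticRegression is regularized by default, so features on very different scales are penalized unevenly, and standardizing often helps the solver converge.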
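For point 1, one way to see whether scaling matters on a given dataset is to fit the same model with and without a StandardScaler step and compare the coefficients. The following is only a sketch: the data is synthetic, the factor of 1000 is an arbitrary choice for illustration, and a recent scikit-learn release is assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration, not from the question).
# shuffle=False keeps the informative features in the first columns.
X, y = make_classification(n_samples=300, n_features=5, shuffle=False, random_state=0)
X[:, 0] *= 1000.0  # put one feature on a much larger scale

# Same estimator, once on the raw features and once after standardizing.
raw_model = LogisticRegression(max_iter=1000).fit(X, y)
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

raw_coef = raw_model.coef_.ravel()
scaled_coef = scaled_model.named_steps["logisticregression"].coef_.ravel()
print("raw features:   ", raw_coef)
print("standardized:   ", scaled_coef)
```

Putting the scaler inside the pipeline means it is fitted only on the training portion when the pipeline is later cross-validated.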

2) You can specify the penalty='l1' or penalty='l2' parameter; despite what the question assumes, this parameter is the penalty on the coefficients. See the LogisticRegression page. The L2 penalty is the default.
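A minimal sketch of point 2, which also bears on question 3: an L1 penalty tends to drive some coefficients to exactly zero, much as lasso does for linear regression. The dataset and the value C=0.1 are made up for illustration. The solver is set to "liblinear" explicitly because not every solver supports L1, and parameter names follow the answer above, so newer releases may warn or differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (hypothetical): 20 features, only a few of them informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=3, random_state=0)

# liblinear handles both penalties; C=0.1 is an arbitrary, fairly strong setting.
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

# Count coefficients that were shrunk to exactly zero under each penalty.
n_zero_l2 = int((l2_model.coef_ == 0).sum())
n_zero_l1 = int((l1_model.coef_ == 0).sum())
print("zero coefficients with L2:", n_zero_l2)
print("zero coefficients with L1:", n_zero_l1)
```

Features whose L1 coefficient is zero are effectively dropped from the model, which is the lasso-style selection the question asks about.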

3) There are various explicit feature selection techniques that scikit-learn provides, e.g. using SelectKBest with a chi2 ranking function.
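As a sketch of point 3, SelectKBest with chi2 can be used on its own or as a pipeline step in front of the classifier. The iris data and k=2 are assumptions chosen only to keep the example small; note that chi2 expects non-negative feature values.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Iris is used here only because all of its features are non-negative.
X, y = load_iris(return_X_y=True)

# Keep the k features with the highest chi-squared score against the labels.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)
print("kept feature indices:", kept)

# The same selection as a pipeline step, so it is refitted on each training split.
model = make_pipeline(SelectKBest(chi2, k=2), LogisticRegression(max_iter=1000)).fit(X, y)
print("training accuracy:", model.score(X, y))
```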

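4) You will want to do a grid search for the optimal value of C. Note that in scikit-learn C is the inverse of the regularization strength, so smaller values mean stronger regularization.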
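A minimal sketch of point 4 using GridSearchCV. The candidate values for C, the 5-fold split, and the iris data are arbitrary choices for illustration, and the import path assumes a recent scikit-learn release. scikit-learn also ships LogisticRegressionCV, which may serve as a built-in alternative for tuning C.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate values for C on a log scale (an assumption; adjust to the problem).
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation over the grid, scored by the estimator's default (accuracy).
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["C"]
print("best C:", best_C)
print("mean cross-validated accuracy:", search.best_score_)
```

After fitting, search.best_estimator_ holds a model refitted on all of the data with the selected C.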

For more detail on all these questions, I suggest going through some of the Examples, e.g. this one and this one.