
Regularization parameter and iteration of SGDClassifier in scikit-learn

scikit-learn's SGDClassifier() supports l1, l2, and elastic net penalties, so it seems important to find the optimal value of the regularization parameter.

I was advised to use SGDClassifier() with GridSearchCV() for this, but SGDClassifier exposes only the regularization parameter alpha. If I used estimators such as SVM or LogisticRegression directly, I would tune C instead of alpha. Is there a way to set the optimal regularization parameter in SGDClassifier() when using logistic regression or SVM losses?

In addition, I have one more question about the iteration parameter n_iter; I do not understand what this parameter means. Does it work like bagging when used together with the shuffle option? That is, if I use the l1 penalty and a large value of n_iter, would it behave like RandomizedLasso()?

C and alpha both have the same effect; the difference is a choice of terminology. C is proportional to 1/alpha. You should use GridSearchCV to select either alpha or C the same way, but remember that a higher C is more likely to overfit, just as a lower alpha is more likely to overfit.
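To make this concrete, here is a minimal sketch of a grid search over alpha, assuming a recent scikit-learn; the synthetic dataset and grid values are purely illustrative:

    # Minimal sketch: tune alpha with GridSearchCV on synthetic data.
    # The grid values below are illustrative, not recommendations.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    param_grid = {"alpha": 10.0 ** np.arange(-6, 1)}  # 1e-6 ... 1e0
    grid = GridSearchCV(SGDClassifier(loss="hinge", penalty="l2"),
                        param_grid, cv=5)
    grid.fit(X, y)
    print(grid.best_params_)

The same pattern works for the logistic-regression loss; only the loss argument changes, while alpha remains the parameter being searched.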

L2 will produce a model with many small coefficients, while L1 will produce a model with a large number of exactly-zero coefficients and a few large ones. Elastic net is a combination of the two.
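As a small illustration, the sketch below fits the three penalties on synthetic data and counts the zero coefficients; the alpha value and dataset are arbitrary, but L1 and elastic net should yield noticeably more zeros:

    # Illustrative: count zero coefficients under each penalty.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=1000, n_features=50,
                               n_informative=5, random_state=0)

    for penalty in ("l2", "l1", "elasticnet"):
        clf = SGDClassifier(loss="hinge", penalty=penalty, alpha=0.01,
                            random_state=0)
        clf.fit(X, y)
        print(penalty, "-> zero coefficients:", int(np.sum(clf.coef_ == 0)))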

SGDClassifier uses stochastic gradient descent, in which the data is fed through the learning algorithm sample by sample. n_iter tells it how many passes (epochs) to make over the training data. As the number of iterations goes up and the learning rate goes down, SGD looks more and more like batch gradient descent, but it also becomes slower. Note that shuffle=True only randomizes the order of the samples on each pass; it does not resample with replacement, so it is not bagging, and L1-penalized SGD is not equivalent to RandomizedLasso, which fits many L1 models on perturbed subsamples of the data.
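A rough sketch of the effect of making more passes, assuming a recent scikit-learn where n_iter has been renamed max_iter; tol=None disables early stopping so every pass actually runs:

    # Sketch: more passes over the data, with shuffling between passes.
    # Assumes a recent scikit-learn where n_iter is named max_iter;
    # tol=None disables early stopping so all passes actually run.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for n_passes in (1, 5, 50):
        clf = SGDClassifier(max_iter=n_passes, shuffle=True, tol=None,
                            random_state=0)
        clf.fit(X_train, y_train)
        print(n_passes, "passes -> test accuracy:", clf.score(X_test, y_test))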
