
Using l1 penalty with LogisticRegressionCV() in scikit-learn

I am using the Python scikit-learn library for classification.

As a feature selection step, I want to use RandomizedLogisticRegression().

To find the best value of C by cross-validation, I used LogisticRegressionCV(penalty='l1', solver='liblinear'). However, in this case all of the coefficients came out as 0. Using the l2 penalty works without problems, and a single run of LogisticRegression() with the l1 penalty also seems to give proper coefficients.
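
For reference, here is a minimal sketch of the behaviour I am describing; the toy dataset from make_classification is an assumption, not my actual data:

    # Minimal sketch of the setup (toy data; the real dataset is different).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

    X, y = make_classification(n_samples=100, n_features=50,
                               n_informative=5, random_state=0)

    # Cross-validated C search with the l1 penalty: coefficients can end up all 0
    clf_cv = LogisticRegressionCV(penalty='l1', solver='liblinear', cv=5).fit(X, y)
    print((clf_cv.coef_ != 0).sum())  # number of non-zero coefficients

    # A single l1-penalised fit with a fixed C gives non-zero coefficients
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0).fit(X, y)
    print((clf.coef_ != 0).sum())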

As a workaround I am using RandomizedLasso and LassoCV() (roughly as in the sketch below), but I am not sure whether it is proper to use LASSO with binary class labels.
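
For the LassoCV() half of the workaround, this is roughly what I mean; the toy data is again an assumption:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LassoCV

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # LassoCV picks its regularisation strength by cross-validation;
    # the binary 0/1 labels are treated as continuous regression targets.
    lasso = LassoCV(cv=5).fit(X, y)
    selected = np.flatnonzero(lasso.coef_)  # indices of the kept features
    print(selected)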

So my questions are these:

  1. Is there some problem with using LogisticRegressionCV() in my case?
  2. Is there another way to find the best value of C_ for logistic regression other than GridSearchCV()?
  3. Is it possible to use LASSO for binary (not continuous) classification?

From what you describe, the coefficient of the l1 regularisation term is too high in your case, and you need to decrease it.

When that coefficient is very high, the regularisation term becomes more important than the error term, so your model just becomes very sparse and doesn't predict anything.
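
Concretely, with penalty='l1' and the liblinear solver, scikit-learn minimises an objective of the form (w are the coefficients, C the inverse regularisation strength):

    min_w  ||w||_1 + C * sum_i log(1 + exp(-y_i * w.x_i))

so as C shrinks toward 0 the penalty dominates the data-fit term and every coefficient is driven to exactly 0.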

I checked LogisticRegressionCV, and its Cs argument controls the grid of C values searched: if you pass an integer, a grid between 1e-4 and 1e4 is used. Since each C is the inverse of a regularisation coefficient, lower regularisation means higher C values. Alternatively, you can provide the list of C values (inverse regularisation coefficients) yourself.

So play with the Cs parameter and try to lower the regularisation coefficient, i.e. push the search toward higher C.
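
For example, a sketch of passing the C grid explicitly; the grid values and toy data are assumptions:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=100, n_features=50,
                               n_informative=5, random_state=0)

    # Supply the C grid explicitly so it covers weaker regularisation
    # (larger C) than the cross-validation might otherwise settle on.
    clf = LogisticRegressionCV(Cs=np.logspace(-2, 4, 20),
                               penalty='l1', solver='liblinear', cv=5).fit(X, y)
    print(clf.C_)                   # the C value(s) selected by cross-validation
    print((clf.coef_ != 0).sum())   # how many coefficients survived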
