Dummy Variables in Python SKLearn Logistic Regression

I am using logistic regression in scikit-learn to classify data into one of 5 classes. To train the model I have a matrix of observations Y and a matrix of features X.

Sometimes Y contains no observations of one category, say category 3. In that case, when I call predict_proba(X), I would like to get a list of 5 probabilities where the 3rd entry is 0 (as there are no category 3 observations). Instead, that probability is simply omitted and a list of 4 probabilities is returned.

How can I change the logistic regression object to do this?

LogisticRegression doesn't allow this, but its close cousin SGDClassifier does:

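import numpy as np
from sklearn.linear_model import SGDClassifier
logreg = SGDClassifier(loss="log_loss")  # spelled loss="log" before scikit-learn 1.1
logreg.partial_fit(X, y, classes=np.arange(5))  # declare all 5 classes up front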

SGDClassifier differs from LogisticRegression in its training algorithm and parametrization. If that's not acceptable, you'll have to roll your own wrapper code.
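One possible shape for such a wrapper is sketched below. It is not part of scikit-learn: the helper name predict_proba_all_classes and its all_classes argument are made up for illustration, and the sketch assumes all_classes is sorted and contains every label the model was fitted on. It relies only on the fitted estimator's classes_ attribute, which lists the labels seen during fit in the same order as the columns of predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def predict_proba_all_classes(clf, X, all_classes):
    """Hypothetical helper: predict_proba with one column per entry of all_classes.

    Assumes all_classes is sorted and is a superset of clf.classes_.
    Columns for classes the model never saw during fit stay at zero.
    """
    all_classes = np.asarray(all_classes)
    proba = clf.predict_proba(X)
    full = np.zeros((proba.shape[0], len(all_classes)))
    # clf.classes_ holds the labels seen during fit, in column order;
    # searchsorted maps each of them to its position in all_classes.
    cols = np.searchsorted(all_classes, clf.classes_)
    full[:, cols] = proba
    return full


# Toy data in which class 3 never occurs.
rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = np.tile([0, 1, 2, 4], 10)

clf = LogisticRegression().fit(X, y)
proba = predict_proba_all_classes(clf, X, all_classes=np.arange(5))
```

Here proba has 5 columns per sample, and the column for the unseen class 3 is all zeros, while the remaining columns are exactly what LogisticRegression returned.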

Multi-class labels can be binarized against a fixed set of classes using the sklearn.preprocessing module.

Reference: http://scikit-learn.org/stable/modules/preprocessing.html#label-binarization
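The answer does not spell out how binarization is meant to be used here, so the following is only a guess at the intent. It shows that label_binarize, given an explicit classes argument, always produces one indicator column per declared class, including classes that never occur in the data. The label values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Illustrative labels in which class 3 never occurs.
y = np.array([0, 1, 2, 4, 4])

# Passing classes explicitly fixes the number and order of the columns.
Y = label_binarize(y, classes=np.arange(5))
```

Y has shape (5, 5) with exactly one 1 per row, and its column for class 3 contains only zeros.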
