Dummy Variables in Python SKLearn Logistic Regression

Question

I am using logistic regression in SKLearn to classify data into one of 5 classes. To train the model I have a matrix of observations Y and a matrix of features X.

Sometimes my matrix Y will contain no observations of some category, say category 3. In this case, when I call the predict_proba(X) method, I would like to get a list of 5 probabilities where the 3rd entry is 0 (as there are no category 3 observations). Instead, this probability is simply omitted and a list of 4 probabilities is returned.

How can I change the logistic regression object to do this?

Answer 1

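LogisticRegression doesn't allow this, but its close cousin SGDClassifier does, because partial_fit accepts the full list of class labels up front:

import numpy as np
from sklearn.linear_model import SGDClassifier

logreg = SGDClassifier(loss="log")  # named "log_loss" in scikit-learn 1.1 and later
logreg.partial_fit(X, y, classes=np.arange(5))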

SGDClassifier differs in its training algorithm and parametrization. If that's not ok, then you'll have to roll your own wrapper code.
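
For the "roll your own wrapper" route, here is a minimal sketch of one possible approach, not a scikit-learn feature. It assumes you know the full label set in advance; the helper name predict_proba_all_classes and the toy data are made up for illustration. It relies only on the fitted model's classes_ attribute, which lists the labels seen during fit in the same order as the columns of predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def predict_proba_all_classes(model, X, all_classes):
    """Hypothetical helper: pad predict_proba output to one column per label.

    Labels in all_classes that the model never saw during fit get
    probability 0 for every row.
    """
    all_classes = list(all_classes)
    # Map each label to its column position in the padded output.
    column_of = {label: i for i, label in enumerate(all_classes)}
    proba = model.predict_proba(X)
    full = np.zeros((proba.shape[0], len(all_classes)))
    # model.classes_ gives the label for each column of proba.
    cols = [column_of[label] for label in model.classes_]
    full[:, cols] = proba
    return full


# Toy data (an assumption for this example): labels 0..4 with label 3 absent.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = np.repeat([0, 1, 2, 4], 10)

model = LogisticRegression().fit(X, y)
proba = predict_proba_all_classes(model, X, all_classes=range(5))
```

Here proba has 5 columns, and the column for the missing label is all zeros. Each row still sums to 1, since the zero column adds nothing to the probabilities the model already returned.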

Answer 2

Multi-class labels can be binarized, with one indicator column per class, using the sklearn.preprocessing module.

Reference: http://scikit-learn.org/stable/modules/preprocessing.html#label-binarization
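
As a rough illustration of what label binarization does, the sketch below uses label_binarize with an explicit classes list, so a label that never occurs still gets its own (all-zero) column. The toy label vector is an assumption made for this example. Note that this encodes the labels only; on its own it does not change what predict_proba returns.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Toy labels (assumed for illustration): label 3 never occurs.
y = np.array([0, 1, 2, 4, 4])

# Passing classes explicitly fixes the number and order of columns.
Y = label_binarize(y, classes=[0, 1, 2, 3, 4])

print(Y.shape)   # (5, 5)
print(Y[:, 3])   # [0 0 0 0 0]
```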

2 answers

solution1
3 2014-01-30 10:03:24

solution2
1 ACCEPTED 2014-03-12 11:16:59
