Predicting multilabel data with sklearn

Question

According to the docs, the OneVsRest classifier supports multilabel classification: http://scikit-learn.org/stable/modules/multiclass.html#multilabel-learning

Here's the code I'm trying to run:

from sklearn import metrics
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.cross_validation import train_test_split
from sklearn.svm import SVC

x = [[1,2,3],[3,3,2],[8,8,7],[3,7,1],[4,5,6]]
y = [['bar','foo'],['bar'],['foo'],['foo','jump'],['bar','fox','jump']]

y_enc = MultiLabelBinarizer().fit_transform(y)

train_x, train_y, test_x, test_y = train_test_split(x, y_enc, test_size=0.33)

clf = OneVsRestClassifier(SVC())
clf.fit(train_x, train_y)
predictions = clf.predict_proba(test_x)

my_metrics = metrics.classification_report( test_y, predictions)
print my_metrics

I get the following error:

Traceback (most recent call last):
  File "multilabel.py", line 178, in <module>
    clf.fit(train_x, train_y)
  File "/sklearn/lib/python2.6/site-packages/sklearn/multiclass.py", line 277, in fit
    Y = self.label_binarizer_.fit_transform(y)
  File "/sklearn/lib/python2.6/site-packages/sklearn/base.py", line 455, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/sklearn/lib/python2.6/site-packages/sklearn/preprocessing/label.py", line 302, in fit
    raise ValueError("Multioutput target data is not supported with "
ValueError: Multioutput target data is not supported with label binarization

Not using the MultiLabelBinarizer gives the same error, so I'm assuming that's not the problem. Does anyone know how to use this classifier for multilabel data?

Answer 1

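The outputs of train_test_split() are unpacked in the wrong order. It returns the train and test parts of the first array, followed by the train and test parts of the second array, so in your code train_y actually holds the test features. Change this line: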

train_x, train_y, test_x, test_y = train_test_split(x, y_enc, test_size=0.33)

To this:

train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)
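
To make the return order concrete, here is a minimal sketch that checks the shapes of the four outputs. It assumes a scikit-learn release where train_test_split lives in sklearn.model_selection (older releases, like the one in the question, import it from sklearn.cross_validation). The random_state value is an arbitrary addition for reproducibility and is not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

x = np.array([[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]])
y = [['bar', 'foo'], ['bar'], ['foo'], ['foo', 'jump'], ['bar', 'fox', 'jump']]

# One 0/1 column per distinct label, so the result has shape (5, 4).
y_enc = MultiLabelBinarizer().fit_transform(y)

# For two input arrays the outputs come back as:
#   first array train, first array test, second array train, second array test
train_x, test_x, train_y, test_y = train_test_split(
    x, y_enc, test_size=0.33, random_state=0  # random_state: assumption, for repeatability
)

# Feature blocks keep 3 columns, label blocks keep 4 columns.
print(train_x.shape, test_x.shape)
print(train_y.shape, test_y.shape)
```

With the unpacking from the question, the name train_y is bound to the second output, which is a block of feature rows, and that is what fit() then tries to treat as labels.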

Also, classification_report() expects class predictions rather than probabilities, so change clf.predict_proba to clf.predict. If you do want probabilities from clf.predict_proba, you'll need to change SVC() to SVC(probability=True).

Putting it all together:

from sklearn import metrics
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
# In scikit-learn releases before 0.18, import this from sklearn.cross_validation instead.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


x = [[1,2,3],[3,3,2],[8,8,7],[3,7,1],[4,5,6]]
y = [['bar','foo'],['bar'],['foo'],['foo','jump'],['bar','fox','jump']]

# Encode the label lists as a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
y_enc = mlb.fit_transform(y)

# Order of outputs: x train, x test, y train, y test.
train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)

# probability=True is only needed if you also call clf.predict_proba().
clf = OneVsRestClassifier(SVC(probability=True))
clf.fit(train_x, train_y)
predictions = clf.predict(test_x)

my_metrics = metrics.classification_report(test_y, predictions)
print(my_metrics)

This gives me no errors when I run it.
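
The predictions come back as a 0/1 indicator matrix with one column per label. If the label names are needed, the fitted mlb can map rows back with inverse_transform. The following is a minimal sketch and not part of the code above: it fits on all five samples without a split, which is my assumption to keep the toy example deterministic, so the output says nothing about how well the model generalizes.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

x = np.array([[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]])
y = [['bar', 'foo'], ['bar'], ['foo'], ['foo', 'jump'], ['bar', 'fox', 'jump']]

mlb = MultiLabelBinarizer()
y_enc = mlb.fit_transform(y)

# Fit on the full toy data set (assumption: no train/test split here).
clf = OneVsRestClassifier(SVC())
clf.fit(x, y_enc)

# predict() returns an indicator matrix with the same columns as y_enc.
predictions = clf.predict(x)

# Map each 0/1 row back to a tuple of label names.
decoded = mlb.inverse_transform(predictions)

print(mlb.classes_)
for row, labels in zip(predictions, decoded):
    print(row, labels)
```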

Answer 2

I also ran into "ValueError: Multioutput target data is not supported with label binarization" with OneVsRestClassifier. In my case the cause was that the training data was a plain Python list; after casting it with np.array(), it worked.
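
A minimal sketch of that cast, assuming the data from the question. Recent scikit-learn releases may accept plain lists here as well, so whether the cast is required likely depends on the installed version; this sketch does not establish which versions need it.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Plain Python lists, as in the question.
x = [[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]]
y = [['bar', 'foo'], ['bar'], ['foo'], ['foo', 'jump'], ['bar', 'fox', 'jump']]
y_enc = MultiLabelBinarizer().fit_transform(y)

# Cast features and labels to NumPy arrays before fitting.
X = np.array(x)
Y = np.array(y_enc)

clf = OneVsRestClassifier(SVC())
clf.fit(X, Y)

# One row per sample, one column per label.
print(clf.predict(X).shape)
```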

Answer 3

For me, wrapping train_x, train_y, test_x and test_y in np.array() solved the problem.

3 answers

solution1: 10, ACCEPTED, 2016-05-06 14:42:12
solution2: 3, 2017-08-07 23:18:13
solution3: 0, 2019-11-11 08:54:26
