Let's say I have the following samples with their respective multi-labels, where X1, X2, X3, X4, X5, X6 are samples and Y1, Y2, Y3, Y4 are labels:
X1 : {Y2, Y3}
X2 : {Y1}
X3 : {Y2}
X4 : {Y2, Y3}
X5 : {Y1, Y2, Y3, Y4}
X6 : {Y2}
How do I transform them to:
X1 : y1
X2 : y2
X3 : y3
X4 : y1
X5 : y4
X6 : y3
As I understand it, this is the transformation performed by the Label Powerset method. However, I do not want to classify using that method; I just want to convert the labels.
We have MultiLabelBinarizer to convert multi-label targets to a binary (two-class) indicator format, but it only creates 0s and 1s.
If you just want to map sequences of labels to a new label, you could convert those sequences to their string representation and use the LabelEncoder from sklearn.
from sklearn import preprocessing

# Each tuple is one combination of labels.
Y = [(1, 2), (1, 2, 3, 4), (1,)]
le = preprocessing.LabelEncoder()
# Encode each combination by its string representation.
le.fit([str(y) for y in Y])
le.transform([str((1,)), str((1, 2))])
# array([2, 0])
Be wary, though: any label sequence in your test set that was not seen during fitting will not be supported by the label encoder (transform raises an error on unseen values). This suggestion also assumes that the labels in each sequence are consistently ordered and non-repeating.
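If the ordering and repetition caveats are a concern, one option is to canonicalize each label set before mapping it. The sketch below is a plain-Python alternative to LabelEncoder, not part of sklearn: the helper name encode_label_sets is hypothetical, and it assumes you want ids numbered in order of first appearance, as in the question's desired output (LabelEncoder would instead number them by sorted string order).

```python
def encode_label_sets(label_sets):
    """Map each distinct label set to an integer id, in order of first appearance."""
    ids = {}       # canonical label set -> class id
    encoded = []
    for labels in label_sets:
        # Sorting a de-duplicated copy makes {Y3, Y2} and {Y2, Y3, Y2} the same key.
        key = tuple(sorted(set(labels)))
        if key not in ids:
            ids[key] = len(ids) + 1   # ids start at 1, like y1..y4 in the question
        encoded.append(ids[key])
    return encoded, ids


# The samples from the question; X4 is written in a different order on purpose.
samples = {
    "X1": ["Y2", "Y3"],
    "X2": ["Y1"],
    "X3": ["Y2"],
    "X4": ["Y3", "Y2"],
    "X5": ["Y1", "Y2", "Y3", "Y4"],
    "X6": ["Y2"],
}

encoded, ids = encode_label_sets(samples.values())
for name, cls in zip(samples, encoded):
    print(f"{name} : y{cls}")
```

For a combination that never appeared in the training data, looking it up with ids.get(key) returns None instead of raising, so you can decide explicitly how to handle it.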