How to transform multi-label to multi-class in Python?

Let's say I have the following samples with their respective multi-labels, where X1, X2, X3, X4, X5, X6 are samples and Y1, Y2, Y3, Y4 are labels:

X1 : {Y2, Y3}
X2 : {Y1}
X3 : {Y2}
X4 : {Y2, Y3}
X5 : {Y1, Y2, Y3, Y4}
X6 : {Y2}

How do I transform them into the following, where each distinct label set becomes a single class:

X1 : y1
X2 : y2
X3 : y3
X4 : y1
X5 : y4
X6 : y3

As I understand it, this is how the transformation works in the Label Powerset method. However, I do not want to classify using this method; I just want to convert the labels.
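The conversion on its own needs no classifier: every distinct label set is simply assigned a new class id. A minimal sketch in plain Python, assuming classes are numbered in order of first appearance (which reproduces the target listing in the question); the names samples and label_powerset are hypothetical.

```python
# Multi-label data from the question: each sample maps to a set of labels.
samples = {
    "X1": {"Y2", "Y3"},
    "X2": {"Y1"},
    "X3": {"Y2"},
    "X4": {"Y2", "Y3"},
    "X5": {"Y1", "Y2", "Y3", "Y4"},
    "X6": {"Y2"},
}


def label_powerset(samples):
    """Map each distinct label set to a new class name, numbered by first appearance."""
    class_of = {}  # frozenset of labels -> new class name
    result = {}
    for name, labels in samples.items():
        key = frozenset(labels)  # hashable and insensitive to label order
        if key not in class_of:
            class_of[key] = f"y{len(class_of) + 1}"
        result[name] = class_of[key]
    return result, class_of


new_labels, mapping = label_powerset(samples)
for name, cls in new_labels.items():
    print(name, ":", cls)
```

Using frozenset as the key means {Y2, Y3} and {Y3, Y2} land in the same class, and the mapping dictionary can be kept to convert further samples the same way.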

We have MultiLabelBinarizer to convert multi-label data into a binary indicator format, but it only produces 0s and 1s.
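The 0/1 output of MultiLabelBinarizer can still be used as a starting point: it has one column per label, so each distinct row is exactly one label combination, and collapsing identical rows gives a multi-class target. A sketch assuming scikit-learn and NumPy are installed; label_sets is hypothetical data copied from the question.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Label sets for X1..X6 from the question
label_sets = [{"Y2", "Y3"}, {"Y1"}, {"Y2"}, {"Y2", "Y3"}, {"Y1", "Y2", "Y3", "Y4"}, {"Y2"}]

mlb = MultiLabelBinarizer()
indicator = mlb.fit_transform(label_sets)  # shape (6, 4); columns follow mlb.classes_

# Each unique 0/1 row is one label combination. return_inverse gives, for every
# sample, the index of its row in `combos`, i.e. its new class id.
combos, class_ids = np.unique(indicator, axis=0, return_inverse=True)
class_ids = class_ids.reshape(-1)  # guard against NumPy versions returning a 2-D inverse

print(class_ids)
```

Here the class numbers follow the lexicographic order of the 0/1 rows rather than first appearance, so the ids are [1, 2, 0, 1, 3, 0]; the grouping (X1 with X4, X3 with X6) is the same as in the desired output.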

If you just want to map sequences of labels to a new label, you could convert those sequences to their string representation and use the LabelEncoder from sklearn:

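from sklearn import preprocessing

# Each sample's label sequence (labels already sorted, no repeats)
Y = [(1, 2), (1, 2, 3, 4), (1,)]

# Fit the encoder on the string form of each sequence; le.classes_ is
# sorted as strings: ['(1, 2)', '(1, 2, 3, 4)', '(1,)']
le = preprocessing.LabelEncoder()
le.fit([str(y) for y in Y])

# Encode new sequences with the same string conversion
le.transform([str((1,)), str((1, 2))])  # returns array([2, 0])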

Do be wary, though: your label encoder won't support any sequence in your test set that it did not see during fitting (transform raises a ValueError for unseen labels). This suggestion also assumes that labels are ordered consistently in their representation and do not repeat.
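The ordering and repetition assumptions can be removed by normalizing each sequence before converting it to a string, and unseen sequences can be detected before calling transform. A sketch under those assumptions; the helper canonical is hypothetical, and only LabelEncoder's fit, transform and classes_ come from scikit-learn.

```python
from sklearn.preprocessing import LabelEncoder


def canonical(labels):
    """Order-insensitive, duplicate-free string key for a label sequence."""
    return str(tuple(sorted(set(labels))))


train = [(1, 2), (1, 2, 3, 4), (1,)]
le = LabelEncoder()
le.fit([canonical(y) for y in train])

# (2, 1, 1) becomes the same key as (1, 2) once de-duplicated and sorted
codes = le.transform([canonical((1,)), canonical((2, 1, 1))])

# A label set never seen during fit is not in le.classes_, so check membership
# first instead of letting transform raise a ValueError
unseen = canonical((3, 4))
print(codes, unseen in le.classes_)
```

With this normalization, (1,) and (2, 1, 1) encode to 2 and 0, the same codes as (1,) and (1, 2) in the answer above, while (3, 4) is reported as unknown.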
