How to convert multi-label to multi-class in Python?

Question

Let's say I have the following samples with their respective multi-labels,

where X1, X2, X3, X4, X5, X6 are samples

and Y1, Y2, Y3, Y4 are labels:

X1 : {Y2, Y3}
X2 : {Y1}
X3 : {Y2}
X4 : {Y2, Y3}
X5 : {Y1, Y2, Y3, Y4}
X6 : {Y2}

How do I transform this to

X1 : y1
X2 : y2
X3 : y3
X4 : y1
X5 : y4
X6 : y3

As I understand it, this is the transformation performed by the Label Powerset method. However, I do not want to classify using that method. I just want to convert the labels.

We have MultiLabelBinarizer to convert the multi-label sets to binary indicators, but that only produces 0s and 1s.

Answer 1

If you just want to map sequences of labels to a new label, you could convert those sequences to their string representation and use the LabelEncoder from sklearn.

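from sklearn import preprocessing

# Each tuple is one sample's sequence of labels
Y = [(1, 2), (1, 2, 3, 4), (1,)]

# Fit the encoder on the string form of each label tuple
le = preprocessing.LabelEncoder()
le.fit([str(y) for y in Y])

# Look up the class ids of two label tuples seen during fit
le.transform([str((1,)), str((1, 2))])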
>>> array([2, 0])
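
As a rough illustration, the same idea can be applied to the six samples from the question. This is only a sketch: it assumes the label sets are written as sorted tuples of integers (Y1 as 1, Y2 as 2, and so on), and the variable names samples and class_ids are made up for the example.

```python
from sklearn.preprocessing import LabelEncoder

# Label sets of X1..X6 from the question, written as sorted tuples
# (assumption: Y1 -> 1, Y2 -> 2, Y3 -> 3, Y4 -> 4).
samples = [(2, 3), (1,), (2,), (2, 3), (1, 2, 3, 4), (2,)]

le = LabelEncoder()
class_ids = le.fit_transform([str(s) for s in samples])

# Identical label sets share one class id. The ids follow the sorted
# order of the string keys, not the order in which the sets first appear.
print(class_ids.tolist())  # [2, 1, 3, 2, 0, 3]

# inverse_transform recovers the string key behind each class id.
print(le.inverse_transform(class_ids[:2]).tolist())  # ['(2, 3)', '(1,)']
```

X1 and X4 end up in the same class, as do X3 and X6, which is the grouping the question asks for; only the numbering of the classes differs.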

Do be wary, though: any sequence in your test set that the encoder was not fitted on won't be supported by your label encoder. This suggestion also assumes that the labels in each sequence are consistently ordered and non-repeating.
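
If the ordering assumption cannot be guaranteed, one possible workaround is to normalize each label set before mapping it. The sketch below is not part of the original answer: it uses a plain dictionary instead of LabelEncoder, the helper name to_multiclass is hypothetical, and X4 is deliberately written out of order to show the normalization step.

```python
def to_multiclass(label_sets):
    """Map each label set to a class id, numbered by first appearance."""
    class_of = {}  # canonical key (sorted tuple of unique labels) -> class id
    result = []
    for labels in label_sets:
        # Sorting the de-duplicated labels makes the key order-independent.
        key = tuple(sorted(set(labels)))
        if key not in class_of:
            class_of[key] = len(class_of) + 1
        result.append(class_of[key])
    return result, class_of


# Samples from the question; X4 is written out of order on purpose.
samples = {
    "X1": ["Y2", "Y3"],
    "X2": ["Y1"],
    "X3": ["Y2"],
    "X4": ["Y3", "Y2"],
    "X5": ["Y1", "Y2", "Y3", "Y4"],
    "X6": ["Y2"],
}

ids, mapping = to_multiclass(samples.values())
for name, class_id in zip(samples, ids):
    print(f"{name} : y{class_id}")
```

Because the classes are numbered in order of first appearance, this should reproduce the y1, y2, y3, y1, y4, y3 assignment from the question. For a label set that was never seen, mapping.get(key) returns None instead of raising, so the caller can decide how to handle it.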

Solution 1
0 2022-05-24 15:48:21