Multi-label text classification with a non-uniform distribution of class labels for each training example
I have a multi-label classification problem. I want to classify texts into six labels. Each text can have one to six labels, but the distribution over those labels is not uniform. For example, 10 people annotated sentence1 as below:
These numbers are the votes each class received. I can normalize them, e.g. sad 0.7, anger 0.2, fear 0.1, happy 0.0, ...
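The normalization above is simple arithmetic: divide each label's votes by the number of annotators (or by the total number of votes). The sketch below shows both options in plain Python. It assumes the six labels are kept in a fixed order; "sad", "anger", "fear" and "happy" come from the question, while "label5" and "label6" are hypothetical placeholders for the two labels that are not named.

```python
# Label order is an assumption: "sad", "anger", "fear" and "happy" appear in the
# question; "label5" and "label6" are hypothetical placeholders for the other two.
LABELS = ["sad", "anger", "fear", "happy", "label5", "label6"]


def votes_to_fractions(votes: dict[str, int], n_annotators: int) -> list[float]:
    """Share of annotators who chose each label; entries need not sum to 1
    when an annotator may pick several labels for the same sentence."""
    if n_annotators <= 0:
        raise ValueError("n_annotators must be positive")
    return [votes.get(label, 0) / n_annotators for label in LABELS]


def votes_to_distribution(votes: dict[str, int]) -> list[float]:
    """Votes divided by the total number of votes, so the six entries sum to 1."""
    total = sum(votes.get(label, 0) for label in LABELS)
    if total == 0:
        raise ValueError("sentence received no votes")
    return [votes.get(label, 0) / total for label in LABELS]


# sentence1: 7 of 10 annotators chose sad, 2 anger, 1 fear (implied by the
# normalized values above); every other label is assumed to have 0 votes.
sentence1 = {"sad": 7, "anger": 2, "fear": 1}
print(votes_to_fractions(sentence1, n_annotators=10))  # [0.7, 0.2, 0.1, 0.0, 0.0, 0.0]
print(votes_to_distribution(sentence1))  # [0.7, 0.2, 0.1, 0.0, 0.0, 0.0]
```

The two normalizations coincide here only because each annotator cast exactly one vote. If annotators may choose several labels, they differ: hypothetical votes of 8 for sad and 4 for fear from 10 annotators give 0.8 and 0.4 as fractions, but about 0.67 and 0.33 as a distribution. The fractions suit independent per-label (sigmoid) outputs, while the distribution suits a model whose outputs are constrained to sum to 1.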
What is the best classifier for this problem? What is the best form for the labels; that is, should I normalize them or not?
What keywords should I search for to find this kind of multi-label classification problem, where the labels have unequal probabilities?
First, let me check that I understand your problem correctly. You have sentences=[sent1, sent2, ... sentn] and you want to classify them into six labels, labels=[l1,l2,...,l6]. Your data isn't the labels themselves but the probability of each label applying to the text. You also mentioned that the six labels come from human annotation (I'm not sure what you mean by "10 people commented"; I'll assume it means they annotated the text).
If so, you can approach the problem from either a multi-label classification or a multi-target regression perspective. Here is what you can do with your data in both cases:
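To make the two perspectives concrete, here is a minimal sketch of how the same normalized data could be turned into targets for each. It assumes every sentence already has a six-element vector of label shares in a fixed order; the second sentence and the 0.5 threshold are made up for illustration.

```python
# Hypothetical data in the layout described above: one target vector per sentence,
# with the six entries in a fixed label order (l1 ... l6).
sentences = ["sent1", "sent2"]
soft_targets = [
    [0.7, 0.2, 0.1, 0.0, 0.0, 0.0],  # sentence1 from the question
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.5],  # made-up example with two strong labels
]


def to_multilabel_targets(soft: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
    """Classification view: a label is on (1) if its share reaches `threshold`."""
    return [[1 if p >= threshold else 0 for p in row] for row in soft]


# Regression view: use soft_targets unchanged as six real-valued targets.
regression_targets = soft_targets
# Classification view: binarize; the 0.5 cut-off is an arbitrary choice.
classification_targets = to_multilabel_targets(soft_targets)
print(classification_targets)  # [[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1]]
```

One trade-off to keep in mind: binarizing throws information away (sentence1's 0.2 for anger becomes 0), whereas the regression view keeps the annotators' disagreement in the targets.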
Training Models: You can use both shallow and deep models for this task. You need a model that takes a sentence as input and predicts six labels or six probabilities. I suggest taking a look at this example; it could be a very good starting point for your work. The author provides a tutorial on building a multi-label text classifier with deep neural networks. He essentially built an LSTM followed by a feed-forward layer that classifies the labels. If you decide to use regression instead of classification, you can simply drop the final activation so that the output layer is linear.
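As a shallow stand-in for the LSTM described above (this is not the tutorial's code), the sketch below trains one linear output per label over bag-of-words counts, in plain Python with no libraries. The `head` switch shows what "dropping the final activation" means: `"sigmoid"` gives per-label logistic regression trained with binary cross-entropy on the soft targets, and `"linear"` gives multi-target regression trained with squared error. The class name, tokenizer, toy sentences, learning rate and epoch count are all hypothetical; in practice you would use a library model instead.

```python
import math


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class BagOfWordsModel:
    """Linear model over word counts with one output per label.

    head="sigmoid": multi-label classification; each output is an independent
    probability trained with binary cross-entropy, which accepts soft targets.
    head="linear": multi-target regression; the final activation is dropped and
    the outputs are trained with squared error.
    """

    def __init__(self, vocab: list[str], n_labels: int, head: str = "sigmoid"):
        if head not in ("sigmoid", "linear"):
            raise ValueError(f"unknown head: {head!r}")
        self.index = {word: i for i, word in enumerate(vocab)}
        self.head = head
        self.weights = [[0.0] * len(vocab) for _ in range(n_labels)]
        self.bias = [0.0] * n_labels

    def _features(self, text: str) -> dict[int, float]:
        """Sparse word counts; words outside the vocabulary are ignored."""
        counts: dict[int, float] = {}
        for token in tokenize(text):
            if token in self.index:
                i = self.index[token]
                counts[i] = counts.get(i, 0.0) + 1.0
        return counts

    def predict(self, text: str) -> list[float]:
        x = self._features(text)
        outputs = []
        for w, b in zip(self.weights, self.bias):
            z = b + sum(w[i] * v for i, v in x.items())
            outputs.append(1.0 / (1.0 + math.exp(-z)) if self.head == "sigmoid" else z)
        return outputs

    def fit(self, texts: list[str], targets: list[list[float]],
            epochs: int = 200, lr: float = 0.1) -> "BagOfWordsModel":
        """Plain per-example stochastic gradient descent."""
        for _ in range(epochs):
            for text, y in zip(texts, targets):
                x = self._features(text)
                for k, (out, t) in enumerate(zip(self.predict(text), y)):
                    # Both sigmoid + cross-entropy and linear + 0.5 * squared error
                    # have gradient (out - t) with respect to the pre-activation z.
                    g = out - t
                    self.bias[k] -= lr * g
                    for i, v in x.items():
                        self.weights[k][i] -= lr * g * v
        return self


# Made-up toy data; target order is [sad, anger, fear, happy, label5, label6].
texts = ["i miss her so much", "this makes me furious", "what a wonderful day"]
targets = [
    [0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
vocab = sorted({token for text in texts for token in tokenize(text)})

classifier = BagOfWordsModel(vocab, n_labels=6, head="sigmoid").fit(texts, targets)
regressor = BagOfWordsModel(vocab, n_labels=6, head="linear").fit(texts, targets)
print([round(p, 2) for p in classifier.predict("i miss her so much")])
print([round(p, 2) for p in regressor.predict("i miss her so much")])
```

With the sigmoid head each label is scored independently, so the six outputs need not sum to 1; that matches targets built as "share of annotators who chose this label". The linear head has no such bounds, so its predictions may fall slightly outside [0, 1] on unseen text.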
Deep neural networks are likely to give the best results, so the article I linked should work well. I also suggest looking at state-of-the-art methods for text classification, such as BERT or XLNet. I implemented a multi-label classification method using BERT; maybe it can be helpful to you.