Pretrained model for text classification

Question

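I have a few unlabeled words that I need to classify into 4-5 categories. I can tell that this test set is clearly classifiable. However, I have no training data, so I need to use a pretrained model to classify these words. Which model is suitable for this paradigm, and which dataset has it been trained on?

Thanks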

Answer 1

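The task you are describing is called zero-shot topic classification: predicting a topic that the model was never trained on. The Hugging Face library supports this paradigm, and you can read more about it here. The most commonly used pretrained model is Bart Large MNLI, a checkpoint of bart-large after training on the MNLI dataset. Below is a simple example that classifies the phrase "I like hot dogs" without any prior training: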

First, install the transformers library:
```
pip install --upgrade transformers
```

Then import and initialize the pipeline:

    from transformers import pipeline

    # Downloads the facebook/bart-large-mnli checkpoint on first use
    classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

Define our toy dataset:

    labels = ["artifacts", "animals", "food", "birds"]
    hypothesis_template = 'This text is about {}.'
    sequence = "I like hot dogs"
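To make the role of the hypothesis template concrete, here is a minimal sketch of how a zero-shot NLI classifier frames the problem: the input text is the premise, and each candidate label is substituted into the template to form one hypothesis. The helper name `build_nli_pairs` is made up for illustration and is not part of transformers; the pipeline does the equivalent internally.

```python
# Sketch only: shows the (premise, hypothesis) pairs an NLI model would score.
# build_nli_pairs is a hypothetical helper, not a transformers function.
labels = ["artifacts", "animals", "food", "birds"]
hypothesis_template = "This text is about {}."
sequence = "I like hot dogs"


def build_nli_pairs(premise, candidate_labels, template):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(premise, template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs(sequence, labels, hypothesis_template)
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

The model then estimates, for each pair, how strongly the premise entails the hypothesis, so a clearer template tends to give cleaner scores.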

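Predict the label:

    # multi_label=True scores each label independently;
    # older transformers releases called this argument multi_class
    prediction = classifier(sequence, labels, hypothesis_template=hypothesis_template, multi_label=True)
    print(prediction)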
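The flag that allows several labels to be true at once (`multi_label` in recent transformers releases, `multi_class` in older ones) changes how scores are normalized. The sketch below illustrates the difference as I understand it, using made-up logits rather than real model output: with the flag on, each label gets its own entailment-vs-contradiction softmax; with it off, the entailment logits are softmaxed across all labels so the scores sum to 1.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits per label; NOT real model output.
logits = {"food": (-3.0, 3.0), "animals": (2.0, -2.5), "birds": (3.5, -4.0)}


def multi_label_scores(logits):
    """Independent score per label: P(entailment) vs P(contradiction)."""
    return {label: softmax([c, e])[1] for label, (c, e) in logits.items()}


def single_label_scores(logits):
    """Scores compete across labels and sum to 1."""
    names = list(logits)
    probs = softmax([logits[n][1] for n in names])
    return dict(zip(names, probs))


multi = multi_label_scores(logits)
single = single_label_scores(logits)
print(multi)
print(single)
```

For words that should fall into exactly one of 4-5 categories, the single-label mode is probably the more natural choice.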

The output will look something like this:

    {'sequence': 'i like hot dogs',
     'labels': ['food', 'animals', 'artifacts', 'birds'],
     'scores': [0.9971900582313538, 0.00529429130256176, 0.0020991512574255466,
                0.00023589911870658398]}

This can be interpreted as the model assigning the highest probability (0.997...) to the label 'food', which is the correct answer.
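To turn a result like this into a single category per word, one option is to take the highest-scoring label and reject it when the score is too low. This is a minimal sketch; `top_label` and the 0.5 threshold are assumptions for illustration, not part of the library.

```python
# Example prediction in the same shape as the output shown above.
prediction = {
    "sequence": "i like hot dogs",
    "labels": ["food", "animals", "artifacts", "birds"],
    "scores": [0.9971900582313538, 0.00529429130256176,
               0.0020991512574255466, 0.00023589911870658398],
}


def top_label(prediction, threshold=0.5):
    """Return (label, score) for the best label, or (None, score) if below threshold.

    Hypothetical helper; the threshold value is an arbitrary assumption.
    """
    score, label = max(zip(prediction["scores"], prediction["labels"]))
    return (label, score) if score >= threshold else (None, score)


label, score = top_label(prediction)
print(label, round(score, 3))  # food 0.997
```

The same helper can be applied to each word in turn by running the classifier once per word and collecting the returned labels.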
