Multi-label classification with FastText

Original post: 2022-03-03 13:51:50. Tags: python, nlp, multilabel-classification, fasttext

Can FastText handle multi-labelled data? Could someone share a simple example along with a confusion matrix (true vs. predicted labels)? I have already looked at the FastText documentation page.

Thank you in advance.

2 answers

This section describes multi-label classification: https://github.com/facebookresearch/fastText/blob/main/docs/supervised-tutorial.md#multi-label-classification

A convenient way to handle multiple labels is to use independent binary classifiers for each label. This can be done with -loss one-vs-all or -loss ova.
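As a rough sketch of how this could look with the fasttext Python package (assumed to be installed, and imported lazily so the helpers can be read on their own), the code below trains with loss="ova", keeps every label whose probability passes a threshold, and tallies per-label true/false positives and negatives. That per-label tally is one common way to present a "confusion matrix" for multi-label data. The path train.txt, the hyperparameter values, and the helper names train_ova, predict_label_set and per_label_confusion are illustrative assumptions, not part of fastText.

```python
def train_ova(train_path, epoch=25, lr=0.5):
    """Train a supervised model with one independent binary classifier per label."""
    import fasttext  # imported here so the helpers below work without it
    return fasttext.train_supervised(input=train_path, loss="ova", epoch=epoch, lr=lr)


def predict_label_set(model, text, threshold=0.5):
    """Return the set of labels the model predicts for `text`."""
    # k=-1 asks for all labels; `threshold` then filters them by probability.
    labels, _probs = model.predict(text, k=-1, threshold=threshold)
    return set(labels)


def per_label_confusion(true_sets, pred_sets):
    """Count tp/fp/fn/tn for each label across aligned true and predicted label sets."""
    labels = sorted(set().union(*true_sets, *pred_sets))
    n = len(true_sets)
    table = {}
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        table[label] = {"tp": tp, "fp": fp, "fn": fn, "tn": n - tp - fp - fn}
    return table


# Hypothetical usage (needs fasttext and a training file in fastText format):
# model = train_ova("train.txt")
# preds = [predict_label_set(model, text) for text in test_texts]
# print(per_label_confusion(test_label_sets, preds))
```

The label strings returned by predict include the __label__ prefix, so the true label sets should use the same form before they are compared.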

Preparing training data

This is described at the end of the section Installing fastText:

Each line of the text file contains a list of labels, followed by the corresponding document. All the labels start by the __label__ prefix, which is how fastText recognize what is a label or what is a word.
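To make the format concrete, here is a minimal sketch that builds one training line from a list of labels and a document. The helper name to_fasttext_line and the sample labels and text are made up for illustration. Whitespace in the document is collapsed because each example has to fit on a single line.

```python
def to_fasttext_line(labels, text):
    """Build one training line: all labels first, then the document text."""
    prefix = " ".join(f"__label__{label}" for label in labels)
    # Collapse newlines and repeated spaces so the example stays on one line.
    document = " ".join(text.split())
    return f"{prefix} {document}"


# Hypothetical example with two labels on the same document.
line = to_fasttext_line(["baking", "equipment"], "Which baking dish is best\nfor banana bread?")
print(line)
```

Labels themselves should not contain spaces, since fastText splits each line on whitespace.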

The docs, and the format for supplying labeled text, only seem to mention a single label per text.

You could try repeating the same text more than once in your training data, each time with one of the appropriate labels. (You might want to re-shuffle the training data so that such repeated texts don't appear directly alongside each other.)
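A small sketch of that idea, assuming the data is held as (labels, text) pairs: each text is written once per label and the resulting lines are shuffled. The function name expand_single_label, the fixed seed, and the sample data are illustrative choices only.

```python
import random


def expand_single_label(examples, seed=0):
    """Repeat each text once per label, then shuffle the resulting lines."""
    rows = [
        f"__label__{label} {text}"
        for labels, text in examples
        for label in labels
    ]
    # Shuffle so copies of the same text are unlikely to sit next to each other.
    random.Random(seed).shuffle(rows)
    return rows


# Hypothetical data: the first text carries two labels, the second carries one.
examples = [(["sauce", "cheese"], "how to thicken a cheese sauce"),
            (["bread"], "why did my loaf collapse")]
rows = expand_single_label(examples)
```

The returned lines can then be written to the training file, one per line.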