
Training spaCy TextCategorizer with data that belongs to no label?

Original 2022-11-17 17:53:16 8 1 nlp/ spacy/ text-classification/ multilabel-classification

I'm gathering training data for multilabel classification. Some of the data fed into this project will not have enough information to assign it to one of the labels. If I train the model with data that belongs to no label, will it avoid labelling new data that is unclear? Do I need to train it with an "Unclear" label or should I just leave this type of data unlabelled?

I can't seem to find the answer to this question in the spaCy docs.

1 answer

Assuming you really want multilabel classification, i.e. an instance can have zero or more classes, it's fine to have some data without any label. If the model performs correctly, it should also predict no label for similar instances. Be careful, however: to the model, "no label" doesn't mean "unclear"; it means that none of the possible classes applies (each class is considered independently).
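To make this concrete, here is a minimal sketch of how unlabelled instances could be represented, assuming spaCy v3's `cats` annotation format (a dict mapping each label to a score, as accepted by `Example.from_dict`) and the `textcat_multilabel` component. The label names, the helper functions and the 0.5 threshold are all hypothetical and only for illustration; the code is plain Python and does not import spaCy.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical label set; replace with the project's real labels.
LABELS = ["BILLING", "SHIPPING", "RETURNS"]


def to_cats(gold: Set[str], labels: List[str] = LABELS) -> Dict[str, float]:
    """Build a `cats`-style dict: 1.0 for assigned labels, 0.0 for all others.

    An instance with no label simply gets 0.0 for every label;
    no extra "Unclear" label is introduced.
    """
    unknown = gold - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {label: 1.0 if label in gold else 0.0 for label in labels}


def make_training_data(
    rows: Iterable[Tuple[str, Set[str]]]
) -> List[Tuple[str, Dict[str, Dict[str, float]]]]:
    """Turn (text, gold labels) pairs into (text, {"cats": ...}) pairs."""
    return [(text, {"cats": to_cats(gold)}) for text, gold in rows]


def predicted_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold (assumed 0.5).

    Each label is judged independently, so the result may be empty.
    """
    return [label for label, score in scores.items() if score >= threshold]


train_data = make_training_data([
    ("I was charged twice for my order", {"BILLING"}),
    ("Wrong item arrived and I want my money back", {"SHIPPING", "RETURNS"}),
    ("Hello, quick question", set()),  # belongs to no label
])
```

Each `(text, annotations)` pair could then be passed to `Example.from_dict(nlp.make_doc(text), annotations)` in a spaCy training loop. At prediction time, applying something like `predicted_labels` to the scores in `doc.cats` yields an empty list when no label is confident enough.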

Note that in multiclass classification, i.e. when an instance always has exactly one class, it is impossible to assign no label to an instance. It would also be suboptimal to create an 'unclear' class, because in multiclass classification the model predicts the most likely class relative to the others. Semantically, 'no label' is not a regular label comparable to the others.
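The difference between the two setups can be illustrated with a small, library-free sketch. It assumes the common formulation in which a multilabel model applies an independent sigmoid to each class score while a multiclass model applies a softmax across all classes; the raw scores and the 0.5 threshold below are made up for illustration.

```python
import math
from typing import List


def sigmoid_scores(logits: List[float]) -> List[float]:
    """Independent per-class probabilities (multilabel setting)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits: List[float]) -> List[float]:
    """Probabilities that compete and sum to 1 (multiclass setting)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for an instance that fits none of three classes.
logits = [-3.0, -2.5, -4.0]

multilabel = sigmoid_scores(logits)
multiclass = softmax_scores(logits)

# Multilabel: every class is below the (assumed) 0.5 threshold, so no label.
no_label = all(p < 0.5 for p in multilabel)

# Multiclass: the argmax always selects some class, however weak the evidence.
forced_class = max(range(len(multiclass)), key=lambda i: multiclass[i])
```

With these scores the multilabel view assigns no label, whereas the multiclass view still picks the class at index 1, simply because it is the least unlikely of the three.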

Technically this is not a programming question (for future reference, it is better to ask such questions on https://datascience.stackexchange.com/ or https://stats.stackexchange.com/ ).

