
Training custom NER model

I have been training my NER model on some text, trying to find the cities in it and tag them with custom entities.

Example:

    ('paragraph Designated Offices Party A New York Party B Delaware paragraph pricing source calculation Market Value shall generally accepted pricing source reasonably agreed parties paragraph Spot rate Spot Rate specified paragraph reasonably agreed parties',
     {'entities': [(37, 41, 'DesignatedBankLoc'), (54, 62, 'CounterpartyBankLoc')]})
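
Before training, it may be worth checking that each character offset covers the text it is meant to label. The following is a minimal sketch, assuming the (text, {'entities': [...]}) format shown above; `show_spans` is a hypothetical helper name, not part of spaCy, and the sample text is shortened to its first few words.

```python
def show_spans(example):
    """Return (label, covered_text) pairs for one (text, annotations) example."""
    text, annotations = example
    return [(label, text[start:end]) for start, end, label in annotations['entities']]


# Shortened copy of the example above; the offsets are unchanged.
sample = (
    'paragraph Designated Offices Party A New York Party B Delaware paragraph',
    {'entities': [(37, 41, 'DesignatedBankLoc'), (54, 62, 'CounterpartyBankLoc')]},
)

for label, covered in show_spans(sample):
    # repr() makes stray leading or trailing spaces visible
    print(label, repr(covered))
```

With the offsets as printed, the first span covers 'New ' rather than 'New York' (which would be 37 to 45), while the second covers 'Delaware'. Spans that do not line up with token boundaries may not be learned as intended, so a check like this is a cheap first step.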

I am looking for two entities here: DesignatedBankLoc and CounterpartyBankLoc. A single text can also contain multiple entities.

Currently I am training on 60 rows of data as follows:

import spacy
import random
def train_spacy(data, iterations):
    # Written against the spaCy 2.x training API (nlp.create_pipe, nlp.begin_training)
    TRAIN_DATA = data
    nlp = spacy.blank('en')  # create blank Language class
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')  # reuse the existing component so `ner` is always defined

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp


prdnlp = train_spacy(TRAIN_DATA, 100)

My problem is:

The model predicts correctly when the input text, whether it follows the same pattern or a different one, contains cities that were seen during training. It predicts no entities at all when the text, in the same or a different pattern, contains cities that never occur in the training data set.

Please explain why this is happening, and help me understand how the model is actually being trained.

Based on experience: you have 60 rows of data and train for 100 iterations, so you are overfitting on the values of the entities as opposed to their position.

To check this, try injecting the city names at random places in a sentence and see what happens. If the algorithm still tags them, you're likely overfitting.
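
The check described above could be sketched as follows. This is only an illustration: `inject_city` is a hypothetical helper, the sentence and city are taken from the question's example, and the spaCy call is left as a comment because it assumes the trained `prdnlp` model from the question.

```python
import random


def inject_city(sentence, city, rng):
    """Insert `city` between two randomly chosen words.

    Returns (new_text, (start, end)) where text[start:end] == city.
    """
    words = sentence.split()
    position = rng.randint(0, len(words))  # 0 = before first word, len(words) = after last
    before = ' '.join(words[:position])
    after = ' '.join(words[position:])
    start = len(before) + 1 if before else 0
    new_text = ' '.join(part for part in (before, city, after) if part)
    return new_text, (start, start + len(city))


rng = random.Random(0)  # fixed seed so the probes are reproducible
sentence = 'pricing source reasonably agreed parties'
probes = [inject_city(sentence, 'New York', rng) for _ in range(3)]

for text, (start, end) in probes:
    print(text, '->', text[start:end])
    # With the trained model from the question, one could then inspect:
    # print([(ent.text, ent.label_) for ent in prdnlp(text).ents])
```

If the model tags the city wherever it lands, regardless of the surrounding words, that suggests it has memorised the value rather than learned the context.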

There are two solutions:

  • Create more training data with more varied values for these entities
  • Test different numbers of iterations
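
One way to get more varied entity values is to swap new city names into existing examples and recompute the offsets. The sketch below assumes the same (text, {'entities': [...]}) format as the question; `swap_entity_values` and the list of cities are hypothetical, chosen only for illustration.

```python
def swap_entity_values(example, replacements):
    """Return a copy of one (text, annotations) example with entity values replaced.

    `replacements` maps a label to the new surface text for that label.
    Offsets are recomputed so they still point at the replaced values.
    """
    text, annotations = example
    new_text = ''
    new_entities = []
    cursor = 0
    for start, end, label in sorted(annotations['entities']):
        value = replacements.get(label, text[start:end])
        new_text += text[cursor:start]          # copy the untouched context
        new_start = len(new_text)
        new_text += value                       # write the new entity value
        new_entities.append((new_start, new_start + len(value), label))
        cursor = end
    new_text += text[cursor:]
    return new_text, {'entities': new_entities}


example = (
    'Party A New York Party B Delaware',
    {'entities': [(8, 16, 'DesignatedBankLoc'), (25, 33, 'CounterpartyBankLoc')]},
)
cities = ['London', 'Singapore', 'San Francisco']  # illustrative values only

augmented = [
    swap_entity_values(example, {'DesignatedBankLoc': a, 'CounterpartyBankLoc': b})
    for a in cities for b in cities if a != b
]
for text, annotations in augmented:
    print(text, annotations['entities'])
```

For the second point, one could hold back a few examples whose cities never appear in the training set and compare how well the model does on them after different numbers of iterations.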
