
Training spaCy's predefined NER model with custom data: compounding factor, batch size, and loss values

I am trying to train a spaCy NER model. I have about 2,600 paragraphs of data, each between 200 and 800 words long. I need to add two new entity labels, PRODUCT and SPECIFICATION. Is this approach a good way to train, or is there a better alternative? If it is fine to proceed with, can anyone suggest appropriate values for the compounding factor and batch size, and what range the loss values should fall into during training? As of now, my loss values range from about 400 down to 5.

import random
from pathlib import Path

import plac
import spacy
from spacy.util import minibatch, compounding

# New entity labels to teach the recognizer (from the question).
LABELS = ['PRODUCT', 'SPECIFICATION']

# Training data: fill with the ~2,600 annotated paragraphs as
# (text, {'entities': [(start, end, label), ...]}) tuples.
ret_data = []


def main(model=None, new_model_name='product_details_parser',
         output_dir=Path('/xyz_path/'), n_iter=20):
    """Set up the pipeline and entity recognizer, and train the new
    entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')
    for label in LABELS:
        ner.add_label(label)  # add new entity labels to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(ret_data)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(ret_data, size=compounding(1., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)


if __name__ == '__main__':
    plac.call(main)
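
For context, this is roughly what the compounding(1., 32., 1.001) call above does: it yields a batch-size schedule that starts at 1 and grows by a factor of 1.001 per batch, capped at 32. A quick inspection sketch (not part of the script above):

from spacy.util import compounding

# Print the first few values of the batch-size schedule used above.
sizes = compounding(1., 32., 1.001)
print([round(next(sizes), 3) for _ in range(5)])  # grows slowly from 1.0 toward 32.0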

Instead of this type of training, you can begin with the simple training method ( https://spacy.io/usage/training#training-simple-style ). This simple method may take more time compared to your method, but will yield better results.
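
Below is a minimal sketch of that simple training style, assuming spaCy 2.x; the TRAIN_DATA examples and character offsets are illustrative placeholders, not data from the question. It updates on one example at a time instead of minibatching, which is why it is slower on a large corpus:

import random
import spacy

# Illustrative examples only -- replace with your real annotated paragraphs.
TRAIN_DATA = [
    ("The product X100 supports 4K output",
     {"entities": [(12, 16, "PRODUCT"), (26, 28, "SPECIFICATION")]}),
    ("Model Z9 weighs 2.5 kg",
     {"entities": [(6, 8, "PRODUCT"), (16, 22, "SPECIFICATION")]}),
]

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for itn in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        # update one example at a time, as in the simple-style docs
        nlp.update([text], [annotations], sgd=optimizer, drop=0.35, losses=losses)
    print("Losses", losses)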
