
Updating an already existing spaCy NER model

I want to update an already existing spaCy model, 'en_core_web_sm', and train it with additional data.

My data is in the same format as described in spaCy's documentation: https://spacy.io/usage/training
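For reference, the spaCy 2.x training examples use a list of (text, annotations) tuples, where annotations is a dict whose "entities" key holds (start, end, label) character offsets. Here is a minimal sketch of that shape. The sentences and labels are illustrative placeholders, not the data from this question, and entity_spans is a hypothetical helper added only to check the offsets.

```python
# Illustrative data only: placeholder sentences and labels, not the asker's data.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def entity_spans(data):
    """Hypothetical helper: return (substring, label) pairs to check the offsets."""
    return [
        (text[start:end], label)
        for text, annotations in data
        for start, end, label in annotations["entities"]
    ]


print(entity_spans(TRAIN_DATA))
```

Checking the substrings this way is a cheap guard against off-by-one offsets before any training is attempted.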

I've followed the same steps as the documentation describes for updating an NER model with my data.

def model_train(output_dir=None, n_iter=100):
    """Load the model, set up the pipeline and train the entity recognizer."""
    model=('en_core_web_sm')
    nlp = spacy.load(model, entity = False, parser = False)  # load existing spaCy model
    print("Loaded model '%s'" % model)
    print (nlp.pipe_names)

#     # create the built-in pipeline components and add them to the pipeline

    ner = nlp.get_pipe("ner")

#     # add labels
    for texts, annotations in TRAIN_DATA:        
        for ent in annotations.get("entities"):
#             print (ent)
            ner.add_label(ent[2])
#             print (ent[2])

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    with nlp.disable_pipes(*other_pipes):  # only train NER

        if model is None:
            nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
#             # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                    [texts],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    losses=losses,
                )
            print("Losses", losses)


        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

The error that I'm getting is:

Loaded model 'en_core_web_sm'
['tagger', 'parser', 'ner']
-------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
----> 1 model_train()

<ipython-input-337-91366511ed4d> in model_train(output_dir, n_iter)
     56                     [annotations],  # batch of annotations
     57                     drop=0.5,  # dropout - make it harder to memorise data
---> 58                     losses=losses,
     59                 )
     60             print("Losses", losses)

C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
    432                 doc = self.make_doc(doc)
    433             if not isinstance(gold, GoldParse):
--> 434                 gold = GoldParse(doc, **gold)
    435             doc_objs.append(doc)
    436             gold_objs.append(gold)

TypeError: type object argument after ** must be a mapping, not tuple

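zip(*batch) already gives you the texts and the annotations as two separate sequences, so pass them to nlp.update() directly instead of wrapping them in another list. With [texts] and [annotations], each "gold" item that nlp.update() sees is a tuple of dicts rather than a single dict, which is why GoldParse(doc, **gold) raises the TypeError. Use: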

nlp.update(
    texts,  # batch of texts
    annotations,  # batch of annotations
    drop=0.5,  # dropout - make it harder to memorise data
    losses=losses,
)
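To see why the extra brackets matter without loading a model, here is a small pure-Python sketch. It assumes, as the traceback suggests, that spaCy 2.x pairs docs with golds and then calls GoldParse(doc, **gold). The names gold_parse_stub and pair_and_unpack are hypothetical stand-ins, not spaCy APIs, and the batch contents are made up.

```python
# Made-up batch in the (text, annotations) format used by TRAIN_DATA.
batch = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]

# zip(*batch) transposes the batch into a tuple of texts and a tuple of dicts.
texts, annotations = zip(*batch)


def gold_parse_stub(**kwargs):
    """Hypothetical stand-in for GoldParse(doc, **gold): needs a mapping to unpack."""
    return kwargs


def pair_and_unpack(docs, golds):
    """Mimic the loop shown in the traceback: pair each doc with its gold, unpack gold."""
    return [gold_parse_stub(**gold) for _doc, gold in zip(docs, golds)]


# Passing the tuples directly: every gold is a dict, so ** unpacking works.
ok = pair_and_unpack(texts, annotations)

# Wrapping them in lists: the single gold is the whole tuple of dicts.
try:
    pair_and_unpack([texts], [annotations])
    wrapped_failed = False
except TypeError:
    wrapped_failed = True

print(ok, wrapped_failed)
```

With the brackets there is only one (doc, gold) pair, and that gold is a tuple, which matches the "must be a mapping, not tuple" message in the question.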
