
How do I retrain a model with spaCy?

spaCy originally recognised the token 'Modi' as an ORG, so I retrained the model with the following code:

import spacy 
import random
nlp = spacy.load('en')
nlp.entity.add_label('CELEBRITY')
TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "PERSON")]}),
        (u"India", {"entities": [(0, 5, "GPE")]})]

optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations],drop=0.3, sgd=optimizer)


text = "But Modi is starting India. The company made a late push\ninto hardware, and Apple’s Siri and Google available on iPhones, and Amazon’s Alexa\nsoftware, which runs on its Echo and Dot devices, have clear leads in\nconsumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text,ent.label_)

I got the following output:

Modi PERSON
India GPE
Apple’s Siri ORG
Google ORG
iPhones ORG
Amazon GPE
Echo PERSON
Dot PERSON

This changes Modi to PERSON, but it also makes the NER less accurate than the previous model. For example, the previous model recognised Amazon as ORG, but now it is labelled GPE. Next, I added an extra label, CELEBRITY, and labelled Modi as CELEBRITY with the following code:


import spacy 
import random
nlp = spacy.load('en')
nlp.entity.add_label('CELEBRITY')
TRAIN_DATA = [
        (u"Modi", {"entities": [(0, 4, "CELEBRITY")]})]

optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations],drop=0.3, sgd=optimizer)


text = "But Modi is starting India. The company made a late push\ninto hardware, and Apple’s Siri and Google available on iPhones, and Amazon’s Alexa\nsoftware, which runs on its Echo and Dot devices, have clear leads in\nconsumer adoption."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text,ent.label_)

But this seems to break my model, and I get the following result:

But CELEBRITY
Modi CELEBRITY
is CELEBRITY
starting CELEBRITY
India GPE
. CELEBRITY
The CELEBRITY
company CELEBRITY
made CELEBRITY
a CELEBRITY
late CELEBRITY
push CELEBRITY
into CELEBRITY
hardware CELEBRITY
, CELEBRITY
and CELEBRITY
Apple CELEBRITY

Please let me know the reason behind this behaviour, and how I can make sure that only the entity I label changes while all other entities stay as spaCy originally predicted them.

You should annotate all the entities present in each sentence of your training data, not just the new entity.
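As a rough sketch of what annotating every entity could look like, the snippet below builds training examples from full sentences rather than single words, and labels every entity in each sentence. The sentences, the choice of labels, and the `annotate` helper are all made up for illustration; `annotate` is not part of spaCy, it only computes character offsets in the same `(start, end, label)` format the question's `TRAIN_DATA` uses.

```python
def annotate(text, labelled_phrases):
    """Build one (text, annotations) pair in the TRAIN_DATA format above.

    Hypothetical helper, not a spaCy function. labelled_phrases is a list
    of (phrase, label) pairs; only the first occurrence of each phrase in
    text is annotated.
    """
    entities = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError("phrase %r not found in %r" % (phrase, text))
        entities.append((start, start + len(phrase), label))
    entities.sort()  # keep spans in document order
    return (text, {"entities": entities})


# Illustrative sentences: every entity is labelled, not only the new one,
# and one sentence with no entities at all is included as well.
TRAIN_DATA = [
    annotate(u"Modi visited Google in India.",
             [("Modi", "CELEBRITY"), ("Google", "ORG"), ("India", "GPE")]),
    annotate(u"Amazon opened an office in India.",
             [("Amazon", "ORG"), ("India", "GPE")]),
    annotate(u"The company made a late push into hardware.", []),
]

for text, annotations in TRAIN_DATA:
    print(text, annotations["entities"])
```

A list built this way can be passed to the same training loop as in the question. In practice it would presumably need far more than three sentences, and ideally many that also contain the entity types the original model already handled well.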


 