Training spaCy's predefined NER model with custom data: need advice on compounding factor, batch size, and loss values
I am trying to train a spaCy NER model. I have about 2,600 paragraphs of data, each paragraph between 200 and 800 words long. I need to add two new entity labels, PRODUCT and SPECIFICATION. Is this approach suitable for training, or is there a better alternative? If it is, can anyone suggest appropriate values for the compounding factor and batch size, and what range the loss values should stay within during training? Right now my loss values range between 400 and 5.
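For reference, `compounding(1., 32., 1.001)` in the code below produces a batch-size schedule that starts at 1 and grows geometrically by a factor of 1.001 per batch, capped at 32, so it takes roughly ln(32)/ln(1.001) ≈ 3,500 batches to reach the maximum. The following is a pure-Python sketch of that schedule (`compounding_schedule` is my own name, not the spaCy function itself):

```python
def compounding_schedule(start, stop, compound):
    """Sketch of spaCy's util.compounding: yield values that grow
    geometrically from `start` by a factor of `compound`, capped at `stop`."""
    curr = start
    while True:
        yield curr
        curr = min(curr * compound, stop)

# With (1., 32., 1.001) the batch size creeps up very slowly:
sched = compounding_schedule(1.0, 32.0, 1.001)
sizes = [int(next(sched)) for _ in range(4000)]
print(sizes[0], sizes[1000], sizes[3500])  # grows from 1 toward the cap of 32
```

spaCy's `minibatch` helper truncates each yielded value to an integer and uses it as the size of the next batch, so early updates are effectively per-example and later ones are batched.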
from __future__ import unicode_literals, print_function

import random
from pathlib import Path

import plac
import spacy
from spacy.util import minibatch, compounding

# LABEL and ret_data (the training examples) are assumed to be defined
# elsewhere in the script.


def main(model=None, new_model_name='product_details_parser',
         output_dir=Path('/xyz_path/'), n_iter=20):
    """Set up the pipeline and entity recognizer, and train the new
    entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add the entity recognizer to the model if it's not in the pipeline.
    # nlp.create_pipe works for built-ins that are registered with spaCy.
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # Otherwise, get it, so we can add labels to it.
    else:
        ner = nlp.get_pipe('ner')
    ner.add_label(LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero
        # out existing entity types.
        optimizer = nlp.entity.create_optimizer()
    # Get names of other pipes to disable them during training.
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(ret_data)
            losses = {}
            # Batch up the examples using spaCy's minibatch.
            batches = minibatch(ret_data, size=compounding(1., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)


if __name__ == '__main__':
    plac.call(main)
Instead of this type of training, you can start with the simple training style ( https://spacy.io/usage/training#training-simple-style ). Compared with your approach, this simple method may take more time, but it tends to produce better results.