How to predict entities for multiple sentences using spaCy?
I trained an NER model with spaCy. I know how to use it to recognize the entities of a single sentence (a doc object) and visualize the results:
doc = disease_blank('Example sentence')
spacy.displacy.render(doc, style="ent", jupyter=True)
or
for ent in doc.ents:
    print(ent.text, ent.label_)
Now I want to predict the entities of several such sentences. My idea is to filter the sentences by entity. At the moment, the only way I have found to do this is the following:
sentences = ['sentence 1', 'sentence2', 'sentence3']

for element in sentences:
    doc = nlp(element)
    for ent in doc.ents:
        if ent.label_ == "LOC":
            print(doc)
            # returns all sentences which have the entity "LOC"
My question is: is there a better, more efficient way to do this?
You have two options to speed up your current implementation. Option 1: stream the sentences through nlp.pipe with multiple processes:
import spacy
import multiprocessing

# Leave two cores free, but always use at least one
cpu_cores = multiprocessing.cpu_count() - 2 if multiprocessing.cpu_count() - 2 > 1 else 1
nlp = spacy.load("./path/to/your/own/model")

sentences = ['sentence 1', 'sentence2', 'sentence3']
for doc in nlp.pipe(sentences, n_process=cpu_cores):  # disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"] ... if your model has them. Check with `nlp.pipe_names`
    # returns all sentences which have the entity "LOC"
    print([doc for ent in doc.ents if ent.label_ == "LOC"])
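The `disable` comment above can be made robust by only disabling components the loaded pipeline actually has. A minimal sketch of that idea, using `spacy.blank("en")` as a stand-in for the trained model so it runs without any model files:

```python
import spacy

# Stand-in for spacy.load("./path/to/your/own/model"); a blank English
# pipeline lets this sketch run without a trained model on disk.
nlp = spacy.blank("en")

# Disable every component except "ner" -- but only names that actually
# exist, so this works whatever the model was trained with.
to_disable = [name for name in nlp.pipe_names if name != "ner"]

sentences = ['sentence 1', 'sentence2', 'sentence3']
docs = list(nlp.pipe(sentences, disable=to_disable))
print(len(docs))  # 3
```

With a real model, `to_disable` would contain whatever extra components `nlp.pipe_names` reports.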
Option 2: add a custom pipeline component that keeps only "LOC" entities:

import spacy
import multiprocessing
from spacy.language import Language

cpu_cores = multiprocessing.cpu_count() - 2 if multiprocessing.cpu_count() - 2 > 1 else 1

@Language.component("loc_label_filter")
def custom_component_function(doc):
    old_ents = doc.ents
    new_ents = [item for item in old_ents if item.label_ == "LOC"]
    doc.ents = new_ents
    return doc

nlp = spacy.load("./path/to/your/own/model")
nlp.add_pipe("loc_label_filter", after="ner")

sentences = ['sentence 1', 'sentence2', 'sentence3']
for doc in nlp.pipe(sentences, n_process=cpu_cores):
    print([doc for ent in doc.ents])
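To see this filtering pattern work end to end without a trained model, you can wire the same kind of component into a blank pipeline with an EntityRuler supplying a couple of toy patterns. The component name, patterns, and texts below are illustrative, not from the original answer:

```python
import spacy
from spacy.language import Language

@Language.component("keep_loc_ents")  # illustrative name
def keep_loc_ents(doc):
    # Keep only entities labelled "LOC", drop everything else
    doc.ents = [ent for ent in doc.ents if ent.label_ == "LOC"]
    return doc

# Stand-in for the trained model: a blank pipeline plus an EntityRuler
# with two toy patterns, so the sketch runs without any model files.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LOC", "pattern": "Paris"},
    {"label": "ORG", "pattern": "Acme"},
])
nlp.add_pipe("keep_loc_ents", after="entity_ruler")

sentences = ["Paris is lovely.", "Acme hired someone.", "Nothing here."]
kept = [doc.text for doc in nlp.pipe(sentences) if doc.ents]
print(kept)  # ['Paris is lovely.']
```

The "Acme" sentence is dropped because its only entity is ORG, which the component removes before the filtering loop sees it.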
Important: these speedups will only be noticeable if the sentences variable contains hundreds or thousands of samples; if sentences is "small" (i.e. it holds only a hundred sentences or fewer), you (and your time benchmarks) will probably not notice much of a difference. The batch_size parameter of nlp.pipe can also be fine-tuned, but in my own experience you only want to do that if the previous tips still do not produce a visible difference.
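If you do decide to experiment with batch_size, it can be passed straight to nlp.pipe. A minimal sketch, where the value 256 is purely illustrative and a blank pipeline again stands in for the trained model:

```python
import spacy

# Stand-in for spacy.load("./path/to/your/own/model")
nlp = spacy.blank("en")

sentences = [f"sentence {i}" for i in range(1000)]

# Larger batches mean less per-batch overhead (and, with n_process > 1,
# fewer transfers between worker processes) at the cost of more memory.
docs = list(nlp.pipe(sentences, batch_size=256))
print(len(docs))  # 1000
```

Reasonable values depend on document length and available memory, which is why benchmarking on your own data is the only reliable guide.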