How do I predict entities for multiple sentences with spaCy?

Question

I trained an NER model with spaCy. I know how to use it to recognize the entities in a single sentence (a doc object) and visualize the result:

doc = disease_blank('Example sentence')    
spacy.displacy.render(doc, style="ent", jupyter=True)

or

for ent in doc.ents:
    print(ent.text, ent.label_)

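Now I want to predict the entities for several such sentences. My idea is to filter the sentences by entity. So far I have only found the following way to do it: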

sentences = ['sentence 1', 'sentence2', 'sentence3']
for element in sentences:
    doc = nlp(element)
    for ent in doc.ents:
        if ent.label_ == "LOC":
            print(doc)
# prints all sentences which have the entity "LOC"

My question is whether there is a better, more efficient way to do this.

Answer 1

You have two options for speeding up your current implementation:

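Use the performance tips provided by the spaCy developers. Without knowing which specific components your custom NER model pipeline has, a refactoring of your code would look like this: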

import spacy
import multiprocessing

# Leave two cores free for the rest of the system, but always use at least one
cpu_cores = max(1, multiprocessing.cpu_count() - 2)
nlp = spacy.load("./path/to/your/own/model")

sentences = ['sentence 1', 'sentence2', 'sentence3']
# Optionally pass disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]
# to nlp.pipe if your model has them (check with `nlp.pipe_names`);
# do not disable tok2vec if your ner component depends on it
for doc in nlp.pipe(sentences, n_process=cpu_cores):
    # print every sentence that contains at least one "LOC" entity
    if any(ent.label_ == "LOC" for ent in doc.ents):
        print(doc)
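To try the nlp.pipe approach without the custom model, here is a minimal sketch. It assumes spaCy v3 and swaps in a stand-in pipeline (a blank English model with a rule-based entity_ruler) for the trained NER model; the helper name sentences_with_label and the example patterns are hypothetical.

```python
import spacy


def sentences_with_label(nlp, sentences, label):
    """Return the texts whose predicted entities include `label`."""
    matches = []
    # nlp.pipe processes the texts in batches instead of one call per sentence
    for doc in nlp.pipe(sentences):
        if any(ent.label_ == label for ent in doc.ents):
            matches.append(doc.text)
    return matches


# Stand-in pipeline (assumption): replace with spacy.load(...) for a trained model
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LOC", "pattern": "Berlin"},
    {"label": "DISEASE", "pattern": "flu"},
])

sentences = ["I live in Berlin.", "She has the flu.", "Nothing to see here."]
result = sentences_with_label(nlp, sentences, "LOC")
print(result)
```

Using any(...) keeps each matching sentence exactly once, even when it contains several entities with the same label.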

Combine the previous tips with a spaCy custom component. With this option, your refactored/improved code would look like this:

import spacy
import multiprocessing
from spacy.language import Language

# Leave two cores free for the rest of the system, but always use at least one
cpu_cores = max(1, multiprocessing.cpu_count() - 2)

@Language.component("loc_label_filter")
def custom_component_function(doc):
    # keep only the entities labelled "LOC" and drop all others
    old_ents = doc.ents
    new_ents = [item for item in old_ents if item.label_ == "LOC"]
    doc.ents = new_ents
    return doc


nlp = spacy.load("./path/to/your/own/model")
nlp.add_pipe("loc_label_filter", after="ner")

sentences = ['sentence 1', 'sentence2', 'sentence3']

for doc in nlp.pipe(sentences, n_process=cpu_cores):
    # after filtering, any remaining entity is a "LOC" entity
    if doc.ents:
        print(doc)
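To check what such a filtering component does to doc.ents, the following sketch can be run without a trained model. It assumes spaCy v3 and uses a blank English pipeline with an entity_ruler in place of the "ner" component; the component name loc_only_demo and the patterns are hypothetical.

```python
import spacy
from spacy.language import Language


@Language.component("loc_only_demo")  # hypothetical component name
def loc_only_demo(doc):
    # Overwrite doc.ents so that only "LOC" entities survive
    doc.ents = [ent for ent in doc.ents if ent.label_ == "LOC"]
    return doc


# Stand-in pipeline (assumption): the entity_ruler plays the role of "ner"
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LOC", "pattern": "Berlin"},
    {"label": "DISEASE", "pattern": "flu"},
])
nlp.add_pipe("loc_only_demo", after="entity_ruler")

doc = nlp("She caught the flu in Berlin.")
labels = [(ent.text, ent.label_) for ent in doc.ents]
print(nlp.pipe_names)
print(labels)
```

Note that the filter discards the other labels for good, so this design only fits when nothing downstream needs them.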

Important:

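Note that the gains only become noticeable when your sentences variable holds hundreds or thousands of samples; if sentences is "small" (i.e. it contains only a hundred sentences or fewer), you (and your timing benchmarks) may not notice much of a difference.
Also note that the batch_size parameter of nlp.pipe can be fine-tuned as well, but in my own experience you only want to do that if the previous tips still show no noticeable difference.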
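One way to judge whether batch_size matters for your data is to time a few values. The sketch below assumes spaCy v3 and again uses a stand-in rule-based pipeline, so the absolute timings say nothing about a trained model; the batch sizes 50 and 500 are arbitrary illustration values.

```python
import time

import spacy

# Stand-in pipeline (assumption): replace with spacy.load(...) for a trained model
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "LOC", "pattern": "Berlin"}])

# 1000 sentences, half of which mention a location
sentences = ["I live in Berlin.", "Nothing to see here."] * 500

timings = {}
for batch_size in (50, 500):
    start = time.perf_counter()
    # count the sentences that contain at least one "LOC" entity
    hits = sum(
        1
        for doc in nlp.pipe(sentences, batch_size=batch_size)
        if any(ent.label_ == "LOC" for ent in doc.ents)
    )
    timings[batch_size] = time.perf_counter() - start
    print(batch_size, hits, f"{timings[batch_size]:.3f}s")
```

Run the comparison on a realistic sample of your own sentences with your own model before settling on a value.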
