
Spacy NER: Extract all Persons before a specific word

I know I can use spaCy's named entity recognition to extract persons in a text. But I only want to extract the person or persons who appear before the word "asked".

Should I use Matcher together with NER? I am new to spaCy, so apologies if the question is simple.

Desired Output:
Louis Ng

Current Output:
Louis Ng
Lam Pin Min


import spacy

nlp = spacy.load("en_core_web_trf")

doc = nlp(
    "Mr Louis Ng asked what kind of additional support can we give to sectors and businesses where the human interaction cannot be mechanised. Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)

for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)

You can use a Matcher to find PERSON entities that precede a specific word:

pattern = [{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]

Because each dict corresponds to a single token, this pattern would only match the last word of the entity ("Ng"). You could let the first dict match more than one token with {"ENT_TYPE": "PERSON", "OP": "+"}, but this runs the risk of matching two person entities in a row in an example like "Before Ms X spoke to Ms Y Ms Z asked...".
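To see both behaviours side by side, here is a minimal sketch, assuming spaCy v3. So that it runs without downloading a model, a blank English pipeline with an entity_ruler stands in for the statistical NER of en_core_web_trf; the two hard-coded names and the shortened sentence are assumptions for illustration only.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline plus an entity_ruler stands in for the trained
# NER component, so the example runs without downloading en_core_web_trf.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Louis Ng"},
    {"label": "PERSON", "pattern": "Lam Pin Min"},
])

doc = nlp("Mr Louis Ng asked a question. Mr Lam Pin Min replied.")

# One dict per token: only the last token of the entity is captured.
single = Matcher(nlp.vocab)
single.add("PERSON_ASKED", [[{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]])
single_matches = [doc[start:end].text for _, start, end in single(doc)]

# "OP": "+" lets the first dict consume one or more PERSON tokens.
multi = Matcher(nlp.vocab)
multi.add("PERSON_ASKED", [[{"ENT_TYPE": "PERSON", "OP": "+"}, {"ORTH": "asked"}]])
multi_matches = [doc[start:end].text for _, start, end in multi(doc)]

print(single_matches)
print(multi_matches)
```

The first matcher only sees "Ng asked". The second one does reach "Louis Ng asked", but since it accepts any run of PERSON tokens, it cannot tell where one entity ends and the next begins.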

To be able to match a single entity more easily with a Matcher, you can add the component merge_entities to the end of your pipeline (https://spacy.io/api/pipeline-functions#merge_entities), which merges each entity into a single token. Then this pattern would match "Louis Ng" as one token.
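Put together, the approach could look roughly like the sketch below, assuming spaCy v3. As a stand-in for the trained NER, it again uses a blank pipeline with an entity_ruler and two hard-coded names (an assumption, so the example runs without a model download). With the real model you would instead call spacy.load("en_core_web_trf") and then nlp.add_pipe("merge_entities").

```python
import spacy
from spacy.matcher import Matcher

# Assumption: entity_ruler replaces the statistical NER for this demo only.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Louis Ng"},
    {"label": "PERSON", "pattern": "Lam Pin Min"},
])
# Appended last, so it runs after entities are set and merges each one
# into a single token.
nlp.add_pipe("merge_entities")

doc = nlp(
    "Mr Louis Ng asked what kind of additional support can we give to "
    "sectors and businesses where the human interaction cannot be mechanised. "
    "Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)

# After merging, one PERSON token followed by "asked" covers the whole name.
matcher = Matcher(nlp.vocab)
matcher.add("PERSON_ASKED", [[{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]])

# doc[start] is the merged entity token; the token after it is "asked".
persons = [doc[start].text for _, start, _ in matcher(doc)]
print(persons)
```

Here "Lam Pin Min" is skipped because the token after it is "replied", which leaves only "Louis Ng", the desired output from the question.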
