
How to remove an entity from a sentence with spaCy?

How can I remove an entity from a sentence with spaCy? I want to randomly remove a NORP, GPE, MONEY, ORDINAL, or PERCENT entity. For example:

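Donald John Trump [PERSON] (born June 14, 1946) [DATE] is the 45th [ORDINAL] and current president of the United States [GPE]. Before entering politics, he was a businessman and television personality.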

Now, how can I remove one of these entities from the sentence? In this example, the function chooses to remove the ORDINAL entity "45th".

>>> sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
>>> remove(sentence)
45th

Try spaCy NER together with np.random.choice:

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")

sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
doc = nlp(sentence)

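# Keep only the entity types of interest
ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
# Pick one of the remaining entities at random
remove = lambda x: str(np.random.choice(x))
# example output (the pick is random, so it may differ between runs)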
remove(ents)
'45th'
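Note that np.random.choice raises an error when the filtered list is empty, which happens for any sentence without a matching entity. Below is a minimal sketch of a more defensive picker. The names pick_entity and TARGET_LABELS are hypothetical, and the (text, label) pairs are hand-written stand-ins for what `[(e.text, e.label_) for e in doc.ents]` might yield, so the snippet runs without a spaCy model.

```python
import random

# Labels named in the question; adjust as needed.
TARGET_LABELS = {"NORP", "GPE", "MONEY", "ORDINAL", "PERCENT"}


def pick_entity(ents, labels=TARGET_LABELS, rng=None):
    """Return one random (text, label) pair whose label is in `labels`, or None."""
    rng = rng or random
    candidates = [(text, label) for text, label in ents if label in labels]
    # Return None instead of raising when nothing matches.
    return rng.choice(candidates) if candidates else None


# Hand-written stand-ins for spaCy's output (an assumption, not real model output).
ents = [
    ("Donald John Trump", "PERSON"),
    ("June 14, 1946", "DATE"),
    ("45th", "ORDINAL"),
    ("the United States", "GPE"),
]

# Passing a seeded random.Random makes the pick reproducible.
print(pick_entity(ents, rng=random.Random(0)))
print(pick_entity([("Donald John Trump", "PERSON")]))  # no matching label
```

Returning None lets the caller decide what to do with a sentence that has nothing to remove, for example returning it unchanged.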

If you want to remove a random entity from the sentence text itself:

def remove_from_sentence(sentence):
    doc = nlp(sentence)
    # Merge each multi-token entity into a single token
    with doc.retokenize() as retokenizer:
        for e in doc.ents:
            retokenizer.merge(doc[e.start:e.end])
    # Keep each token's trailing whitespace so the text can be rebuilt
    tok_pairs = [(tok.text, tok.whitespace_) for tok in doc]
    ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
    ent_to_remove = remove(ents)
    print(ent_to_remove)
    # Drop every token whose text equals the chosen entity
    tok_pairs_out = [pair for pair in tok_pairs if pair[0] != ent_to_remove]
    return "".join(text + ws for text, ws in tok_pairs_out)

remove_from_sentence(sentence)

the United States
'Donald John Trump (born June 14, 1946) is the 45th and current president of . Before entering politics, he was a businessman and television personality.'
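The output above leaves a stray space before the period ("of ."), and the token filter would also drop any other token with the same text. One possible alternative is to cut the entity out by character offsets. The sketch below is an illustration under assumptions, not part of the original answer: remove_span is a hypothetical helper, and the offsets are computed with str.index here so the snippet runs without a model. With spaCy they would come from the entity's start_char and end_char attributes.

```python
def remove_span(text: str, start: int, end: int) -> str:
    """Delete text[start:end] and tidy the whitespace left at the seam."""
    before, after = text[:start], text[end:]
    if after[:1].isspace() and (not before or before[-1].isspace()):
        # Avoid a doubled (or leading) space where the span used to be.
        after = after.lstrip()
    elif (not after or after[0] in ".,;:!?") and before[-1:].isspace():
        # Avoid a stray space before punctuation or at the end of the text.
        before = before.rstrip()
    return before + after


sentence = (
    "Donald John Trump (born June 14, 1946) is the 45th and current "
    "president of the United States. Before entering politics, he was "
    "a businessman and television personality."
)

# Stand-in for ent.start_char / ent.end_char of the chosen entity.
target = "the United States"
start = sentence.index(target)
print(remove_span(sentence, start, start + len(target)))
```

Because only the selected span is cut, a second occurrence of the same text elsewhere in the sentence is left alone.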

Please ask if anything is unclear.

