How to remove an entity from a sentence with spaCy?
I want to randomly remove a NORP, GPE, MONEY, ORDINAL, or PERCENT entity. For example,
Donald John Trump[person] (born June 14, 1946)[date] is the 45th[ordinal] and current president of the United States[GPE]. Before entering politics, he was a businessman and television personality.
Now, how can I remove a certain entity from this sentence? In this example, the function chose to remove 45th, an ordinal entity.
>>> sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
>>> remove(sentence)
45th
Please try spaCy NER together with np.random.choice:
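import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")
sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
doc = nlp(sentence)
# Collect the text of every entity whose label is one we may remove
ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
# Pick one candidate at random (np.random.choice returns np.str_, so cast to str)
remove = lambda x: str(np.random.choice(x))
# one possible output; the choice is random, so 'the United States' can also come back
remove(ents)
'45th'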
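If you would rather not depend on NumPy just for the random pick, a minimal sketch with the standard library's `random.choice` could look like the following. The names `pick_entity` and `REMOVABLE_LABELS` are hypothetical, and the function assumes the entities have already been extracted as `(text, label)` pairs, e.g. with `[(e.text, e.label_) for e in doc.ents]`. It also returns `None` instead of raising when no removable entity is found.

```python
import random

# Entity labels the question allows to be removed (spaCy label names)
REMOVABLE_LABELS = {"NORP", "GPE", "MONEY", "ORDINAL", "PERCENT"}


def pick_entity(ents, rng=random):
    """Return the text of one randomly chosen removable entity, or None.

    `ents` is assumed to be a list of (text, label) pairs, for example
    built from a spaCy Doc with [(e.text, e.label_) for e in doc.ents].
    """
    candidates = [text for text, label in ents if label in REMOVABLE_LABELS]
    if not candidates:
        # random.choice / np.random.choice would raise on an empty list
        return None
    return rng.choice(candidates)


ents = [
    ("Donald John Trump", "PERSON"),
    ("June 14, 1946", "DATE"),
    ("45th", "ORDINAL"),
    ("the United States", "GPE"),
]
# Passing a seeded random.Random makes the pick reproducible
print(pick_entity(ents, random.Random(0)))  # either '45th' or 'the United States'
```

Passing a seeded `random.Random` instance as `rng` is handy in tests; leaving it at the default uses the module-level generator.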
Should you wish to remove a random entity from the sentence text itself:
def remove_from_sentence(sentence):
    doc = nlp(sentence)
    # Merge every multi-token entity (e.g. "the United States") into one token
    with doc.retokenize() as retokenizer:
        for e in doc.ents:
            retokenizer.merge(doc[e.start:e.end])
    # Keep each token's text together with the whitespace that follows it
    tok_pairs = [(tok.text, tok.whitespace_) for tok in doc]
    ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
    ent_to_remove = remove(ents)
    print(ent_to_remove)
    # Drop every token whose text equals the chosen entity, then rebuild the text
    tok_pairs_out = [pair for pair in tok_pairs if pair[0] != ent_to_remove]
    return "".join(text + ws for text, ws in tok_pairs_out)
remove_from_sentence(sentence)
the United States
'Donald John Trump (born June 14, 1946) is the 45th and current president of . Before entering politics, he was a businessman and television personality.'
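Two side effects of filtering on token text are visible above: a space is left before the period ("of ."), and any other token with the same text as the chosen entity would be dropped as well. A minimal alternative sketch, assuming you already have the chosen entity's character offsets (spaCy spans expose them as `Span.start_char` and `Span.end_char`), cuts out exactly that one occurrence and tidies the whitespace locally. The names `remove_span` and `PUNCT_AFTER` are hypothetical.

```python
# Characters that should attach directly to the preceding word
PUNCT_AFTER = set(".,;:!?)")


def remove_span(text, start_char, end_char):
    """Cut text[start_char:end_char] out and tidy the whitespace around the cut.

    start_char/end_char are assumed to be character offsets such as
    spaCy's Span.start_char and Span.end_char for the chosen entity.
    """
    before, after = text[:start_char], text[end_char:]
    if not after or after[0] in PUNCT_AFTER:
        # Entity was followed by punctuation or ended the text: drop the space before it
        before = before.rstrip(" ")
    elif before.endswith(" ") and after.startswith(" "):
        # Entity sat between two spaces: keep only one
        after = after[1:]
    elif not before and after.startswith(" "):
        # Entity started the text: drop the leading space
        after = after[1:]
    return before + after


sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
start = sentence.index("the United States")
print(remove_span(sentence, start, start + len("the United States")))
# Donald John Trump (born June 14, 1946) is the 45th and current president of. Before entering politics, he was a businessman and television personality.
```

With spaCy, the offsets would come from the randomly chosen `Span` itself, e.g. `remove_span(doc.text, ent.start_char, ent.end_char)`, so no retokenization is needed.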
Please ask if something is not clear.