I'm working on natural language processing with the spaCy library in Python. My input contains several sentences, which I process separately using:
for sent in doc.sents:
For each sent I look for named entities using the .ents attribute. What I would like to do is replace the initial "sent" with a new one in which every recognized named entity is replaced by its entity label. Here is an example:
First sentence: Apple is looking at buying U.K. startup for $1 billion
After replacing: ORG is looking at buying GPE startup for MONEY
Of course, a simple string.replace doesn't work, since I would like to end up with a new spacy.Doc. Any idea how to achieve this?
You may wish to try:
    import spacy

    nlp = spacy.load("en_core_web_md")
    in_ = "Apple is looking at buying U.K. startup for $1 billion"
    doc = nlp(in_)
    out = []
    for sent in doc.sents:
        sent_out = ""
        for tok in sent:
            # Preserve the original spacing after each token
            ws = " " if tok.whitespace_ else ""
            if tok.ent_type_:
                # Token belongs to a named entity: emit its label instead
                sent_out += tok.ent_type_ + ws
            else:
                sent_out += tok.text + ws
        out.append(sent_out)
    print(out)
['ORG is looking at buying GPE startup for MONEYMONEY MONEY']
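Note the peculiar pattern MONEYMONEY MONEY: the entity "$1 billion" spans three tokens ("$", "1" and "billion"), and each of them carries the entity type MONEY. The first two are not separated by whitespace, while the third is, so the label is printed three times.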
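One possible way to get a single MONEY instead of MONEYMONEY MONEY is to emit the label only for the first token of each entity and skip the continuation tokens, using the token's IOB tag. The sketch below is an illustration, not the answer's original code: the function name replace_entities and the Tok stand-in class are hypothetical, and the function only assumes that each token exposes the attributes text, whitespace_, ent_type_ and ent_iob_, which spaCy tokens do. The stand-in is used so the example runs without loading a model.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""
    text: str
    whitespace_: str
    ent_type_: str
    ent_iob_: str  # "B" = begins an entity, "I" = inside one, "O" = outside


def replace_entities(tokens) -> str:
    """Replace each multi-token entity with a single copy of its label."""
    pieces = []          # [text, trailing whitespace] pairs
    in_entity = False    # True while the previous token was part of an entity
    for tok in tokens:
        if tok.ent_type_ and tok.ent_iob_ == "I" and in_entity:
            # Continuation of the current entity: keep the label emitted
            # earlier, but take over this token's trailing whitespace.
            pieces[-1][1] = tok.whitespace_
        elif tok.ent_type_:
            # First token of an entity: emit the label once.
            pieces.append([tok.ent_type_, tok.whitespace_])
            in_entity = True
        else:
            pieces.append([tok.text, tok.whitespace_])
            in_entity = False
    return "".join(text + ws for text, ws in pieces)


# Hand-written tokens mirroring the example sentence (assumed annotations).
tokens = [
    Tok("Apple", " ", "ORG", "B"),
    Tok("is", " ", "", "O"),
    Tok("looking", " ", "", "O"),
    Tok("at", " ", "", "O"),
    Tok("buying", " ", "", "O"),
    Tok("U.K.", " ", "GPE", "B"),
    Tok("startup", " ", "", "O"),
    Tok("for", " ", "", "O"),
    Tok("$", "", "MONEY", "B"),
    Tok("1", " ", "MONEY", "I"),
    Tok("billion", "", "MONEY", "I"),
]

result = replace_entities(tokens)
print(result)
```

With real spaCy objects you would pass each sent from doc.sents to the function. Since the question asks for a new spacy.Doc, the returned string could then be run through the pipeline again with nlp(result); keep in mind that the model would re-analyze the placeholder text, so the entities of that new Doc are not guaranteed to match the labels.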