Named entity recognition with spaCy

I'm working on natural language processing with the spaCy library in Python. The input contains several sentences, which I process separately using this:

for sent in doc.sents:

For each sentence I look for named entities using the .ents attribute. What I would like to achieve is replacing the initial "sent" with a new one in which every recognized named entity is replaced by its label. Here is an example:

First sentence: Apple is looking at buying U.K. startup for $1 billion
After replacing: ORG is looking at buying GPE startup for MONEY

Of course, a simple string.replace doesn't work, since I would like to end up with a new spacy.Doc. Any idea how to achieve this?

You may wish to try:

import spacy

nlp = spacy.load("en_core_web_md")   
in_ = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(in_)
out = []
for sent in doc.sents:
    sent_out = ""
    for tok in sent:
        ws = " " if tok.whitespace_ else ""
        if tok.ent_type_:
            sent_out += tok.ent_type_ + ws
        else:
            sent_out += tok.text + ws
    out.append(sent_out)
    
print(out)

['ORG is looking at buying GPE startup for MONEYMONEY MONEY']

Note the peculiar pattern MONEYMONEY MONEY: the single entity "$1 billion" spans three tokens ($, 1, billion), and the loop emits the label once per token. The first two tokens are not separated by whitespace, while the third is.
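If you would rather get one label per entity, one option is to work on character offsets instead of tokens. Below is a minimal sketch in plain Python. The helper name replace_entities is made up for illustration, and the spans are hard-coded here to stand in for what spaCy would report through each entity's start_char, end_char and label_.

```python
def replace_entities(text, spans):
    """Replace each (start_char, end_char, label) span in text with its label.

    Hypothetical helper, not part of spaCy. Spans are assumed not to overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(label)               # the whole entity becomes one label
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last entity
    return "".join(pieces)


text = "Apple is looking at buying U.K. startup for $1 billion"
# Hard-coded offsets standing in for (ent.start_char, ent.end_char, ent.label_)
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
result = replace_entities(text, spans)
print(result)

# With spaCy (assumed usage, not run here); entity offsets are relative to the
# Doc, so subtract sent.start_char when working sentence by sentence:
#   spans = [(e.start_char - sent.start_char, e.end_char - sent.start_char, e.label_)
#            for e in sent.ents]
#   new_doc = nlp.make_doc(replace_entities(sent.text, spans))
```

Passing the resulting string to nlp.make_doc (tokenization only) or to nlp (full pipeline) should give you a new Doc for the rewritten sentence. Keep in mind that running the full pipeline on the replaced text may tag the inserted labels as entities in their own right.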
