
spaCy: remove only ORG and PERSON names

I have written the function below, which removes all named entities from a text. How can I modify it to remove only ORG and PERSON entities? For instance, I don't want the 6 in $6 to be removed in the example below. Thanks.

import spacy
sp = spacy.load('en_core_web_sm')
def NER_removal(text):
    document = sp(text)
    
    text_no_namedentities = []
    
    ents = [e.text for e in document.ents]
    for item in document:
        if item.text in ents:
            pass
        else:
            text_no_namedentities.append(item.text)
    return (" ".join(text_no_namedentities))


NER_removal("John loves to play at Sofi stadium at 6.00 PM and he earns $6")
'loves to play at stadium at 6.00 PM and he earns $'

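I think item.ent_type_ will be useful here. It holds the entity label of each token (an empty string for tokens outside any entity), so you can check it against the types you want to keep.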

import spacy
sp = spacy.load('en_core_web_sm')
def NER_removal(text):
    document = sp(text)
    text_no_namedentities = []
    # entity types to keep in the output
    ent_types_to_stay = ["MONEY"]
    ents = [e.text for e in document.ents]
    for item in document:
        # skip a token only if it matches an entity and its type is not in the keep list
        if all((item.text in ents, item.ent_type_ not in ent_types_to_stay)):
            pass
        else:
            text_no_namedentities.append(item.text)
    return (" ".join(text_no_namedentities))

print(NER_removal("John loves to play at Sofi stadium at 6.00 PM and he earns $6"))
# loves to play at Sofi stadium at 6.00 PM and he earns $ 6
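
The snippet above keeps MONEY tokens but still decides what to drop by comparing token text with entity text. A variant that targets only the requested labels could filter on ent_type_ directly. The following is a minimal sketch, not the answerer's code: the function name strip_entities and its labels parameter are hypothetical, and it assumes the model tags the names as PERSON or ORG. Joining on text_with_ws instead of " " keeps the original spacing, so "$6" is not split into "$ 6".

```python
def strip_entities(doc, labels=("PERSON", "ORG")):
    """Return the text of `doc` without tokens whose entity label is in `labels`.

    `doc` is expected to be a spaCy Doc; only two token attributes are used:
    `ent_type_` (entity label, "" outside entities) and `text_with_ws`
    (token text plus any trailing whitespace).
    """
    kept = [tok.text_with_ws for tok in doc if tok.ent_type_ not in labels]
    return "".join(kept).strip()


# Usage with a real pipeline (requires spaCy and the en_core_web_sm model):
#   import spacy
#   sp = spacy.load("en_core_web_sm")
#   text = "John loves to play at Sofi stadium at 6.00 PM and he earns $6"
#   print(strip_entities(sp(text)))
```

Which tokens actually disappear depends on what the model labels as PERSON or ORG, so a name the model misses or tags differently (for example as FAC or GPE) would stay in the text.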
