
Process of performing NER (Named Entity Recognition) - NLP

So I have texts that look like the one below:

He also may have recurrent seizures which should be treated with ativan IV or IM and do not neccessarily indicate patient needs to return to hospital unless they continue for greater than 5 minutes or he has multiple recurrent seizures or complications such as aspiration.

and annotation files with lines like:

T1 Reason 16 33 recurrent seizures

The above annotation gives the ID of the entity, its type, the span (character positions) and the entity text itself. My goal is to do NER (Named Entity Recognition) on the above data. From my research I know that I have to apply BIO (Beginning, Inside, Outside) tagging to the data, which will make it look as follows:

O - also
O - may
O - have
B - recurrent
I - seizures
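As a rough sketch of that conversion, the snippet below parses one annotation line and tags whitespace-separated tokens. The helper names `parse_annotation` and `bio_tags` are made up for illustration, the tokenizer is a plain whitespace split, and the tags carry the entity type (`B-Reason`, `I-Reason`), which is a common convention but an assumption here. The offsets in the sample annotation do not seem to line up with the excerpt as quoted, so the sketch locates the span by searching for the entity text rather than trusting them.

```python
import re

def parse_annotation(line):
    """Parse one annotation line such as 'T1 Reason 16 33 recurrent seizures'."""
    ent_id, label, start, end, surface = line.split(maxsplit=4)
    return {"id": ent_id, "label": label,
            "start": int(start), "end": int(end), "text": surface}

def bio_tags(text, spans):
    """Tag whitespace-delimited tokens; spans are (start, end, label) character offsets."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # A token belongs to an entity if it overlaps the entity's character span.
            if match.start() < end and match.end() > start:
                # The first token of the span gets B-, the following ones I-.
                tag = ("B-" if match.start() <= start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged

text = "He also may have recurrent seizures which should be treated with ativan IV or IM"
ann = parse_annotation("T1 Reason 16 33 recurrent seizures")

# Assumption: locate the entity by its text instead of using the stored offsets.
start = text.find(ann["text"])
spans = [(start, start + len(ann["text"]), ann["label"])]

for token, tag in bio_tags(text, spans):
    print(tag, token)
```

With real data you would use the stored offsets against the full source file and a proper tokenizer instead of a whitespace split.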

After the BIO tagging, I want to compute word embeddings from the data and feed them to a classifier that predicts the entity types on the test data.
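At test time the classifier would emit one BIO tag per token, and those tags still have to be grouped back into entities. A minimal sketch of that decoding step, assuming tags of the form `B-<type>` / `I-<type>` / `O` (the function name `decode_bio` and the hard-coded predictions are illustrative only, not output from a real model):

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, phrase) entities."""
    entities = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # Start a new entity; a stray I- tag is treated as a beginning.
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # An O tag closes any open entity.
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        entities.append((current_label, " ".join(current_tokens)))
    return entities

tokens = ["He", "also", "may", "have", "recurrent", "seizures"]
predicted = ["O", "O", "O", "O", "B-Reason", "I-Reason"]  # stand-in for classifier output
print(decode_bio(tokens, predicted))
```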

Is the process I outlined correct, or can anyone please explain how I should approach this problem?

The approach you outlined will work. Note, though, that BIO tagging is only a labelling scheme, and statistical NER models are typically trained on exactly that kind of labelled data, so the two are not alternatives. A more convenient route than building the pipeline by hand is an existing library such as spaCy, whose statistical model predicts whether each word (a token, in NLP terms) in a given text (a Doc, in spaCy terms) is part of an entity and, if so, of what kind. To perform NER on your document with this library, you can go about it as follows:

# pip install spacy
# python -m spacy download en_core_web_sm

import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")

# Process your document
text = ("He also may have recurrent seizures which should be treated with ativan IV or IM and do not neccessarily indicate patient needs to return to hospital unless they continue for greater than 5 minutes or he has multiple recurrent seizures or complications such as aspiration.")
doc = nlp(text)

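# Find named entities in the document (doc.ents holds the entity spans;
# iterating over doc itself yields tokens, which have no label_ attribute)
for entity in doc.ents:
    print(entity.text, entity.label_)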

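To understand the output you get from processing the document, check the spaCy documentation on named entities. To see what a given label represents, call spacy.explain(label). Note that the pretrained en_core_web_sm model only predicts its own general-purpose labels (such as PERSON, ORG or DATE), so to get custom types like Reason you would need to train the NER component on your annotations.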
