
How to extract particular words from a list of sentences using Python NLP, where the words are parts of medical equipment

I want to extract medical equipment part names from a list of sentences. Each sentence records an action taken on a support request, which might be replacing a part or noting that a part is in a bad state.

Here are some sample sentences:

  1. Found [Some equipment part] on both side not working.
  2. Narrowed problem down to the [Some equipment part].
  3. Needed a [Some equipment part] replacement.
  4. Assisted with troubleshooting, found [Some equipment part] is most likely bad.
  5. [Some equipment part] won't go down, will order part.

I want to extract "[Some equipment part]" from the above sentences.

Things I've already tried: first, I filtered the sentences using sentiment analysis, keeping only those with a negative sentiment or containing the text "replace". Then I tried two approaches:

  1. Using NLTK: after POS tagging, running RegexpParser with the grammar "NP: {<VB.*><NN.*>+<NN.*>+|<VB.*><NN.*>+}"
  2. Using spaCy: after POS tagging and dependency parsing, filtering on the verb-noun relationship with token.dep_ in ['dobj'] and token.pos_ == 'NOUN'

Both approaches give me a lot of meaningless output. Please let me know if there is anything that could help.
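
For illustration, here is a minimal sketch of how a simplified form of the chunk grammar from approach 1 behaves. It assumes NLTK is installed; the tokens are tagged by hand so no tagger model is needed, and the part name "foot pedal" is made up.

```python
import nltk

# Simplified form of the grammar: a verb followed by one or more nouns.
grammar = "NP: {<VB.*><NN.*>+}"
parser = nltk.RegexpParser(grammar)

# Hand-tagged tokens with a hypothetical part name ("foot pedal").
tagged = [("Found", "VBD"), ("foot", "NN"), ("pedal", "NN"),
          ("not", "RB"), ("working", "VBG")]

tree = parser.parse(tagged)

# Collect the text of every chunk labelled NP.
chunks = [" ".join(word for word, _tag in subtree.leaves())
          for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(chunks)
```

The chunk carries the verb along with the nouns, and the pattern fires on any verb followed by nouns whether or not those nouns name a part, which is one likely source of the noise.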

It sounds like you would benefit from looking at Named Entity Recognition (NER). I would be curious whether spaCy can pick these out as PRODUCT entities.

You're likely going to need to train spaCy's named entity recognizer to label tokens as "Medical Equipment". That way, you can parse the text and locate the equipment based on the NER label.

This will require you to produce some training data with the medical equipment entities specified. Skipping this step may be possible by looking for PRODUCT entities, but you will likely miss entities because your use case is more specific than the generic products spaCy is trained to detect.
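
As a rough sketch of what such training data could look like, the snippet below builds annotations in the character-offset form that spaCy NER examples are commonly written in: `(text, {"entities": [(start, end, label)]})`. The helper `annotate` and the part names "foot pedal" and "side rail" are hypothetical stand-ins for "[Some equipment part]", not anything from the original data.

```python
LABEL = "Medical Equipment"  # label string assumed to match the one used at prediction time

def annotate(text, part, label=LABEL):
    """Return (text, {"entities": [(start, end, label)]}) for the first occurrence of part."""
    start = text.find(part)
    if start == -1:
        raise ValueError(f"{part!r} not found in {text!r}")
    return (text, {"entities": [(start, start + len(part), label)]})

# Hypothetical part names substituted into two of the sample sentences.
TRAIN_DATA = [
    annotate("Needed a foot pedal replacement.", "foot pedal"),
    annotate("Narrowed problem down to the side rail.", "side rail"),
]

# Sanity check: each offset pair should slice out exactly the part name.
for text, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        print(f"{text[start:end]!r} -> {label} ({start}, {end})")
```

Computing the offsets from the text rather than typing them by hand avoids off-by-one spans, which are a common cause of training examples being rejected or misaligned.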

Once you've trained the model to identify the new Medical Equipment entities, you can locate them via

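import spacy

# 'en_core_medicalner' is a placeholder for the custom model you trained
nlp = spacy.load('en_core_medicalner')
doc = nlp('some text')

# Entity labels live on the spans in doc.ents, not on individual tokens
for ent in doc.ents:
  if ent.label_ == 'Medical Equipment':
    print('{} is Medical Equipment'.format(ent.text))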
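
Until a trained model exists, a rule-based pass can serve as a stand-in for trying out the extraction step. The sketch below assumes spaCy v3 and uses its `entity_ruler` component on a blank English pipeline instead of a trained NER model, so it only finds parts it has been told about. `KNOWN_PARTS` and `extract_parts` are hypothetical names, and the part names are made up.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only, no trained model download required
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical catalogue of part names; replace with your own list.
KNOWN_PARTS = ["foot pedal", "side rail"]

# One case-insensitive token pattern per part name.
ruler.add_patterns([
    {"label": "Medical Equipment",
     "pattern": [{"LOWER": word} for word in part.split()]}
    for part in KNOWN_PARTS
])

def extract_parts(text):
    """Return the text of every entity labelled as Medical Equipment."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == "Medical Equipment"]

print(extract_parts("Assisted with troubleshooting, found Foot Pedal is most likely bad."))
```

Matches produced this way could also be reviewed and reused as seed annotations when building the training data for the statistical model.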

