Identifying text using NLP

Question

I'm trying to find the course names in the line of text below using some NLP technique.

from nltk import word_tokenize, pos_tag, ne_chunk

sentence = "SDGI is offering courses like Electronics,Mechatronics, Physics,Mechanical Engineering"
# Tokenize, POS-tag, then run NLTK's default named-entity chunker
print(ne_chunk(pos_tag(word_tokenize(sentence))))

The output of this is:

(S
  (ORGANIZATION SDGI/NNP)
  is/VBZ
  offering/VBG
  courses/NNS
  like/IN
  Electronics/NNS
  ,/,
  Mechatronics/NNS
  ,/,
  (PERSON Physics/NNPS)
  ,/,
  (PERSON Mechanical/NNP Engineering/NNP))

Is there any way I can extract the courses from this line?

In my real project I will be receiving many documents from which I need to extract the course names.

Any help is appreciated!

Answer 1

Extract all the nouns from the given text.
Create a bag-of-words feature set and train a classifier on labeled data to recognize course names.
The courses mostly appear immediately before or after a comma (,), so a bigram or trigram approach may work well.
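The comma observation can be tried on its own before training anything. The following is a minimal sketch, not the classifier described above: it assumes the course list is introduced by a cue phrase such as "courses like" and that items are separated by commas or "and". The cue pattern and the function name `courses_from_list` are hypothetical and would need tuning for real documents.

```python
import re

# Assumed cue phrases that introduce a list of courses; extend these for real data.
CUE = re.compile(r"\bcourses\s+(?:like|such as|including)\s+(.*)", re.IGNORECASE)

# Separators between list items: a comma (optionally followed by "and"), or a bare "and".
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")


def courses_from_list(sentence):
    """Return the comma-separated items that follow a cue phrase, or [] if none is found."""
    match = CUE.search(sentence)
    if not match:
        return []
    # Drop a trailing period or spaces before splitting the list.
    tail = match.group(1).rstrip(". ")
    return [item.strip() for item in SEPARATOR.split(tail) if item.strip()]


sentence = "SDGI is offering courses like Electronics,Mechatronics, Physics,Mechanical Engineering"
print(courses_from_list(sentence))
```

This only works when the sentence follows the assumed pattern; anything after the cue phrase is treated as part of the list, so trailing clauses would need extra handling.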

Answer 2

This might be too simplistic, but if there is a finite number of existing course names, it might be easier just to create a large lookup table, tokenize your input, and try to look each word up. There will be some edge cases, but I'm not sure you need to take an ML/NLP approach to this problem.
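A rough sketch of the lookup idea, assuming the catalogue is small enough to hold in a set. `KNOWN_COURSES` and `find_courses` are made-up names for illustration; the catalogue contents here are just the courses from the question. Multi-word names such as "Mechanical Engineering" are handled by trying the longest phrase first.

```python
import re

# Hypothetical catalogue; in practice this would be loaded from a file or database.
KNOWN_COURSES = {
    "electronics",
    "mechatronics",
    "physics",
    "mechanical engineering",
}

# Longest course name, in words, so we know how many tokens to try joining.
MAX_WORDS = max(len(name.split()) for name in KNOWN_COURSES)


def find_courses(text):
    """Return known course names found in text, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in KNOWN_COURSES:
                found.append(candidate)
                i += n
                break
        else:
            # No known course starts at this word; move on.
            i += 1
    return found


sentence = "SDGI is offering courses like Electronics,Mechatronics, Physics,Mechanical Engineering"
print(find_courses(sentence))
```

One of the edge cases: punctuation is discarded before matching, so two words separated by a comma could be joined into a phrase that happens to be in the catalogue. Misspellings and abbreviations would also need separate handling.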
