I'm trying to extract the course names from the line of text below using some NLP technique.
from nltk import word_tokenize, pos_tag, ne_chunk
sentence = "SDGI is offering courses like Electronics,Mechatronics, Physics,Mechanical Engineering"
# Tokenize, POS-tag, then run NLTK's named-entity chunker
print(ne_chunk(pos_tag(word_tokenize(sentence))))
The output of this is:
(S
  (ORGANIZATION SDGI/NNP)
  is/VBZ
  offering/VBG
  courses/NNS
  like/IN
  Electronics/NNS
  ,/,
  Mechatronics/NNS
  ,/,
  (PERSON Physics/NNPS)
  ,/,
  (PERSON Mechanical/NNP Engineering/NNP))
Is there any way I can extract the courses from this line?
In my real project I will be receiving many documents, and I need to extract the course names from each of them.
Any help is appreciated!
This might be too simplistic, but if there is a finite number of existing course names, it might be easier just to create a large lookup table, tokenize your input, and try to look each word up. There will be some edge cases, but I'm not sure you need to take an ML/NLP approach to this problem.
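A minimal sketch of that lookup idea, under some assumptions: the `KNOWN_COURSES` set and the `extract_courses` helper are hypothetical names made up for illustration, the table here holds only the four courses from the question, and a simple regex stands in for a real tokenizer. It tries the longest phrase first so that multi-word names such as "Mechanical Engineering" are matched as a unit rather than word by word.

```python
import re

# Hypothetical lookup table; a real one would be loaded from a curated course list.
KNOWN_COURSES = {
    "electronics",
    "mechatronics",
    "physics",
    "mechanical engineering",
}

# Length, in words, of the longest course name in the table.
MAX_WORDS = max(len(name.split()) for name in KNOWN_COURSES)


def extract_courses(text):
    """Return known course names found in text, in order of appearance."""
    # Crude tokenizer: keeps runs of letters and drops punctuation,
    # so a phrase could in principle match across a comma.
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word names win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in KNOWN_COURSES:
                found.append(candidate)
                i += n  # skip past the words that were just matched
                break
        else:
            i += 1  # no course name starts at this token
    return found


sentence = "SDGI is offering courses like Electronics,Mechatronics, Physics,Mechanical Engineering"
print(extract_courses(sentence))
# ['Electronics', 'Mechatronics', 'Physics', 'Mechanical Engineering']
```

Matching is case-insensitive and exact, so misspellings or abbreviations would be among the edge cases that still need handling, for example by adding variants to the table.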