Classify sentences using controlled vocabularies with Python

Question

I have several medical vocabularies (such as medications, symptoms, signs, and diseases) and some free-text diagnostic reports. I want to use TF-IDF or machine learning techniques to first break down the free text and then classify the important sentences into different categories. I am working in Python. For example, “patients need to take aspirin” should be classified as “medication use”, because “aspirin” can be found in the medication vocabulary. Can you recommend some algorithms? Thank you :)

Answer 1

I suggest using CountVectorizer, since you already have the list of keywords. CountVectorizer has a vocabulary parameter, so you can pass your keyword list as the vocabulary. CountVectorizer will then check each document for those keywords and build a feature vector based on them. Let's look at an example:

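from sklearn.feature_extraction.text import CountVectorizer

# Fixed vocabulary: only these keywords become feature columns.
keywords = ["aspirin", "medication", "patients"]
sen1 = "patients need to take aspirin"
sen2 = "medication required immediately"
corpus = [sen1, sen2]

# With a fixed vocabulary, transform() can be called without fitting first.
vectorizer = CountVectorizer(vocabulary=keywords)
X = vectorizer.transform(corpus)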

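After this, print the feature names of the vectorizer: print(vectorizer.get_feature_names_out()). On scikit-learn versions before 1.0, use get_feature_names() instead; that method was removed in 1.2. You will see ['aspirin' 'medication' 'patients']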

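And when you print the vectors for each sentence with print(X.toarray()), you will see the following matrix: [[1 0 1] [0 1 0]]. Each row is a sentence and each column is a keyword. The entries are occurrence counts, which here coincide with presence (1) and absence (0) because no keyword repeats within a sentence. Pass binary=True to CountVectorizer if you want strict presence/absence values.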
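
To get from keyword vectors to the categories asked about in the question, one possible extension is to build one CountVectorizer per vocabulary and assign each sentence to the category with the most keyword hits. This is only a minimal sketch of that idea, not a recommended final classifier: the vocabularies dictionary, its contents, and the classify helper are hypothetical names made up for illustration, and sentences with no hits are simply left unlabeled.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical category vocabularies; replace with your real controlled vocabularies.
vocabularies = {
    "medication use": ["aspirin", "ibuprofen", "medication"],
    "symptoms": ["fever", "cough", "headache"],
}


def classify(sentences, vocabularies):
    """Label each sentence with the category that has the most keyword hits.

    Returns None for a sentence that matches no keyword in any vocabulary.
    Ties go to the category listed first in the dictionary.
    """
    categories = list(vocabularies)
    # hits[i][j] = number of keyword occurrences from category j in sentence i
    hits = [[0] * len(categories) for _ in sentences]
    for j, category in enumerate(categories):
        vectorizer = CountVectorizer(vocabulary=vocabularies[category])
        counts = vectorizer.transform(sentences).toarray()
        for i, row in enumerate(counts):
            hits[i][j] = int(row.sum())

    labels = []
    for row in hits:
        best = max(range(len(categories)), key=lambda j: row[j])
        labels.append(categories[best] if row[best] > 0 else None)
    return labels


sentences = [
    "patients need to take aspirin",
    "patient reports fever and cough",
    "follow up in two weeks",
]
labels = classify(sentences, vocabularies)
for sentence, label in zip(sentences, labels):
    print(label, "<-", sentence)
```

Note that with the default settings CountVectorizer only matches single-word terms, so a multi-word vocabulary entry such as "chest pain" would never be counted unless ngram_range is widened, for example ngram_range=(1, 2). If labeled sentences are available, the same count vectors could instead be fed to a trained classifier rather than using the highest-count rule.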
