I have trained and tested a Naive Bayes classifier using separate training and test data sets. Now I want to predict the topic of a single text file.
Here is my code:
# Import the training and test data (raw strings keep the backslash in the Windows path literal)
import sklearn.datasets as skd
categories = ['business', 'entertainment', 'local', 'sports', 'world']
sinhala_train = skd.load_files(r'Cleant data\stemmed_filtered_sinhala-set1', categories=categories, encoding='utf-8')
sinhala_test = skd.load_files(r'Cleant data\stemmed_filtered_sinhala-set2', categories=categories, encoding='utf-8')
# Read the single file whose topic should be predicted
name_file = "adaderana_67571.txt"
with open(name_file, encoding='utf-8') as f:
    new_file = f.read()
from sklearn.feature_extraction.text import CountVectorizer
count_vectorization = CountVectorizer()
train_data_tf = count_vectorization.fit_transform(sinhala_train.data)
train_data_tf.shape
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_trans = TfidfTransformer()
train_data_tfidf = tfidf_trans.fit_transform(train_data_tf)
train_data_tfidf.shape
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(train_data_tfidf, sinhala_train.target)
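# Reuse the vectorizer and TF-IDF transformer fitted on the training data:
# call transform, not fit_transform, so the IDF weights are not re-learned from the test set
test_data_tf = count_vectorization.transform(sinhala_test.data)
test_data_tfidf = tfidf_trans.transform(test_data_tf)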
predicted = clf.predict(test_data_tfidf)
from sklearn import metrics
from sklearn.metrics import accuracy_score
print("Accuracy of the model:", accuracy_score(sinhala_test.target, predicted))
print(metrics.classification_report(sinhala_test.target, predicted, target_names=sinhala_test.target_names))
metrics.confusion_matrix(sinhala_test.target, predicted)
And this is my output,
Accuracy of the model: 0.864
               precision    recall  f1-score   support
     business       0.78      0.94      0.85       100
entertainment       0.95      0.86      0.90       100
        local       0.89      0.65      0.75       100
       sports       0.91      0.93      0.92       100
        world       0.83      0.94      0.88       100
    micro avg       0.86      0.86      0.86       500
    macro avg       0.87      0.86      0.86       500
 weighted avg       0.87      0.86      0.86       500
array([[94, 2, 4, 0, 0],
[ 2, 86, 2, 4, 6],
[19, 0, 65, 5, 11],
[ 1, 3, 1, 93, 2],
[ 5, 0, 1, 0, 94]], dtype=int64)
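As a sanity check on the output above, the accuracy and the per-class precision and recall can be recomputed directly from the confusion matrix. This is only a minimal sketch, assuming NumPy is available; the matrix is copied from the output, and the variable names are illustrative.

```python
import numpy as np

categories = ['business', 'entertainment', 'local', 'sports', 'world']

# Confusion matrix copied from the output above:
# rows are the true classes, columns are the predicted classes.
cm = np.array([[94, 2, 4, 0, 0],
               [2, 86, 2, 4, 6],
               [19, 0, 65, 5, 11],
               [1, 3, 1, 93, 2],
               [5, 0, 1, 0, 94]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column-wise)

print("accuracy:", round(accuracy, 3))
for name, p, r in zip(categories, precision, recall):
    print(f"{name:15s} precision={p:.2f} recall={r:.2f}")
```

The 'local' row shows that 19 of its 100 test documents were predicted as 'business', which accounts for most of its low recall (0.65).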
Now I want to predict the topic of the text file stored in new_file.
Can someone help me write the code to predict the topic for this text file?
I solved my problem. This is the code I used to predict the topic:
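def predict_topic(text):
    # text: the contents of one document as a string (for example, new_file read above)
    docs_new = [text]  # transform() expects an iterable of documents
    # Reuse the already fitted vectorizer and TF-IDF transformer (transform only, no fitting)
    X_new_counts = count_vectorization.transform(docs_new)
    X_new_tfidf = tfidf_trans.transform(X_new_counts)
    predicted_topic = clf.predict(X_new_tfidf)
    # Map the predicted integer label back to its category name
    return sinhala_train.target_names[predicted_topic[0]]

print(predict_topic(new_file))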
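One possible way to make the same steps harder to get wrong is to chain them in a scikit-learn Pipeline, so that the vectorizer and TF-IDF transformer are fitted once on the training data and only applied at prediction time. The following is a sketch, not the code from the post: the tiny English training set, `target_names`, and `new_text` are hypothetical stand-ins for `sinhala_train.data`, `sinhala_train.target_names`, and the contents of the new file.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for sinhala_train.data / sinhala_train.target
train_texts = [
    "stocks market profit bank",
    "bank shares profit trade",
    "match team goal player",
    "player team score win",
]
train_targets = [0, 0, 1, 1]
target_names = ["business", "sports"]

# The pipeline fits every step on the training data in one call
text_clf = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
text_clf.fit(train_texts, train_targets)

# predict() expects a list of documents, so a single text is wrapped in a list
new_text = "the team won the match"
label = text_clf.predict([new_text])[0]
topic = target_names[label]
print(topic)
```

With the real data, `text_clf.fit(sinhala_train.data, sinhala_train.target)` followed by `text_clf.predict([new_file])` would play the same role, under the assumption that `new_file` holds the file contents as a string.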