
How to evaluate NMF topic modeling using a confusion matrix?

I am doing topic modeling with an NMF model and want to evaluate its performance with a confusion matrix. If there are better ways to evaluate NMF, I am open to those as well. I looked for tutorials and other resources online but could not find anything that solves my problem. Below is the complete code I am using for NMF topic modeling.

import pandas as pd
import numpy as np

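dataset = pd.read_csv(r'Preprocess_Data.csv')
# Keep the first 20,000 rows (the original referenced an undefined name here)
dataset = dataset.head(20000)
# dropna() returns a new DataFrame, so the result must be assigned back
dataset = dataset.dropna()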

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics

tfidf_vect = TfidfVectorizer(max_df=0.8, min_df=2, stop_words='english')
doc_term_matrix = tfidf_vect.fit_transform(dataset['Text'].values.astype('U'))


from sklearn.decomposition import NMF

nmf = NMF(n_components=5, random_state=42)
nmf.fit(doc_term_matrix)

import random

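# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() (available from 1.0)
feature_names = tfidf_vect.get_feature_names_out()
for i in range(10):
    # random.randint includes both endpoints, so the upper bound is len - 1
    random_id = random.randint(0, len(feature_names) - 1)
    print(feature_names[random_id])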

first_topic = nmf.components_[0]
top_topic_words = first_topic.argsort()[-10:]


for i in top_topic_words:
    print(tfidf_vect.get_feature_names_out()[i])

for i, topic in enumerate(nmf.components_):
    print(f'Top 10 words for topic #{i}:')
    # use j for the word index so it does not shadow the topic index i
    print([tfidf_vect.get_feature_names_out()[j] for j in topic.argsort()[-10:]])
    print('\n')

Thanks in advance for any suggestions and advice.

If you have labels associated with the documents, you can train a classifier that uses the document-topic representations as features, then evaluate it on the document-topic representations of a held-out test set. A confusion matrix can then be computed from the classifier's predictions on that test set.
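As a rough sketch of that supervised route (not the only way to do it): the toy texts and labels below are made up for illustration, and LogisticRegression is just one possible classifier. The vectorizer and NMF are fitted on the training split only, and the test documents are projected with transform().

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical labelled corpus, for illustration only.
texts = [
    "the coffee tastes great and smells fresh",
    "this tea has a rich flavor and great taste",
    "fresh coffee beans with a strong flavor",
    "the chocolate tastes sweet and rich",
    "great taste and fresh flavor in this tea",
    "sweet chocolate with a strong coffee taste",
    "the phone battery lasts a long time",
    "this laptop screen is bright and the battery is good",
    "the phone screen cracked after a short time",
    "good laptop with a fast charger and long battery life",
    "the charger stopped working and the phone is slow",
    "bright screen and fast laptop with good battery",
]
labels = ["food"] * 6 + ["tech"] * 6

# Hold out 4 documents, keeping both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=4, stratify=labels, random_state=42
)

# Fit TF-IDF and NMF on the training documents only.
tfidf_vect = TfidfVectorizer(stop_words="english")
nmf = NMF(n_components=2, random_state=42, max_iter=500)
W_train = nmf.fit_transform(tfidf_vect.fit_transform(X_train))

# Project the test documents into the same topic space.
W_test = nmf.transform(tfidf_vect.transform(X_test))

# Train a classifier on the document-topic features and evaluate it.
clf = LogisticRegression().fit(W_train, y_train)
y_pred = clf.predict(W_test)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred, labels=["food", "tech"])
print(cm)
```

Note that this measures how useful the topics are as features for predicting your labels, not whether the topics themselves are "correct".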

Otherwise, you need to stick to unsupervised metrics, e.g. topic coherence, the best known of them, which measures how related the top-N words of each topic are.
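To make the idea concrete, here is a minimal sketch of one coherence variant, a UMass-style score based on document co-occurrence counts. The function name umass_coherence and the toy documents are assumptions for illustration; libraries such as OCTIS provide their own, more complete implementations.

```python
import math
from itertools import combinations


def umass_coherence(top_words, tokenized_docs):
    """UMass-style coherence for one topic.

    top_words must be ordered from most to least important.
    Higher (closer to zero) means the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Each pair is (higher-ranked word, lower-ranked word).
    for higher, lower in combinations(top_words, 2):
        d_higher = doc_freq(higher)
        if d_higher == 0:
            continue  # word never appears in the reference corpus
        score += math.log((doc_freq(higher, lower) + 1) / d_higher)
    return score


# Hypothetical tokenized corpus, for illustration only.
docs = [
    ["coffee", "tea", "cup"],
    ["coffee", "tea"],
    ["phone", "battery"],
    ["phone", "battery", "screen"],
]

coherent = umass_coherence(["coffee", "tea"], docs)
mixed = umass_coherence(["coffee", "phone"], docs)
print(coherent > mixed)
```

With the NMF code from the question, the top words of a topic in ranked order would be the feature names at topic.argsort()[-10:][::-1], and the per-topic scores are usually averaged over all topics.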

You can find all these measures and many others here: https://github.com/mind-Lab/octis#available-metrics


 