Calculate coherence for non-gensim topic model
I've built a topic model, with:
To find the optimal number of topics, I want to calculate the coherence for a model. However, I am only aware of Gensim's CoherenceModel, which seems to require a Gensim model as input.
Are there any other packages/implementations that I could use to calculate the coherence of a computed topic model? Or, if it is indeed possible to use the CoherenceModel without inputting an LdaModel, could someone show me how to do that?
Actually, you can do this with the Gensim package.
input_data = list of lists with tokenized texts (one list of tokens per document)
topics = list of lists with the top N words per topic (one list of words per topic)
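import gensim.corpora as corpora
from gensim.models.coherencemodel import CoherenceModel
# Map each token to an integer id and convert the texts to bag-of-words vectors
id2word = corpora.Dictionary(input_data)
corpus = [id2word.doc2bow(text) for text in input_data]
# Passing topics= instead of model= lets CoherenceModel score topics from any model
cm = CoherenceModel(topics=topics, texts=input_data, corpus=corpus, dictionary=id2word, coherence='c_v')
coherence = cm.get_coherence()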
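If your model only exposes a topic-word weight matrix, you first need to turn it into the topics list described above. The following is a minimal sketch of one way to do that, not part of Gensim: the helper name top_words_per_topic and the toy data are hypothetical, and it assumes you have one row of weights per topic plus a vocabulary in matching column order (for scikit-learn's LatentDirichletAllocation that would typically be components_ and the vectorizer's get_feature_names_out()).

```python
from typing import List, Sequence


def top_words_per_topic(
    topic_word_weights: Sequence[Sequence[float]],
    vocabulary: Sequence[str],
    top_n: int = 10,
) -> List[List[str]]:
    """Return the top_n highest-weighted words for each topic row.

    Hypothetical helper for illustration; not a Gensim function.
    """
    topics = []
    for weights in topic_word_weights:
        # Sort vocabulary indices by descending weight, then keep the first top_n.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:top_n]])
    return topics


# Hypothetical toy data: 2 topics over a 5-word vocabulary.
vocabulary = ["cat", "dog", "stock", "market", "pet"]
topic_word_weights = [
    [0.40, 0.30, 0.01, 0.02, 0.27],  # topic 0
    [0.02, 0.03, 0.50, 0.40, 0.05],  # topic 1
]

topics = top_words_per_topic(topic_word_weights, vocabulary, top_n=3)
print(topics)
```

The resulting topics list can be passed as the topics argument in the snippet above. The topic words should be tokenized the same way as input_data, since CoherenceModel looks them up in the dictionary built from those texts.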