Calculating coherence for a non-Gensim topic model

Question

I've built a topic model, with:

Input: a list of tokenized lists
Output: an m x t matrix (each cell is the probability of word i appearing in topic k)
Output: a k x n matrix (each cell is the probability of topic k in document j)
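Coherence measures are computed from each topic's highest-probability words, so the m x t word-topic matrix is usually reduced to a top-N word list per topic first. A minimal sketch, assuming the matrix is a plain nested list with one row per word and one column per topic, plus a hypothetical `vocab` list that maps each row index to its word (neither name comes from the question):

```python
from typing import List


def top_words_per_topic(word_topic: List[List[float]], vocab: List[str], n: int) -> List[List[str]]:
    """Return, for each topic (column), the n words with the highest probability.

    Assumes word_topic[i][k] is P(word i | topic k) and vocab[i] is word i.
    """
    num_topics = len(word_topic[0])
    topics = []
    for k in range(num_topics):
        # Rank word indices by their probability in topic k, highest first.
        ranked = sorted(range(len(vocab)), key=lambda i: word_topic[i][k], reverse=True)
        topics.append([vocab[i] for i in ranked[:n]])
    return topics


# Toy data (hypothetical): 4 words x 2 topics.
vocab = ["apple", "banana", "goal", "match"]
word_topic = [
    [0.40, 0.05],
    [0.35, 0.05],
    [0.15, 0.50],
    [0.10, 0.40],
]
print(top_words_per_topic(word_topic, vocab, 2))  # [['apple', 'banana'], ['goal', 'match']]
```

The resulting list of word lists is the shape that coherence tools expect as their topic input.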

To find the optimal number of topics, I want to calculate the coherence of a model. However, I am only aware of Gensim's CoherenceModel, which seems to require a Gensim model as input.

Are there any other packages or implementations I could use to calculate the coherence of an already computed topic model? Or, if it is indeed possible to use CoherenceModel without passing in an LDA model, could someone show me how to do that?

Answer 1

Actually, you can do this with the Gensim package.

input_data = list of lists of tokenized texts

topics = list containing the top N words of each topic

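import gensim.corpora as corpora
from gensim.models.coherencemodel import CoherenceModel

# Map every token to an integer id and build the bag-of-words corpus.
id2word = corpora.Dictionary(input_data)
corpus = [id2word.doc2bow(text) for text in input_data]

# 'c_v' estimates word co-occurrence from the raw texts with a sliding window;
# the corpus is used by document-based measures such as 'u_mass'.
cm = CoherenceModel(topics=topics, texts=input_data, corpus=corpus,
                    dictionary=id2word, coherence='c_v')
coherence = cm.get_coherence()  # single score aggregated over all topics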
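If you would rather not depend on Gensim at all, the UMass measure from Mimno et al. (2011) is simple to compute directly from the tokenized documents. A minimal pure-Python sketch, assuming `texts` has the same shape as `input_data` and each topic lists its words from most to least probable. It averages the pairwise scores instead of summing them, so the numbers are not guaranteed to match Gensim's `'u_mass'` output:

```python
import math
from typing import List


def umass_topic(topic: List[str], texts: List[List[str]]) -> float:
    """UMass coherence of one topic: mean of log((D(wi, wj) + 1) / D(wj))
    over all pairs where wj ranks above wi in the topic.
    D(...) counts the documents that contain all the given words."""
    doc_sets = [set(doc) for doc in texts]

    def doc_freq(*words: str) -> int:
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(topic)):
        for j in range(i):
            d_j = doc_freq(topic[j])
            if d_j == 0:
                continue  # word never occurs in the reference texts; skip the pair
            scores.append(math.log((doc_freq(topic[i], topic[j]) + 1) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


def umass_model(topics: List[List[str]], texts: List[List[str]]) -> float:
    """Average per-topic UMass coherence; higher means more coherent."""
    return sum(umass_topic(t, texts) for t in topics) / len(topics)


# Toy corpus (hypothetical).
texts = [
    ["apple", "banana", "fruit"],
    ["apple", "banana"],
    ["goal", "match", "team"],
    ["goal", "team"],
]
print(umass_model([["apple", "banana"], ["goal", "team"]], texts))  # log(1.5), about 0.405
print(umass_model([["apple", "goal"], ["banana", "team"]], texts))  # log(0.5), about -0.693
```

To choose the number of topics, you would fit the model for several candidate values, compute the score for each, and compare; the pairs of words that co-occur in the same documents push the score up.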

The answer above was accepted (2021-03-31 00:56:00).