"Gensim LDA 连贯性得分 Nan"

Question

I created a Gensim LDA model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100, chunksize=100, passes=10, per_word_topics=True)

Answer 1

Solved! CoherenceModel requires the original text rather than the training corpus fed to the LDA model, so when I ran this:

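import numpy as np
from gensim.models import CoherenceModel
# texts is the original text column, not the bag-of-words training corpus
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['corpus'].tolist(), dictionary=dictionary, coherence='c_v')
with np.errstate(invalid='ignore'):
    lda_score = coherence_model_lda.get_coherence()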

I got a coherence score of 0.462.

Hope this helps someone else making the same mistake. Thanks!
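To make the difference between the two inputs concrete, here is a minimal plain-Python sketch. It does not use Gensim: `token2id` and `to_bow` are hypothetical stand-ins for a Gensim dictionary and its bag-of-words conversion, and the ids assigned here are illustrative only. The point is just the shape of the data: the training corpus holds (id, count) pairs, while the texts hold the word strings themselves.

```python
from collections import Counter

# Tokenized texts: one list of word strings per document.
texts = [
    ["topic", "model", "coherence"],
    ["topic", "model", "topic"],
]

# Hypothetical stand-in for a dictionary: word -> integer id,
# assigned in order of first appearance (an assumption of this sketch).
token2id = {}
for doc in texts:
    for word in doc:
        token2id.setdefault(word, len(token2id))


def to_bow(doc):
    """Count word ids, mimicking the (id, count) pairs of a bag-of-words corpus."""
    counts = Counter(token2id[word] for word in doc)
    return sorted(counts.items())


bow_corpus = [to_bow(doc) for doc in texts]

print(texts[1])       # ['topic', 'model', 'topic']
print(bow_corpus[1])  # [(0, 2), (1, 1)]
```

In the accepted answer's terms, `bow_corpus` corresponds to what was used to train the LDA model, and `texts` corresponds to what is passed as `texts=` when scoring coherence.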

Answer 2

The documentation ( https://radimrehurek.com/gensim/models/coherencemodel.html ) says to provide "Tokenized texts" (list of list of str). These should be your texts split into individual words that are in the dictionary you provide to CoherenceModel. If you provide the full texts that are not tokenized, there are no entries in the lookup dictionary for the words.
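A quick sanity check along these lines can catch the problem before computing coherence. The sketch below is a hypothetical helper, not part of Gensim: `check_texts` and the plain-set `vocabulary` are names invented for illustration. It only verifies that the input is a list of lists of strings and that at least some tokens appear in the vocabulary.

```python
def check_texts(texts, vocabulary):
    """Return (ok, message) describing whether texts look usable for coherence."""
    if not isinstance(texts, list) or not texts:
        return False, "texts must be a non-empty list of documents"
    for i, doc in enumerate(texts):
        if isinstance(doc, str):
            return False, f"document {i} is a raw string, not a list of tokens"
        if not all(isinstance(tok, str) for tok in doc):
            return False, f"document {i} contains non-string items (bag-of-words pairs?)"
    total = sum(len(doc) for doc in texts)
    known = sum(tok in vocabulary for doc in texts for tok in doc)
    if known == 0:
        return False, "no token appears in the vocabulary"
    return True, f"{known}/{total} tokens found in the vocabulary"


# Illustrative vocabulary; a plain set of words is an assumption of this sketch.
vocabulary = {"topic", "model", "coherence"}

print(check_texts(["topic model coherence"], vocabulary))       # untokenized text
print(check_texts([[(0, 1), (1, 2)]], vocabulary))              # bag-of-words pairs
print(check_texts([["topic", "model", "score"]], vocabulary))   # tokenized text
```

With a real Gensim dictionary, the vocabulary could presumably be built as `set(dictionary.token2id)`, and the texts would be whatever is passed as `texts=` to CoherenceModel.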

Answer 3

It didn't work for me. I tried so many things, but coherence is still NaN. Can anyone help, please?

Answer 1: 8, accepted, 2020-02-16 08:45:14
Answer 2: 0, 2021-06-02 15:13:43
Answer 3: -2, 2022-01-09 18:16:16