Gensim LDA Coherence Score Nan

I created a Gensim LDA model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100, chunksize=100, passes=10, per_word_topics=True)

Solved! CoherenceModel requires the original text instead of the training corpus fed to the LDA model, so when I ran this:

import numpy as np
from gensim.models import CoherenceModel

coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['corpus'].tolist(), dictionary=dictionary, coherence='c_v')
with np.errstate(invalid='ignore'):
    lda_score = coherence_model_lda.get_coherence()

I got a coherence score of 0.462.

Hope this helps someone else making the same mistake. Thanks!
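To make the distinction concrete, here is a small sketch of the two data shapes involved. It uses only the standard library: `to_bow` and `token2id` are hypothetical stand-ins that imitate the shape of a bag-of-words corpus, not gensim's own functions. The training corpus holds (token id, count) pairs, while the `texts` argument expects the tokens themselves.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Toy stand-in for a bag-of-words conversion: sorted (token_id, count) pairs."""
    counts = Counter(token2id[tok] for tok in tokens if tok in token2id)
    return sorted(counts.items())


# Tokenized texts: the shape expected by the `texts` argument.
texts = [["topic", "model", "topic"], ["coherence", "score"]]

# Hypothetical token-to-id mapping, standing in for a real dictionary.
token2id = {"coherence": 0, "model": 1, "score": 2, "topic": 3}

# Bag-of-words corpus: the shape used to train the LDA model.
bow_corpus = [to_bow(doc, token2id) for doc in texts]

print(texts)
print(bow_corpus)
```

The word order and the words themselves are gone from `bow_corpus`, which is presumably why it cannot stand in for `texts` when computing a window-based measure such as c_v.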

The documentation (https://radimrehurek.com/gensim/models/coherencemodel.html) says to provide "Tokenized texts" (list of list of str). These should be your texts split into individual words that are in the dictionary you provide to CoherenceModel. If you provide full texts that are not tokenized, the lookup dictionary has no entries for the words.
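One way to check this before building the CoherenceModel is to inspect the shape of `texts` and count how many tokens appear in the dictionary vocabulary. The following is only a minimal sketch using the standard library: `summarize_texts` is a hypothetical helper, and `vocab` is a stand-in for the set of tokens in your gensim dictionary (for example `set(dictionary.token2id)`).

```python
def summarize_texts(texts, vocab):
    """Report whether `texts` is a list of token lists and how many tokens are in `vocab`."""
    # Each document must be a list (or tuple) of strings, not one long string.
    tokenized = all(
        isinstance(doc, (list, tuple)) and all(isinstance(tok, str) for tok in doc)
        for doc in texts
    )
    if not tokenized:
        return {"tokenized": False, "n_tokens": 0, "in_vocab": 0}
    tokens = [tok for doc in texts for tok in doc]
    in_vocab = sum(1 for tok in tokens if tok in vocab)
    return {"tokenized": True, "n_tokens": len(tokens), "in_vocab": in_vocab}


# Stand-in for set(dictionary.token2id); an assumption for illustration.
vocab = {"topic", "model", "coherence"}

raw = ["topic model coherence", "coherence score"]  # untokenized strings
split = [doc.split() for doc in raw]                # list of list of str

raw_report = summarize_texts(raw, vocab)
split_report = summarize_texts(split, vocab)
print(raw_report)
print(split_report)
```

If `tokenized` is False, or `in_vocab` is zero or very low, the texts probably do not match the dictionary, which is a likely cause of a nan score.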

It didn't work for me. I tried so many things, but the coherence is still nan. Can anyone help, please?
