Gensim LDA Coherence Score NaN

Question

I created a Gensim LDA model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100, chunksize=100, passes=10, per_word_topics=True)

Answer 1

Solved! CoherenceModel requires the original texts rather than the bag-of-words training corpus fed to the LDA model. When I ran this:

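import numpy as np
from gensim.models import CoherenceModel
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['corpus'].tolist(), dictionary=dictionary, coherence='c_v')
# Suppress NumPy invalid-value warnings while the score is computed
with np.errstate(invalid='ignore'):
    lda_score = coherence_model_lda.get_coherence()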

I got a coherence score of: 0.462

Hope this helps someone else making the same mistake. Thanks!
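To make the distinction concrete, here is a minimal sketch in plain Python of the two data shapes involved. The names `tokenized_texts`, `token2id`, and `to_bow` are hypothetical stand-ins, not Gensim API; `to_bow` only imitates the (token_id, count) shape of a bag-of-words document. The word strings survive only in the tokenized texts, which is the kind of input the `texts` argument above was given.

```python
from collections import Counter

# Hypothetical toy data: each document is a list of word strings.
tokenized_texts = [
    ["topic", "model", "topic"],
    ["coherence", "score", "model"],
]

# Hypothetical word -> integer id mapping, built in first-seen order.
token2id = {}
for doc in tokenized_texts:
    for token in doc:
        token2id.setdefault(token, len(token2id))


def to_bow(doc, token2id):
    """Imitate the bag-of-words shape: sorted (token_id, count) pairs."""
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())


bow_corpus = [to_bow(doc, token2id) for doc in tokenized_texts]

print(tokenized_texts[0])  # word strings, the shape passed as `texts`
print(bow_corpus[0])       # integer pairs, the shape used for LDA training
```

The bag-of-words form keeps only ids and counts, so anything that needs the words themselves has to be given the tokenized documents.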

Answer 2

The documentation (https://radimrehurek.com/gensim/models/coherencemodel.html) says to provide "Tokenized texts" (list of list of str). These should be your texts split into individual words, and those words should be in the dictionary you pass to CoherenceModel. If you pass full texts that are not tokenized, the lookup dictionary has no entries for them.
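One way to check this before computing coherence is a small sanity check on the input. The sketch below is plain Python under stated assumptions: `check_texts` is a hypothetical helper, not part of Gensim, and `vocabulary` stands for the set of words in your dictionary (with a Gensim `Dictionary`, its `token2id` mapping could play that role).

```python
def check_texts(texts, vocabulary):
    """Return the fraction of tokens in `texts` found in `vocabulary`.

    Raises TypeError if a document is not a list/tuple of str tokens.
    """
    total = 0
    known = 0
    for doc in texts:
        # A raw string document means the text was never tokenized.
        if not isinstance(doc, (list, tuple)):
            raise TypeError("each document must be a list of str tokens")
        for token in doc:
            if not isinstance(token, str):
                raise TypeError("each token must be a str")
            total += 1
            known += token in vocabulary
    return known / total if total else 0.0


vocabulary = {"topic", "model", "coherence"}  # hypothetical dictionary words

tokenized = [["topic", "model"], ["coherence", "score"]]
print(check_texts(tokenized, vocabulary))  # 3 of the 4 tokens are known

untokenized = ["topic model", "coherence score"]
try:
    check_texts(untokenized, vocabulary)
except TypeError as err:
    print("bad input:", err)
```

A TypeError, or a fraction near zero, would suggest that the texts do not match the dictionary and are worth inspecting before blaming the model.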

Answer 3

It didn't work for me. I tried many things, but the coherence is still NaN. Can anyone help, please?

Answer 1: score 8, accepted, 2020-02-16 08:45:14
Answer 2: score 0, 2021-06-02 15:13:43
Answer 3: score -2, 2022-01-09 18:16:16
