Gensim LDA Coherence Score NaN

I created a Gensim LDA model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

import gensim
lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100, chunksize=100, passes=10, per_word_topics=True)

Solved! CoherenceModel requires the original texts rather than the training corpus (the bag-of-words corpus) fed to the LDA model. So I ran this:

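import numpy as np
from gensim.models import CoherenceModel
# texts must be tokenized documents (list of lists of str), not the BoW corpus used to train the LDA model
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['corpus'].tolist(), dictionary=dictionary, coherence='c_v')
with np.errstate(invalid='ignore'):  # only silences NumPy "invalid value" warnings
    lda_score = coherence_model_lda.get_coherence()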

I got a coherence score of 0.462.

Hope this helps someone else making the same mistake. Thanks!
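The mistake above comes down to what is passed as `texts`. As a rough pre-flight check, the following pure-Python sketch inspects the first document and guesses whether it is a bag-of-words list of `(token_id, count)` pairs, a raw string, or a list of string tokens. The helper names `is_bow_document`, `is_tokenized_document` and `check_texts` are hypothetical (not part of gensim), and the sketch assumes `texts` is an ordinary Python list, e.g. the result of `.tolist()` on a DataFrame column.

```python
from typing import Any, Sequence


def is_bow_document(doc: Sequence[Any]) -> bool:
    """True if doc looks like a bag-of-words document: (token_id, count) int pairs."""
    return len(doc) > 0 and all(
        isinstance(pair, tuple)
        and len(pair) == 2
        and all(isinstance(x, int) for x in pair)
        for pair in doc
    )


def is_tokenized_document(doc: Sequence[Any]) -> bool:
    """True if doc is a sequence of string tokens (a plain string does not count)."""
    return not isinstance(doc, str) and all(isinstance(tok, str) for tok in doc)


def check_texts(texts: Sequence[Any]) -> str:
    """Classify the first document of a would-be `texts` argument."""
    if len(texts) == 0:
        return "empty texts"
    first = texts[0]
    if isinstance(first, str):
        return "untokenized strings: split each document into words first"
    if is_bow_document(first):
        return "bag-of-words corpus: pass the tokenized texts instead"
    if is_tokenized_document(first):
        return "ok: list of lists of str"
    return "unrecognized format"


# Toy inputs, made up for illustration.
bow_texts = [[(0, 1), (1, 2)], [(1, 1)]]
tokenized_texts = [["topic", "model"], ["model"]]

print(check_texts(bow_texts))        # bag-of-words corpus: pass the tokenized texts instead
print(check_texts(tokenized_texts))  # ok: list of lists of str
```

The check only looks at the first document, which keeps it cheap on large corpora but means a mixed-format list would slip through.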

The documentation (https://radimrehurek.com/gensim/models/coherencemodel.html) says to provide "tokenized texts" (list of list of str): your texts split into individual words, which should be the words in the dictionary you pass to CoherenceModel. If you pass full texts that are not tokenized, the lookup dictionary has no entries for them.
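To see why untokenized text finds nothing in the dictionary, here is a minimal sketch that measures how many tokens have an entry in a token-to-id mapping. It assumes the vocabulary is available as a plain dict (gensim's `Dictionary` exposes one as `token2id`); the three-word vocabulary and the `dictionary_coverage` helper are made up for illustration. Because iterating over a Python string yields single characters, a raw document is looked up character by character.

```python
from typing import Dict, Sequence


def dictionary_coverage(texts: Sequence, token2id: Dict[str, int]) -> float:
    """Fraction of tokens in `texts` that have an entry in `token2id`."""
    total = 0
    found = 0
    for doc in texts:
        # A str doc iterates over characters; a list doc iterates over words.
        for token in doc:
            total += 1
            if token in token2id:
                found += 1
    return found / total if total else 0.0


# Hypothetical vocabulary standing in for gensim's Dictionary.token2id.
token2id = {"topic": 0, "model": 1, "coherence": 2}

raw_texts = ["topic model coherence"]                # full, untokenized text
tokenized_texts = [["topic", "model", "coherence"]]  # list of list of str

print(dictionary_coverage(raw_texts, token2id))        # 0.0
print(dictionary_coverage(tokenized_texts, token2id))  # 1.0
```

A coverage near 0.0 means the coherence measure has almost nothing to count, which is one plausible route to a nan score, so it may be worth checking before digging into the model itself.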

It didn't work for me. I've tried many things, but the coherence is still nan. Can anyone help, please?

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM