
How does LDA (Latent Dirichlet Allocation) inference from `gensim` work for new data?

I am training an LDA model with gensim and predicting on a test corpus with ldamodel[doc_term_matrix_test]. It works fine, but I don't understand how the prediction is actually done with the trained model (that is, what ldamodel[doc_term_matrix_test] is doing).

Here is the code:

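import gensim
from gensim import corpora

# train and test are lists of tokenized documents (lists of strings)
dictionary = corpora.Dictionary(train)
dictionary2 = corpora.Dictionary(test)
# Add the test vocabulary to the training dictionary
dictionary.merge_with(dictionary2)

# Convert each document to a bag-of-words vector
doc_term_matrix = [dictionary.doc2bow(doc) for doc in train]
doc_term_matrix2 = [dictionary.doc2bow(doc) for doc in test]

Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=2, id2word=dictionary,
               random_state=100, iterations=50, passes=1)

# ldamodel[corpus] yields one list of (topic_id, probability) pairs per
# test document; x[1] is the second pair in that list
topics = sorted(ldamodel[doc_term_matrix2],
                key=lambda x: x[1],
                reverse=True)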

To quote from the gensim docs on ldamodel:

This module allows both LDA model estimation from a training corpus and inference of topic distribution on new, unseen documents.

So what your code does is not quite "prediction" but rather inference. That is, for every test document T, your trained LDA model yields an estimate of the topic distribution of T.
