Latent Dirichlet Allocation using Gensim on more than one corpus

Question

I have two questions related to the usage of gensim for LDA.

1) How can I create a model from one corpus, save it, and later extend it by training on another corpus? Is that possible?

2) Can LDA be used to classify an unseen document, or does the model need to be rebuilt with that document included in the corpus? Is there an online way to do this and see the changes on the fly?

I have a fairly basic understanding of LDA and have used it for topic modeling on a simple corpus with the lda and gensim libraries. Please point out any conceptual inconsistencies in the question. Thanks!

Answer 1

I found this helpful. Gensim does allow an existing LDA model to be updated with an additional corpus. The ldamodel module supports both LDA model estimation from a training corpus and inference of the topic distribution on new, unseen documents. This is described here:

https://radimrehurek.com/gensim/models/ldamodel.html

Additionally, the algorithm is streamed, so it can process corpora larger than RAM. A multicore implementation (LdaMulticore) is also available to speed up training.

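from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=10)  # train on the first corpus

lda.update(other_corpus)  # continue training on a second corpus

This is how the model can be updated. Both corpora must be encoded with the same word-to-id mapping (dictionary).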
