
Latent Dirichlet Allocation using Gensim on more than one corpus

I have two questions related to the usage of gensim for LDA.

1) How can I create a model from one corpus, save it, and later extend it by training it on another corpus? Is this possible?

2) Can LDA be used to classify an unseen document, or does the model need to be rebuilt with that document included in the corpus? Is there an online way to do this and see the changes on the fly?

I have a fairly basic understanding of LDA and have used it for topic modeling on simple corpora with the lda and gensim libraries. Please point out any conceptual inconsistencies in the question. Thanks!

I found the following helpful. Gensim does allow an existing LDA model to be updated with an additional corpus. The gensim.models.ldamodel module supports both estimating an LDA model from a training corpus and inferring the topic distribution of new, unseen documents. This is described in the documentation:

https://radimrehurek.com/gensim/models/ldamodel.html

Additionally, the algorithm is streamed, so it can process corpora larger than the available RAM. There is also a multicore implementation, LdaMulticore, to speed up training.

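from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=10)  # train on the first corpus
lda.update(other_corpus)  # continue training on additional documents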

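This is how an existing model can be trained further on new documents. Note that other_corpus has to be built with the same dictionary (word-to-id mapping) as the original corpus, because the model's vocabulary is fixed when it is created.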

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM