
Latent Dirichlet Allocation Implementation with Gensim

I am working on a project about LDA topic modelling, and I used gensim (Python) for it. Some references I read say that to get the best topic model there are two parameters we need to determine: the number of passes and the number of topics. Is that true? For the number of passes, we look for the point at which the results become stable; for the number of topics, we look for the number of topics that gives the lowest value.

num_topics = 10
chunksize = 2000
passes = 20
iterations = 400
eval_every = None 

Also, is it necessary to set all of the parameters in the gensim library?

A good LDA model depends mostly on the number of topics. More passes generally give a more accurate topic model, at the cost of longer training time, until the model converges and extra passes add little.
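One way to see where extra passes stop helping is to train the same model with an increasing number of passes and watch a score level off. The following is only a sketch: it assumes `texts` is a list of token lists, and the helper names `log_perplexity_by_passes` and `first_stable`, the grid of pass counts, and the tolerance are illustrative choices, not part of gensim.

```python
def log_perplexity_by_passes(texts, num_topics, passes_grid, random_state=0):
    """Train one LDA model per pass count and return {passes: per-word bound}."""
    # Imported here so that first_stable() below can be used without gensim.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]
    scores = {}
    for passes in passes_grid:
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
                       passes=passes, random_state=random_state)
        # Scored on the training corpus for brevity; a held-out set is preferable.
        # The value is a per-word likelihood bound: closer to 0 is better.
        scores[passes] = lda.log_perplexity(corpus)
    return scores


def first_stable(scores, tol=0.01):
    """Return the first pass count whose score differs from the previous one by less than tol."""
    ordered = sorted(scores.items())
    for (_, prev), (passes, cur) in zip(ordered, ordered[1:]):
        if abs(cur - prev) < tol:
            return passes
    return None  # the score never stabilized on this grid
```

For example, `first_stable(log_perplexity_by_passes(texts, 10, [1, 5, 10, 20, 40]))` would suggest a pass count for that grid; a suitable tolerance depends on the corpus.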

It is not necessary to set all the parameters. Most of the time you will just pass the required arguments and leave the rest at their defaults. To find the optimal number of topics, compute the c_v coherence for each candidate number of topics on a grid and pick the one with the highest coherence. Coherence is generally a better metric than perplexity because it agrees more closely with how human annotators judge topic quality.
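The grid search over the number of topics could look roughly like this. It is a minimal sketch, assuming `texts` is a list of token lists; the function names `cv_coherence_by_num_topics` and `best_num_topics`, the default `passes=10`, and the fixed `random_state` are my own choices for illustration.

```python
def cv_coherence_by_num_topics(texts, topics_grid, passes=10, random_state=0):
    """Train one LDA model per topic count and return {num_topics: c_v coherence}."""
    # Imported here so that best_num_topics() below can be used without gensim.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]
    scores = {}
    for k in topics_grid:
        # Only the essentials are passed; everything else keeps gensim's defaults.
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       passes=passes, random_state=random_state)
        # c_v coherence is computed from the tokenized texts, not the bag-of-words corpus.
        cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                            coherence="c_v")
        scores[k] = cm.get_coherence()
    return scores


def best_num_topics(scores):
    """Return the topic count with the highest coherence."""
    return max(scores, key=scores.get)
```

Calling `best_num_topics(cv_coherence_by_num_topics(texts, range(5, 31, 5)))` would then return the best value on that grid. Training one model per grid point can be slow on a large corpus, so a coarse grid first and a finer one around the best value is a common compromise.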


 