Latent Dirichlet Allocation Implementation with Gensim
I am doing a project on LDA topic modelling, and I used gensim (Python) to do it. I read some references, and they said that to get the best topic model there are two parameters we need to determine: the number of passes and the number of topics. Is that true? For the number of passes, we look for the point at which the passes become stable; for the number of topics, we look for the number of topics with the lowest value.
num_topics = 10
chunksize = 2000
passes = 20
iterations = 400
eval_every = None
And is it necessary to use all the parameters in the gensim library?
Good LDA models mostly depend on the number of topics. The more passes, the more accurate the topic model will be (and the longer it will take to train).
Of course it is not necessary to use all the parameters. Most of the time you will just pass the required arguments. To find the optimal number of topics, you can compute the c_v coherence values and pick the number of topics with the highest coherence over a given grid.
Generally, coherence is a better metric than perplexity, as it agrees more closely with human annotators.