
Latent Dirichlet Allocation Implementation with Gensim

I am doing a project on LDA topic modelling, and I am using gensim (Python) for it. Some references I read say that, to get the best topic model, there are two parameters we need to determine: the number of passes and the number of topics. Is that true? For the number of passes, we look for the point at which the results become stable; for the number of topics, we look for the number of topics that gives the lowest value.

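num_topics = 10     # number of latent topics to extract
chunksize = 2000    # number of documents processed per training chunk
passes = 20         # number of full passes over the corpus
iterations = 400    # maximum inference iterations per document
eval_every = None   # skip perplexity evaluation during training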

Also, is it necessary to use all the parameters in the gensim library?

Good LDA models mostly depend on the number of topics. The more passes, the more accurate the topic model will be (and also the longer it will take to train).

Of course, it is not necessary to use all the parameters. Most of the time you will just pass the required arguments. To find the optimal number of topics, you can compute the c_v coherence value for each candidate number of topics on a given grid and pick the one with the highest coherence. Coherence is generally a better metric than perplexity, as it is more in line with human annotators.

