Calculating optimal number of topics for topic modeling (LDA)
I am going to do topic modeling with LDA. I ran my commands to find the optimal number of topics. The output was as follows:

It is a bit different from any other plot that I have seen. Do you think it is okay, or would it be better to use an algorithm other than LDA? It is worth mentioning that when I run my commands to visualize the topic keywords for 10 topics, the plot shows 2 main topics, and the others overlap heavily. Is there any valid range for coherence?

Many thanks for sharing your comments, as I am a beginner in topic modeling.
Shameless self-promotion: I suggest you use the OCTIS library: https://github.com/mind-Lab/octis. It allows you to run different topic models and optimize their hyperparameters (including the number of topics) in order to select the best result.
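On the question of a valid range for coherence: it depends on which measure was plotted. NPMI-based coherence, for example, is bounded in [-1, 1], while other measures use different scales, so it is worth checking which one your plot reports. To make the idea concrete, here is a minimal pure-Python sketch of NPMI coherence computed from document co-occurrence, used to compare candidate numbers of topics. It is not what gensim or OCTIS run internally; the function names, the toy `docs`, and the `candidates` dictionary are all hypothetical and only there for illustration. In practice the top words per topic would come from the models you trained for each candidate number of topics.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic (document co-occurrence)."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lower bound by convention
        elif p12 == 1:
            scores.append(1.0)   # words co-occur in every document: upper bound
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    if not scores:
        raise ValueError("a topic needs at least two words")
    return sum(scores) / len(scores)


def model_coherence(topics, documents):
    """Mean NPMI coherence across all topics of one model."""
    return sum(npmi_coherence(t, documents) for t in topics) / len(topics)


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "market", "trade"],
    ["stock", "market", "price"],
]

# Hypothetical top words per topic for two candidate numbers of topics.
candidates = {
    2: [["cat", "dog"], ["stock", "market"]],
    3: [["cat", "dog"], ["stock", "market"], ["cat", "stock"]],
}

# Keep the number of topics whose model has the highest mean coherence.
best_k = max(candidates, key=lambda k: model_coherence(candidates[k], docs))
```

With this toy data the third topic mixes words that never appear together, which drags the 3-topic score down. Heavily overlapping or mixed topics, like the ones you describe in your 10-topic plot, tend to show up the same way.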
There might be many reasons why you get those results. But here are some hints and observations:
References: https://www.aclweb.org/anthology/2021.eacl-demos.31/