
Number of Latent Semantic Indexing topics

Original post: 2014-07-18 03:47:19 | Tags: topic-modeling, gensim, latent-semantic-indexing

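I am using the gensim package to run LSI on a corpus. My goal is to find the distinct topics that occur most frequently in the corpus.

If I don't know the number of topics in the corpus (I estimate somewhere between 5 and 20), what is the best way to set the number of topics LSI should look for? Is it better to look for a large number of topics (20-30) or a small number (~5)?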

2 answers

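From Radim himself:

It's a good question, but unfortunately one without a good answer.

Increasing the number of dimensions always improves retrieval accuracy. In fact, if you use all dimensions (= the full rank of the training matrix), LSI gives you back exactly the documents you put in, so LSI becomes pointless.

If you're interested in the mathematical side of it, have a look at this issue: https://github.com/piskvorky/gensim/issues/28. Otherwise, simply setting the dimensionality to somewhere between a few hundred and a few thousand is the accepted standard. Or try several different choices, measure the accuracy, and pick the dimensionality that works best for your problem.

Best, Radim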
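The full-rank remark can be checked numerically. The sketch below is not from the thread: it uses plain NumPy rather than gensim, a made-up random term-document matrix, and a hypothetical helper named `reconstruction_error`. It measures reconstruction error only, not retrieval accuracy, which would need labeled relevance data.

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents); the values are made up.
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(8, 6)).astype(float)

# Reduced SVD; LSI with k topics keeps only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def reconstruction_error(k):
    """Frobenius norm of A minus its rank-k approximation (hypothetical helper)."""
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return np.linalg.norm(A - A_k)

errors = [reconstruction_error(k) for k in range(1, len(s) + 1)]
for k, err in enumerate(errors, start=1):
    print(f"k={k}: reconstruction error = {err:.4f}")
```

The error shrinks as k grows and is zero up to floating-point precision once every dimension is kept. At that point the model simply returns its input, so the useful range for k lies well below full rank.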

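This is what I sometimes do when I'm unsure. Since you have already narrowed the number of topics down to 5-20, you can iterate over some of the values in that range and see which one fits best.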

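# Assumes `lda` is an already-trained gensim model; rerun with each candidate N_TOPICS (e.g. 5 to 20).
N_TOPICS = 10
for topic_id, topic in lda.show_topics(num_topics=N_TOPICS, num_words=20, log=False, formatted=True):
    print("TOPIC {0}: {1}\n".format(topic_id, topic))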

