
Latent Dirichlet Allocation (LDA) topic generation

Recently I have been following https://github.com/noahweber1/datacamp-project-The-Hottest-Topics-in-Machine-Learning/blob/master/notebook.ipynb to understand more about LDA. Basically, it uses LDA to find the hottest topics in machine learning from papers.csv (NIPS papers).

What confuses me is the final output: the topics found via LDA.

[Image: LDA output]

  • For example, which document/row from papers.csv does Topic #0 correspond to?
  • Are all the words within a topic interconnected?
  • Are the words that appear across all of the topics the hottest topic, or is only Topic #0 the hottest topic?
  • The generated topic is not a sentence, right?

I have found the answers.

  1. For example, which document/row from papers.csv does Topic #0 correspond to?

Topics are just "categories". LDA does not name them for you; you need to interpret and label them yourself, and each document gets a mixture of these topics rather than a single assignment.
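To make this concrete, here is a minimal sketch (not the notebook's exact code; the tiny corpus and the choice of two topics are made up for illustration) using scikit-learn's `LatentDirichletAllocation`. It shows that LDA gives every document a distribution over topics, so a row of papers.csv "belongs to" Topic #0 only in the sense that Topic #0 has the highest weight for that row:

```python
# Illustrative sketch: per-document topic distributions from LDA.
# The corpus below is invented; in the notebook it would be papers.csv text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural network training deep learning gradient",
    "bayesian inference posterior prior probability",
    "neural deep network layers gradient descent",
    "markov chain monte carlo sampling posterior",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # shape: (n_docs, n_topics)

# Each row sums to ~1: it is that document's mixture over the topics.
for i, dist in enumerate(doc_topic):
    print(f"doc {i}: topic weights {dist.round(2)}, dominant topic {dist.argmax()}")
```

So the question "which document is Topic #0 for?" is inverted: you ask, for each document, how much of it each topic explains.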

  2. Are all the words within a topic interconnected?

Yes, they are related; that is how they are generated. LDA groups words that tend to co-occur in the same documents into one topic.

  3. Are the words that appear across all of the topics the hottest topic, or is only Topic #0 the hottest topic?

LDA will not tell you directly which is the hottest topic, but in this case Topic #0 is generally the answer, as it is related to all of the documents.
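One simple way to estimate "hotness" yourself (a hedged sketch, not something LDA does on its own; the corpus and topic count below are illustrative assumptions) is to average each topic's weight over all documents. The topic with the highest average prevalence is the most widespread one:

```python
# Illustrative sketch: rank topics by average prevalence across documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural network training deep learning gradient",
    "bayesian inference posterior prior probability",
    "neural deep network layers gradient descent",
    "markov chain monte carlo sampling posterior",
]

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # rows are per-document topic mixtures

# Mean topic weight over all documents: a crude "prevalence" score.
prevalence = doc_topic.mean(axis=0)
hottest = int(prevalence.argmax())
print("prevalence per topic:", prevalence.round(3))
print("most prevalent topic:", hottest)
```

Note that topic indices carry no inherent ranking; Topic #0 is not automatically the hottest unless a measurement like this says so.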

  4. The generated topic is not a sentence, right?

Right, it is not a sentence; the model generates a list of words. Each topic is a weighting over the whole vocabulary, and what gets printed is just the highest-weight words, in no grammatical order.

More on the underlying concepts can be found here.

