

What is the impact of word frequency on Gensim LDA topic modelling?

Original post 2020-03-14 05:56:44 | Tags: python-3.x, gensim, lda, topic-modeling, word-frequency

I am trying to use Gensim LDA to build a topic model of a dataset of food recipes. I want the topics to be based on the key ingredients in each recipe, but the recipe text contains more generic English words than ingredient names, so my topics are not as good as expected. I am trying to understand the impact of word frequency on the LDA topic outcome. Thanks.
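One way to see the effect described in the question is to measure document frequency: in how many recipes each word appears. LDA has no notion of which words are "important"; words that occur in most documents tend to receive high probability in many topics, which is one reason generic words can crowd out ingredient names. This is a minimal diagnostic sketch in plain Python (no gensim calls); the toy recipes and the `document_frequency` helper are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical toy recipes; replace with your own recipe texts.
recipes = [
    "add the chopped onion and garlic to the pan and cook for five minutes",
    "add the rice to the pan and cook until the water is absorbed",
    "mix the flour sugar and butter then bake for twenty minutes",
    "add the tomato and basil to the pan and simmer",
]


def document_frequency(docs):
    """Return a Counter mapping each word to the number of documents containing it."""
    df = Counter()
    for doc in docs:
        # set() so a word repeated within one recipe is counted only once
        df.update(set(doc.lower().split()))
    return df


df = document_frequency(recipes)
n_docs = len(recipes)

# Print the most widespread words with the share of recipes they appear in.
for word, count in df.most_common(5):
    print(f"{word:<10} {count / n_docs:.0%}")
```

On this toy data, filler words such as `the`, `and`, `add` and `pan` appear in most recipes, while each ingredient name (`onion`, `rice`, `flour`, `tomato`, ...) appears in only one.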

1 Answer

Have you tried removing stop words from the data on which you build the LDA model?
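A minimal sketch of the stop-word suggestion, written in plain Python so it runs without gensim. It removes a hand-picked stop-word list (hypothetical, for illustration) and then also drops words that occur in more than a given fraction of documents, which targets the frequent generic words the question describes. The `preprocess` helper is illustrative; its `no_above` parameter name is borrowed from gensim's `Dictionary.filter_extremes`, which performs a similar document-frequency cut.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; in practice you could start from
# gensim.parsing.preprocessing.STOPWORDS and add domain words such as cooking verbs.
STOP_WORDS = {"the", "and", "to", "a", "for", "then", "until", "is", "of"}


def preprocess(docs, stop_words, no_above=0.5):
    """Tokenize, drop stop words, then drop words found in more than
    `no_above` (a fraction) of the documents."""
    tokenized = [
        [w for w in doc.lower().split() if w not in stop_words] for doc in docs
    ]
    # Document frequency: number of documents in which each token appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    max_docs = no_above * len(tokenized)
    return [[w for w in tokens if df[w] <= max_docs] for tokens in tokenized]


# Hypothetical toy recipes.
recipes = [
    "add the chopped onion and garlic to the pan and cook for five minutes",
    "add the rice to the pan and cook until the water is absorbed",
    "mix the flour sugar and butter then bake for twenty minutes",
    "add the tomato and basil to the pan and simmer",
]

texts = preprocess(recipes, STOP_WORDS, no_above=0.5)
for tokens in texts:
    print(tokens)
```

With these toy recipes, `add` and `pan` are removed by the frequency cut, while `cook` and `minutes` survive because they occur in only half the recipes, so the stop-word list and threshold usually need tuning for the domain. The resulting token lists can be passed to `gensim.corpora.Dictionary` and `doc2bow` as usual; within gensim itself, `dictionary.filter_extremes(no_below=..., no_above=...)` applies the frequency filter directly.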

Also, please bear in mind that it is not really possible to influence how LDA assigns words to topics. This has been discussed in the answer to this question: how to improve word assignement in different topics in lda

Related questions:

Gensim LDA model topic diff resulting in nan

Common words that cause topic overlap in gensim LDA

Classify Text with Gensim LDA Model

visualization for output of topic modelling

Python LDA gensim "DeprecationWarning: invalid escape sequence"

How will the document number affect the result of Gensim LDA?

Python topic modelling error in mallet

How to get document_topics distribution of all of the document in gensim LDA?

CSV Input in gensim LDA via corpora.csvcorpus

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?


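Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.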



粤ICP备18138465号 © 2020-2024 STACKOOM.COM