
What is the impact of word frequency on Gensim LDA Topic modelling

I am trying to use Gensim LDA to build a topic model of a dataset of food recipes. I would like the topics to be based on the key ingredients in each recipe, but the recipe text contains more generic English words than ingredient names, so the resulting topics are not as good as I expected. I am trying to understand how word frequency affects the LDA topic outcome. Thanks.
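LDA is fitted on per-document word counts, so words that are both frequent and present in almost every document tend to receive high probability in many topics and crowd out the rarer, more distinctive words. A quick way to see this in a recipe corpus is to compare each word's total count with the number of recipes it appears in. The sketch below does that in plain Python; the three recipes and the `frequency_report` helper are hypothetical, used only for illustration.

```python
from collections import Counter

# Hypothetical toy recipes, used only to illustrate the counting.
recipes = [
    "add the chopped onion to the pan and stir until soft",
    "add the garlic and tomato to the pan and stir well",
    "whisk the egg with the flour and add the milk",
]


def frequency_report(docs):
    """Return (term_frequency, document_frequency) Counters for whitespace-tokenised docs."""
    tf = Counter()  # total occurrences of each word across the corpus
    df = Counter()  # number of documents each word appears in
    for doc in docs:
        tokens = doc.lower().split()
        tf.update(tokens)
        df.update(set(tokens))  # count each word at most once per document
    return tf, df


tf, df = frequency_report(recipes)
for word, count in tf.most_common(5):
    # Show how widely each of the most frequent words is spread across recipes.
    print(f"{word:8s} count={count} in {df[word]}/{len(recipes)} recipes")
```

In this toy data, generic words such as "the", "and" and "add" occur in every recipe, while each ingredient appears in only one, which is the imbalance described in the question.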

Have you tried removing stop words from the data on which you build the LDA model?
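A minimal sketch of that preprocessing, written in plain Python so it runs without Gensim. The recipes, the stop list and the `clean_corpus` helper are hypothetical. The document-frequency step imitates the idea behind Gensim's `Dictionary.filter_extremes(no_below=..., no_above=...)` (keep a token only if it appears in at least `no_below` documents and in at most a `no_above` fraction of them), but it is not Gensim's own code.

```python
from collections import Counter

# Hypothetical toy recipes, for illustration only.
recipes = [
    "add the chopped onion to the pan and stir until soft",
    "add the garlic and tomato to the pan and stir well",
    "whisk the egg with the flour and add the milk",
]

# Hand-picked stop list (an assumption for this sketch). In practice you would
# start from a fuller list such as gensim.parsing.preprocessing.STOPWORDS.
STOP_WORDS = {"the", "and", "to", "with", "until", "a", "of", "in"}


def clean_corpus(docs, stop_words, no_below=1, no_above=0.5):
    """Drop stop words, then drop tokens by document frequency.

    A token is kept only if it appears in at least `no_below` documents
    and in at most a `no_above` fraction of all documents.
    """
    tokenised = [[t for t in doc.lower().split() if t not in stop_words] for doc in docs]

    # Document frequency: in how many documents each remaining token occurs.
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))

    n_docs = len(tokenised)
    keep = {w for w, c in df.items() if c >= no_below and c / n_docs <= no_above}
    return [[t for t in tokens if t in keep] for tokens in tokenised]


cleaned = clean_corpus(recipes, STOP_WORDS)
for tokens in cleaned:
    print(tokens)
```

With `no_above=0.5`, words found in more than half of the recipes ("add", "pan", "stir") are dropped together with the stop words. Words such as "chopped" or "whisk" still survive, so ingredient-focused topics may also need a recipe-specific stop list. With Gensim itself, the equivalent step would be calling `filter_extremes` on a `gensim.corpora.Dictionary` before building the bag-of-words corpus with `doc2bow`; note that its default `no_below=5` would remove every word from a three-document toy corpus like this one.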

Also, please bear in mind that it is not really possible to influence how words are assigned among the topics. This has been discussed in the answer to this question: how to improve word assignement in different topics in lda


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM