
gensim LDA training

I am working with a gensim LDA model for a project, and I can't seem to find a suitable number of topics. My question, just to be sure: every time I train the model it starts over, right? For example, I try 47 topics and get terrible results, so I go back to the cell, change 47 to 80 topics, and run it again. That starts a completely new training run and erases what the model learned with 47 topics, right?

I am getting terrible results with LDA: similarity comes out at either 100% or 0%, and I am having trouble tuning the parameters. LSI has given me excellent results. Thanks!

Yes, every time you train a new LDA model, it starts from scratch and keeps nothing it learned in earlier runs.

Some suggestions and comments that may help you get better results:

  • Make sure that you have preprocessed the text appropriately. This usually includes removing punctuation and numbers, removing stopwords and words that are too frequent or too rare, and (optionally) lemmatizing the text. The right preprocessing depends on the language and the domain of the texts.
  • For the hyperparameters, you can set alpha and beta (called `eta` in gensim) to "auto", which lets the model learn suitable values for both from the data. If you prefer to fix them, values lower than 1 are usually suggested.
  • LDA is a probabilistic model, so re-training it with the same hyperparameters will generally give different results each time, unless you fix the random seed (`random_state` in gensim).
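The preprocessing steps in the first suggestion can be sketched in plain Python. This is a minimal example under stated assumptions: the stopword list, the helper names `tokenize` and `filter_by_doc_freq`, the thresholds, and the sample sentences are all made up for illustration, and lemmatization is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a
# fuller, language-specific list).
STOPWORDS = {"the", "a", "an", "and", "of", "is", "in", "to", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stopwords.

    Keeping only [a-z]+ runs removes punctuation and numbers.
    """
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def filter_by_doc_freq(docs, min_docs=2, max_fraction=0.8):
    """Drop tokens found in fewer than min_docs documents or in more
    than max_fraction of all documents."""
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    limit = max_fraction * len(docs)
    keep = {tok for tok, n in doc_freq.items() if min_docs <= n <= limit}
    return [[tok for tok in doc if tok in keep] for doc in docs]


raw_docs = [
    "The cat sat on the mat in 2020.",
    "A cat and a dog play in the park!",
    "The dog sat in the park.",
    "Cat, dog, and bird: 3 pets.",
]

tokenized = [tokenize(d) for d in raw_docs]
cleaned = filter_by_doc_freq(tokenized)
print(cleaned)
```

The token lists in `cleaned` are in the form that a dictionary and bag-of-words corpus are typically built from before training a topic model.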
