
Strange perplexity values of LDA model trained with MALLET

I have trained an LDA model with MALLET on parts of the Stack Overflow data dump and did a 70/30 split for training and test data.

But the perplexity values are strange: they are lower for the test set than for the training set. How is this possible? I thought the model should fit the training data better.

I have already double-checked my perplexity calculations, but I cannot find an error. Do you have any idea what the reason could be?
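For reference, perplexity is usually derived from the per-token log likelihood as

$$\text{perplexity} = \exp\!\left(-\frac{1}{N}\sum_{d}\log p(w_d)\right) = \exp\!\left(-\,\text{LL/token}\right),$$

where $N$ is the total number of tokens and $\log p(w_d)$ is the log likelihood of document $d$. A higher LL/token therefore corresponds to a lower perplexity. (The standard definition; I am assuming this is the calculation being checked.)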

Thank you in advance!

[screenshot: perplexity / LL-per-token values for the training and test sets]

Edit:

Instead of taking the LL/token value for the training set from the console output, I have now run the evaluator on the training set as well. The values seem plausible now.

[screenshot: LL/token values for both sets recomputed with the evaluator]
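For anyone checking the same thing, here is a minimal sketch of applying MALLET's held-out evaluator to both splits through the Java API, assuming the instance lists and the trained model have already been serialized. The file names, the class name, and the particle count of 10 are illustrative, not taken from the original setup:

```java
import java.io.File;

import cc.mallet.topics.MarginalProbEstimator;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.InstanceList;

public class HeldOutLikelihood {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths to the serialized instance lists and the trained model
        InstanceList training = InstanceList.load(new File("train.mallet"));
        InstanceList testing  = InstanceList.load(new File("test.mallet"));
        ParallelTopicModel model = ParallelTopicModel.read(new File("lda.model"));

        // Left-to-right estimator of the marginal word probability, summed over topics
        MarginalProbEstimator evaluator = model.getProbEstimator();

        // 10 particles, no resampling, no per-document output stream (illustrative settings)
        double llTrain = evaluator.evaluateLeftToRight(training, 10, false, null);
        double llTest  = evaluator.evaluateLeftToRight(testing, 10, false, null);

        System.out.println("training total log likelihood: " + llTrain);
        System.out.println("test     total log likelihood: " + llTest);
    }
}
```

The returned value is the total marginal log likelihood of the documents; dividing it by the total token count of each split gives the LL/token figure, which can then be plugged into the perplexity formula above.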

That makes sense. The LL/token number is giving you the probability of both topic assignments and the observed words, whereas the held-out probability is giving you the marginal probability of just the observed words, summed over topics.
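In symbols (a sketch of the distinction described above, with $z^{*}$ denoting one particular topic assignment):

$$\frac{1}{N}\log P(w, z^{*}) \;\le\; \frac{1}{N}\log \sum_{z} P(w, z) = \frac{1}{N}\log P(w)$$

Since the marginal sums probability mass over all assignments, the held-out LL/token is never smaller than the joint LL/token under one assignment, which is why comparing the console value for the training set against the evaluator value for the test set made the test set look better.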
