
Topic Modeling: How do I use my fitted LDA model to predict new topics for a new dataset in R?

I am using the 'lda' package in R for topic modeling. I want to use a fitted Latent Dirichlet Allocation (LDA) model to predict topics (collections of related words in a document) for a new dataset. In the process, I came across the predictive.distribution() function. But that function takes document_sums as an input parameter, and document_sums is itself an output of fitting a model, so it seems I would have to fit a new model on the new data first. I need help understanding how to apply the existing model to a new dataset and predict topics. Here is the example code from the package documentation written by Jonathan Chang:

# Fit a model
library(lda)
data(cora.documents)
data(cora.vocab)

K <- 10  # number of topics

# 25 Gibbs sampling iterations, alpha = 0.1, eta = 0.1
result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab, 25, 0.1, 0.1)

# Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[, 1:2], result$topics, 0.1, 0.1)

# Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
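
To make clear why document_sums is needed, here is a base-R sketch of what I understand predictive.distribution() to compute: it smooths the topic-word counts with eta, smooths each document's topic counts with alpha, and mixes the two. The toy matrices and word names below are made up for illustration and are not taken from the cora data; treat the formula as my reading of the function, not its exact source.

```r
# Toy inputs (hypothetical): K = 2 topics, V = 3 words, D = 2 documents.
topics <- matrix(c(8, 1, 1,
                   1, 1, 8), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("gene", "model", "network")))
# K x D matrix: column d holds how many tokens of document d were assigned to each topic.
document_sums <- matrix(c(9, 1,
                          2, 8), nrow = 2)
alpha <- 0.1
eta <- 0.1

# Smoothed topic-word distributions: each row sums to 1.
beta <- (topics + eta) / rowSums(topics + eta)

# Smoothed per-document topic proportions: each column sums to 1.
theta <- sweep(document_sums + alpha, 2, colSums(document_sums + alpha), "/")

# V x D matrix: p(word | document) = sum over topics of theta[k, d] * beta[k, word].
pred <- t(beta) %*% theta
print(round(pred, 3))
```

So the only per-document input is the vector of topic counts. For a new document, the open question is how to obtain those counts while keeping the fitted topics unchanged.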

Any help will be appreciated.

Thanks & Regards,

Ankit

I don't know how you can achieve this in R, but please have a look at the 2009 publication by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 mentions three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.

Mallet has an implementation of the left-to-right estimator method.
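
For what it's worth, the basic idea of assigning topics to a new document while holding the fitted topics fixed can be sketched in base R. This is plain Gibbs sampling over the new document's token assignments, not any of the three estimators from the paper, and the topic counts, token vector, and the helper name infer_doc_topics are all hypothetical stand-ins (in practice the topic counts would come from result$topics).

```r
set.seed(1)

# Hypothetical fitted topic-word counts (K = 2 topics, V = 4 words).
topics <- matrix(c(20, 15,  1,  1,
                    1,  1, 18, 22), nrow = 2, byrow = TRUE)
alpha <- 0.1
eta <- 0.1
beta <- (topics + eta) / rowSums(topics + eta)  # held fixed during inference

# New document as 1-based word indices, one entry per token.
new_doc <- c(3, 4, 4, 3, 4, 1, 3, 4)

# Hypothetical helper: resample each token's topic, never updating beta.
infer_doc_topics <- function(tokens, beta, alpha, n_iter = 200) {
  K <- nrow(beta)
  z <- sample.int(K, length(tokens), replace = TRUE)  # random initial assignments
  counts <- tabulate(z, nbins = K)                    # topic counts for this document
  for (iter in seq_len(n_iter)) {
    for (i in seq_along(tokens)) {
      counts[z[i]] <- counts[z[i]] - 1                # take token i out of the counts
      p <- (counts + alpha) * beta[, tokens[i]]       # unnormalized p(z_i = k | rest)
      z[i] <- sample.int(K, 1, prob = p)
      counts[z[i]] <- counts[z[i]] + 1
    }
  }
  counts  # topic counts from the final sweep
}

new_document_sums <- infer_doc_topics(new_doc, beta, alpha)
theta_new <- (new_document_sums + alpha) / sum(new_document_sums + alpha)
print(new_document_sums)
print(round(theta_new, 3))
```

The resulting counts play the role of one column of document_sums, so reshaped into a K x 1 matrix they could be passed to predictive.distribution() together with the fitted topics. If I recall the 'lda' documentation correctly, lda.collapsed.gibbs.sampler() also has initial and freeze.topics arguments meant for sampling test documents, which would be worth checking before rolling your own.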
