How to calculate perplexity for LDA with Gibbs sampling

I am fitting an LDA topic model in R on a collection of 200+ documents (65k words total). The documents have been preprocessed and are stored in the document-term matrix dtm. In theory I should expect to find 5 distinct topics in the corpus, but I would like to calculate the perplexity score and see how the model fit changes with the number of topics. Below is the code I use. The problem is that I get an error when I try to calculate the perplexity score, and I am not sure how to fix it (I am new to R). The error occurs on the last line of code. I would appreciate any help.

library(topicmodels)

burnin <- 4000   # number of burn-in iterations to discard
iter   <- 2000   # number of Gibbs iterations after burn-in
thin   <- 500    # keep every 500th iteration to reduce correlation between samples
seed   <- list(2003, 10, 100, 10005, 765)   # one seed per starting point
nstart <- 5      # use 5 different starting points
best   <- TRUE   # return the results of the run with the highest posterior probability

# Number of topics (run the algorithm for different values of k and choose by inspecting the results)
k <- 5

# Run LDA using Gibbs sampling
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))

perplexity(ldaOut, newdata = dtm)

Error in method(x, k, control, model, mycall, ...) : Need 1 seeds

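perplexity() needs one more argument, estimate_theta. With estimate_theta = FALSE, the document-topic distributions from the fitted model are used instead of being re-estimated for newdata. That fits this case because newdata is the same dtm the model was fitted on. The "Need 1 seeds" error most likely comes from the re-estimation step, which seems to run a single start while the stored control still holds five seeds.

Use the code below:

perplexity(ldaOut, newdata = dtm, estimate_theta = FALSE)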
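
To see how the fit changes with the number of topics, as the question asks, the same call can be repeated over several values of k. The following is only a minimal sketch under stated assumptions: the candidate values of k, the short burnin/iter/thin settings, the single seed, and the use of a small slice of the AssociatedPress data shipped with topicmodels as a stand-in for dtm are all illustrative choices, not part of the original post.

```r
library(topicmodels)

# Stand-in corpus (assumption): a small slice of the AssociatedPress
# document-term matrix that ships with topicmodels. Replace with your own dtm.
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]

# Candidate numbers of topics (illustrative values).
candidate_k <- c(2, 5, 10)

# Fit one Gibbs-sampled model per k and compute its perplexity on the same
# data. A single seed with the default nstart = 1 is used, and the sampler
# settings are deliberately small so the sketch runs quickly.
perplexities <- sapply(candidate_k, function(k) {
  fit <- LDA(dtm, k = k, method = "Gibbs",
             control = list(seed = 2003, burnin = 100, iter = 200, thin = 50))
  perplexity(fit, newdata = dtm, estimate_theta = FALSE)
})

results <- data.frame(k = candidate_k, perplexity = perplexities)
print(results)
```

Lower perplexity indicates a better fit. Because this is computed on the training documents, it may keep decreasing as k grows, so comparing models on held-out documents is usually more informative when enough data is available.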
