
LDA Topic Models package

Fellows,

I am a beginner in topic modeling and am using the topicmodels package in R. The function call is LDA(data, k).

I want to know which alpha and beta values are used by default. Also, which inference algorithm is used for parameter estimation: variational EM or Gibbs sampling?

Thanks

I found that people usually set alpha = 50/T, where T is the number of topics, and beta = 0.01. (50/k is also the default starting value of alpha in topicmodels.)

Both variational EM and Gibbs sampling can be used for inference.

Use ?LDA to see the function signature:

LDA(x, k, method = "VEM", control = NULL, model = NULL, ...) 

So, you can specify the method.

 lda <- LDA(x, control = list(alpha = 0.1), k = 2)

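You can also specify alpha in the control option, as in the call above. Note that with method = "VEM" this value is only the starting point, because alpha is re-estimated by default; set estimate.alpha = FALSE in control to keep it fixed.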
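To make the options above concrete, here is a minimal sketch that fits the same small corpus three ways and reads back the alpha each model ended up with. It assumes the topicmodels package is installed and uses its bundled AssociatedPress data; the subset size, seed, and iteration count are arbitrary choices for illustration.

```r
library(topicmodels)

# Small slice of the AssociatedPress document-term matrix shipped with topicmodels
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]
k <- 2

# Default method is VEM: alpha starts at 50/k and is re-estimated during fitting
fit_vem <- LDA(dtm, k = k, method = "VEM", control = list(seed = 1))
print(fit_vem@alpha)

# VEM with alpha held at the supplied value
fit_fixed <- LDA(dtm, k = k, method = "VEM",
                 control = list(alpha = 0.1, estimate.alpha = FALSE, seed = 1))
print(fit_fixed@alpha)

# Gibbs sampling: alpha is fixed, and delta is the prior on the topic-word distributions
fit_gibbs <- LDA(dtm, k = k, method = "Gibbs",
                 control = list(alpha = 0.1, delta = 0.01, iter = 200, seed = 1))
print(fit_gibbs@alpha)
print(fit_gibbs@control@delta)
```

The alpha slot of the fitted object holds the value actually used, so comparing fit_vem@alpha with 50/k shows how far the estimate moved from its starting point.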

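The distribution of topics is defined by a Dirichlet distribution, parameterized by alpha. LDA actually involves two Dirichlet priors: one over the topic proportions within each document (alpha), and another over the word distribution of each topic (usually called beta; it is delta in the topicmodels Gibbs sampler).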
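For reference, the usual generative model of (smoothed) LDA can be written as follows. The symbols theta, varphi, z, and w are the conventional ones and do not appear in the post itself; K is the number of topics, d indexes documents, and n indexes word positions.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic proportions of document } d \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta) && \text{word distribution of topic } k,\ k = 1,\dots,K \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) && \text{topic of the } n\text{-th word in document } d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}}) && \text{the observed word}
\end{align*}
```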

In basic LDA, you set a single alpha for the Dirichlet prior on the per-document topic proportions, shared across the whole corpus. Typical values are 0.001, 0.01, 0.1, 1, etc. (more often 1/K, where K is the number of topics).

If alpha is very small, your prior says that, on average, each document contains only a few topics (the extremes being a single topic or all topics). With a very low alpha, the posterior topic distribution within each document will be highly skewed.

No matter what value you choose, fixing a single (symmetric) alpha implies that, averaged over documents, every topic is expected to be about the same size (in terms of average posterior probability).
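Both effects can be seen by sampling from the prior alone. The sketch below uses only base R; rdirichlet_sym is a hypothetical helper written for this illustration (not part of topicmodels) that draws from a symmetric Dirichlet by normalizing Gamma variates. It simulates the prior, not a fitted posterior.

```r
# Draw n samples from a symmetric Dirichlet(alpha, ..., alpha) over k topics
# by normalizing independent Gamma(alpha, 1) variates row by row.
rdirichlet_sym <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(42)
k <- 10
alphas <- c(0.01, 0.1, 1, 10)

# Average share of the single largest topic in a document:
# close to 1 means documents are dominated by one topic.
peak <- sapply(alphas, function(a) {
  theta <- rdirichlet_sym(2000, k, a)
  mean(apply(theta, 1, max))
})
names(peak) <- alphas
print(round(peak, 2))

# With one shared alpha, every topic has the same expected size, 1/k,
# even when individual documents are very skewed.
topic_size <- colMeans(rdirichlet_sym(20000, k, alpha = 0.1))
print(round(topic_size, 2))
```

The largest-topic share shrinks as alpha grows, while the per-topic averages stay near 1/k regardless of alpha, which is why a single fixed alpha cannot express that some topics are more prevalent than others.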

Instead, estimate alpha from the data.

Read "Rethinking LDA: Why Priors Matter" (Wallach, Mimno, and McCallum) and consider using gensim in Python.
