LDA Topic Models package

Question

Fellows,

I am a beginner in topic modeling. I am using the topicmodels package in R. The function call is LDA(data, k).

I want to know which alpha and beta values are used by default. Also, which inference algorithm is used for parameter estimation: variational EM or Gibbs sampling?

Thanks

Answer 1

I have found that people usually set alpha = 20/T, where T is the number of topics, and beta = 0.01.

Both variational EM and Gibbs sampling can be used for inference.

Answer 2

Use ?LDA to see the function signature:

LDA(x, k, method = "VEM", control = NULL, model = NULL, ...)

So you can choose the inference algorithm through the method argument: "VEM" (variational EM, the default) or "Gibbs".

You can also specify alpha through the control argument:

 lda <- LDA(x, control = list(alpha = 0.1), k = 2)
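To make both points concrete, here is a minimal sketch that fits the same small model three ways. It assumes the topicmodels package is installed and uses its bundled AssociatedPress data. The object names (dtm, fit_default, fit_fixed, fit_gibbs) are only for illustration. As far as I can tell from the package documentation, the starting alpha defaults to 50/k, VEM re-estimates alpha unless estimate.alpha = FALSE, and the Gibbs sampler takes the topic-word prior as delta. Please check ?LDA on your installed version rather than relying on this.

```r
library(topicmodels)

# Small subset of the example corpus shipped with topicmodels, to keep it fast.
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:20, ]
k <- 2

# 1. Defaults: method = "VEM"; alpha is a starting value and gets estimated.
fit_default <- LDA(dtm, k = k, control = list(seed = 1))

# 2. VEM with alpha held fixed at the value supplied.
fit_fixed <- LDA(dtm, k = k, method = "VEM",
                 control = list(alpha = 0.1, estimate.alpha = FALSE, seed = 1))

# 3. Gibbs sampling: alpha and delta (the topic-word prior) are both fixed.
fit_gibbs <- LDA(dtm, k = k, method = "Gibbs",
                 control = list(alpha = 0.1, delta = 0.01, seed = 1,
                                burnin = 100, iter = 200))

# The alpha slot holds the value actually used (or estimated) by each fit.
print(c(default = fit_default@alpha,
        fixed   = fit_fixed@alpha,
        gibbs   = fit_gibbs@alpha))
```

If the first number differs from 50/k, that is the VEM estimate of alpha from the data, not a value you set.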

Answer 3

The distribution of topics is given a Dirichlet prior, controlled by the alpha parameter. LDA uses more than one Dirichlet: one over the topic proportions within each document (alpha), and another over the word distribution of each topic, shared across the documents in the corpus (beta).

In basic LDA, you set a single alpha that defines the Dirichlet prior on topic proportions for every document in the corpus. Typical alpha values are 0.001, 0.01, 0.1, 1, and so on (more often 1/K, where K is the number of topics).

A very small alpha is a prior statement that, on average, each document contains only a few topics (the extremes being a single topic or all topics). With a very low alpha, the posterior topic distribution within each document will be very skewed.

Whatever value you choose, fixing a single alpha implies that, averaged over documents, every topic is expected to be about the same size (in terms of average posterior probability).
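Both effects can be seen by simulation, without fitting any model. The sketch below uses base R only and draws document-topic proportions from a symmetric Dirichlet by normalising Gamma draws. The helper name rdirichlet_sym, K = 10, and the 1000 simulated documents are my own choices for illustration. Note that it samples from the prior, not from a fitted posterior.

```r
# Draw n vectors from a symmetric Dirichlet(alpha, ..., alpha) over k topics
# by normalising independent Gamma(shape = alpha) draws.
rdirichlet_sym <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, ncol = k)
  g / rowSums(g)  # each row is one document's topic proportions
}

set.seed(1)
k <- 10
alphas <- c(0.1, 1, 10)

# Simulated topic proportions for 1000 documents per alpha value.
theta <- lapply(alphas, function(a) rdirichlet_sym(1000, k, a))

# Skew: average share taken by the single largest topic in a document.
max_share <- sapply(theta, function(m) mean(apply(m, 1, max)))

# Topic size: average share of each topic across documents (one row per alpha).
topic_size <- t(sapply(theta, colMeans))

print(data.frame(alpha = alphas, mean_max_share = round(max_share, 2)))
print(round(topic_size, 2))
```

The share of the largest topic should shrink as alpha grows, while the average topic sizes stay close to 1/K = 0.1 for every alpha. That is the sense in which a single alpha treats all topics alike.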

Instead, estimate alpha from the data.

Read "Rethinking LDA: Why Priors Matter" (Wallach, Mimno, and McCallum, 2009) and consider using gensim in Python.

solution1
0 2014-07-08 05:45:22

solution2
0 2015-08-11 08:01:56

solution3
0 2018-08-05 18:36:04