How can I initialize a gensim LDA topic model?

It has been suggested that initializing a topic model using clusters of words can lead to higher quality models or more robust (consistent) inference. I am talking about initializing the optimizer, not setting a prior. Here is some code to illustrate what I want to do:

Create an LdaModel object, but don't pass in a corpus.

lda_model = LdaModel(
         id2word=id2word,
         num_topics=30,
         eval_every=10,
         passes=40,
         iterations=5000)

Next, assign some property of the object, corresponding to the probabilities of drawing each word from a topic, to a matrix of my own construction.

lda_model.topics = my_topic_mat

Then fit the corpus:

lda_model.update(corpus)

Thanks for the help!

In practice, setting a prior may be a better choice than initializing the optimizer.

There are two hyperparameters, alpha and eta, where alpha is a prior for the document-topic matrix and eta is a prior for the topic-word matrix. To influence word probabilities in topics, try passing eta as an additional argument:

lda_model = gensim.models.ldamodel.LdaModel(num_topics=30, id2word=id2word, eta=your_topic_mat, 
                                            eval_every=10, iterations=5000)

From the gensim docs:

eta can be a scalar for a symmetric prior over topic/word distributions, or a vector of shape num_words, which can be used to impose (user defined) asymmetric priors over the word distribution. It also supports the special value 'auto', which learns an asymmetric prior over words directly from your data. eta can also be a matrix of shape num_topics x num_words, which can be used to impose asymmetric priors over the word distribution on a per-topic basis (can not be learned from data).
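As a sketch of how such a per-topic eta matrix might be built, the snippet below starts from a small symmetric prior and boosts the prior mass of seed words for chosen topics. The vocabulary size, seed word ids, and boost value are illustrative assumptions, not from the original question:

```python
import numpy as np

# Hypothetical dimensions -- in practice num_words = len(id2word).
num_topics = 30
num_words = 5000

# Start from a small symmetric prior over all topic-word pairs...
eta = np.full((num_topics, num_words), 0.01)

# ...then raise the prior for seed words of specific topics.
# seed_words maps topic index -> word ids (hypothetical ids here).
seed_words = {0: [10, 42, 137], 1: [7, 99]}
for topic_id, word_ids in seed_words.items():
    eta[topic_id, word_ids] = 1.0

# eta now has shape (num_topics, num_words) and can be passed
# as the eta argument of LdaModel, as in the snippet above.
print(eta.shape)
```

During inference the seeded words then start with more prior mass in their topics, nudging the optimizer toward the intended clusters without hard-coding the final word probabilities.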
