LDA Gensim: Word -> Topic ID distribution instead of Topic -> Word distribution

Question

I am trying to implement the TopicTiling algorithm on my trained LDA model. For the algorithm I need all of the topic IDs that are assigned to a single word in an unseen document. I will then find the most frequent topic ID for that word and assign it as the mode of that word.

I am using the gensim library, so it is very easy to get the topic -> word distribution, where the words are given with their probabilities. However, how do I get the topic(s) that are/were assigned to a single word, i.e. the word -> topic distribution?

Example:
s = "Banks are closed on Sunday"

Topic -> Word Dist from Gensim:
TopicTag -> Prob*Word
Topic 0 -> 0,3*Bank, 0,2*are
Topic 1 -> 0,2*closed, 0,1*Sunday
Topic 2 -> 0,4*Sunday, 0,3*on

What I want:
word -> TopicTag(Frequency that given word was assigned with the specified topic tag)
Banks -> Topic1(2), Topic2(2)
Closed -> Topic0(1),Topic1 (4)

Please also note that I am not interested in parsing the Topic -> Word Dist results from Gensim. I am interested in an accurate way to find which topic(s) my model assigns to each individual word that occurs in an unseen document.

Thanks in advance.

Answer 1

You can get the matrix of topic-word weights from lda_model.state.get_lambda() (in gensim, get_lambda() is a method of the model's state object). It has one row per topic and one column per word, so reading down a column gives the weights of a single word across all topics. See also this mailing list thread: https://groups.google.com/d/msg/gensim/6N9-Y5KVQu0/soFqkEopMWgJ
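To make the "read down a column" idea concrete, here is a minimal sketch. It assumes lam is a topics-by-words weight matrix such as the one gensim exposes via lda_model.state.get_lambda(); the numbers below are made up, and the helper name word_topic_distribution is hypothetical. Normalizing one column gives a rough, context-free estimate of how a word's weight is split across topics. It does not depend on the surrounding document, so it is not the per-document assignment TopicTiling needs.

```python
def word_topic_distribution(lam, word_id):
    """Normalize one column of a topics-by-words weight matrix.

    lam[k][w] is the weight of word w in topic k (a list of lists or a
    2-D NumPy array both work). The result sums to 1 over topics.
    """
    column = [float(row[word_id]) for row in lam]
    total = sum(column)
    return [weight / total for weight in column]


# Stand-in weights for 3 topics over a 2-word vocabulary (made-up numbers,
# not output from a real model).
lam = [
    [3.0, 1.0],
    [1.0, 1.0],
    [1.0, 2.0],
]

dist = word_topic_distribution(lam, 0)
# Index of the topic carrying the largest share of word 0's weight.
best_topic = max(range(len(dist)), key=dist.__getitem__)
print(dist, best_topic)
```

With a real model, word_id would come from the dictionary used for training (for example dictionary.token2id["bank"], assuming a gensim Dictionary named dictionary).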

Answer 2

I am also interested in knowing the answer. In the meantime, you can get the Topic -> Word Dist without parsing by:

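# ldavar is the trained gensim LdaModel; get_lambda() returns a topics x words array
y = ldavar.state.get_lambda()
# normalize each row so that it sums to 1
for i in range(y.shape[0]):
    y[i] = y[i] / y[i].sum()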

Now each row of y gives the word distribution for a topic.
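For the last step the question describes (taking the mode of the topic IDs assigned to a word), a small sketch follows. It assumes you have already collected a list of topic IDs per word for the unseen document, for example from repeated inference runs; gensim's get_document_topics(bow, per_word_topics=True) may be a starting point for that, but check it against your gensim version. The helper name topic_mode_per_word is hypothetical, and the counts are taken from the question's own example.

```python
from collections import Counter


def topic_mode_per_word(assignments):
    """Map each word to the topic ID it was assigned most often.

    assignments: dict of word -> list of topic IDs collected for that word.
    When two topics are equally frequent, Counter.most_common keeps the
    one that was encountered first.
    """
    return {
        word: Counter(topic_ids).most_common(1)[0][0]
        for word, topic_ids in assignments.items()
    }


# Counts from the question: Banks -> Topic1(2), Topic2(2);
# Closed -> Topic0(1), Topic1(4). The ordering inside each list is made up.
assignments = {
    "Banks": [1, 2, 1, 2],
    "Closed": [1, 0, 1, 1, 1],
}

modes = topic_mode_per_word(assignments)
print(modes)
```

Note that "Banks" is a tie in the question's example, so a real implementation would need an explicit tie-breaking rule; the sketch simply keeps whichever topic appears first.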
