Get hierarchy of topics from gensim

Does gensim provide a hierarchy of topics? I wrote some code to compute the topics of a set of documents, and the output is the list of words for each topic. What I want, however, is a hierarchy of topics. This is my code:

https://gist.github.com/anonymous/2e3b2f3866e5029c55c3

And this is the output:

2014-06-16 13:02:22,540 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2014-06-16 13:02:38,162 : INFO : built Dictionary(324451 unique tokens: [u'considered,', u'\x00\x00', u'\ufb90\ufee0\ufbff\ufeea', u'\u0627\ufee7\ufed4\ufb91\ufe8e\u0643', u'\u0627\u0628\u0631\u0631\u0627\u06cc\u0627\u0646\u0647']...) from 885 documents (total 3885556 corpus positions)
2014-06-16 13:02:38,545 : INFO : storing corpus in Matrix Market format to corpus.mm
2014-06-16 13:02:38,546 : INFO : saving sparse matrix to corpus.mm
2014-06-16 13:02:38,554 : INFO : PROGRESS: saving document #0
2014-06-16 13:02:45,290 : INFO : saved 884x79405 matrix, density=0.514% (360672/70194020)
2014-06-16 13:02:45,292 : INFO : saving MmCorpus index to corpus.mm.index
2014-06-16 13:02:45,293 : INFO : loaded corpus index from corpus.mm.index
2014-06-16 13:02:45,293 : INFO : initializing corpus reader from corpus.mm
2014-06-16 13:02:45,293 : INFO : accepted corpus with 884 documents, 79405 features, 360672 non-zero entries
2014-06-16 13:03:06,913 : INFO : topic 0: 0.010*می + 0.006*دهند + 0.006*باره + 0.006*روی + 0.004*کسی + 0.004*بی + 0.004*مانند + 0.004*جز + 0.004*شود + 0.004*یکی + 0.004*چه + 0.004*اما + 0.004*دارد + 0.004*در + 0.004*بر + 0.004*آن + 0.004*او + 0.004*حتی + 0.004*که + 0.004*های
2014-06-16 13:03:07,097 : INFO : topic 1: 0.000*پست + 0.000*گرفت‌ + 0.000*Single + 0.000*ﺧﻨﻚ + 0.000*ﺑﻤﺎﻧﺪ + 0.000*حدودی + 0.000*352 + 0.000*«دين + 0.000*گروهي‌ + 0.000*ﺣﻔﺎ + 0.000*می​شود + 0.000*غنی + 0.000*   کشتي + 0.000*بستایم. + 0.000*19-20. + 0.000*67 + 0.000*تصرف + 0.000*مذاکرات» + 0.000*الات + 0.000*پسرش

Is there any way to get the hierarchy of topics?

Despite the name, HDP (the hierarchical Dirichlet process) will not give you a hierarchy of topics. You need hierarchical LDA (hLDA) for that, which is not currently implemented in Gensim. The algorithm has a C implementation from Blei Lab and a Java implementation in Mallet (mallet.cs.umass.edu/api/cc/mallet/topics/HierarchicalLDA.html).
