Classifying text with a Gensim LDA model

Question

For reference, I already looked at the following questions:

I am looking to have my LDA model, trained with Gensim, classify a sentence under one of the topics that the model creates. Something along the lines of:

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # where the classification would occur
    print('Topic: ', topic)

I know gensim does not have a parse function, but how would one go about accomplishing this? Here is the documentation that I've been following, but I haven't gotten anywhere with it:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

edit: More documentation - https://radimrehurek.com/gensim/models/ldamodel.html

Answer 1

Let me get your problem right: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to classify new documents into one (or more?) of these topics, meaning you want to infer topic distributions on new, unseen documents.

If so, the gensim documentation provides answers.

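from gensim import models

# corpus, id2word and document are assumed to be defined as in the question
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1): # where each line in the document is its own sentence for simplicity
    print('\nSentence: ', line)
    # tokenize the same way as the training data, then map tokens to ids with the training dictionary
    line_bow = id2word.doc2bow(line.split())
    doc_lda = lda[line_bow] # list of (topic_id, probability) pairs
    # pick the pair with the highest probability; a plain max(doc_lda) would compare topic ids instead
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print('\nLine ' + str(count) + ' assigned to Topic ' + str(topic_id) + ' with ' + str(round(prob * 100, 2)) + '% probability!')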

Answer 1
2 2020-06-29 06:58:35