Classify Text with Gensim LDA Model

For reference, I already looked at the following questions:

  1. Gensim LDA for text classification
  2. Python Gensim LDA Model show_topics function

I am looking to have my LDA model trained with Gensim classify a sentence under one of the topics that the model creates. Something along the lines of:

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # where the classification would occur
    print('Topic: ', topic)

I know gensim does not have a parse function, but how would one go about accomplishing this? Here is the documentation that I've been following, but I haven't gotten anywhere with it:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

Edit: More documentation: https://radimrehurek.com/gensim/models/ldamodel.html

Let me make sure I understand your problem: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to classify new documents under one (or more?) of these topics, meaning you want to infer topic distributions for new, unseen documents.

If so, the gensim documentation provides answers.

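from gensim import models

# corpus, id2word and document are assumed to be defined as in the question
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1): # where each line in the document is its own sentence for simplicity
    print('\nSentence: ', line)
    # tokenize on whitespace; ideally use the same preprocessing as for the training corpus
    line_bow = id2word.doc2bow(line.split())
    doc_lda = lda[line_bow] # list of (topic_id, probability) pairs
    # select the pair with the highest probability, not the highest topic id
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print('\nLine ' + str(count) + ' assigned to Topic ' + str(topic_id) + ' with ' + str(round(prob * 100, 2)) + '% probability!')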
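
Since `lda[line_bow]` returns a list of `(topic_id, probability)` pairs, choosing the most likely topic means comparing on the second element of each pair; a bare `max()` over the pairs compares topic ids first. The following is a minimal sketch of that selection step in plain Python, so it runs without a trained model. The helper names `dominant_topic` and `topics_above`, the threshold of 0.3, and the example distribution are illustrative assumptions, not part of gensim.

```python
from typing import List, Optional, Tuple

# Shape of what lda[bow] yields: a list of (topic_id, probability) pairs.
TopicDist = List[Tuple[int, float]]


def dominant_topic(doc_topics: TopicDist) -> Optional[Tuple[int, float]]:
    """Return the (topic_id, probability) pair with the highest probability.

    Returns None when the distribution is empty.
    """
    if not doc_topics:
        return None
    return max(doc_topics, key=lambda pair: pair[1])


def topics_above(doc_topics: TopicDist, threshold: float = 0.3) -> TopicDist:
    """Return every topic with probability >= threshold, most probable first.

    Useful when a sentence should be allowed to belong to more than one topic.
    The default threshold is an arbitrary illustrative value.
    """
    kept = [pair for pair in doc_topics if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hand-written example distribution (hypothetical values, not model output).
example = [(0, 0.05), (2, 0.62), (5, 0.33)]

print(max(example))             # compares topic ids first: (5, 0.33)
print(dominant_topic(example))  # compares probabilities: (2, 0.62)
print(topics_above(example))    # [(2, 0.62), (5, 0.33)]
```

With a trained model, the same helpers could be applied as `dominant_topic(lda[id2word.doc2bow(line.split())])`. By default gensim may omit topics with very low probability from the returned list, so the pairs do not necessarily cover all 7 topics.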
