
Doing gensim text summarization on each row in Python

I have a dataset that looks like this (not the actual values, just enough to give the idea):

id  text                                      group 
1   what is the difference and why is it ...  2
2   let me introduce myself, first.           1 

The "text" column can hold anything from a single sentence to many sentences. What I'm trying to do is summarize the text in each row and save the summary in a new column. I'm using gensim for the summarization.

My desired output is as follows (please disregard the content):

id  text                                     group  text_summary 
1   what is the difference and why is it ...  2     the difference between object a and b 
2   let me introduce myself, first.           1     let me introduce myself, first.

Below is the code I used, but it raises the following error.

import gensim 
from gensim.summarization import summarize 
from gensim.summarization import keywords 

for i in range(0, df.shape[0]):
    text = df.iloc[i]['Answers']
    if len(text) > 1:
        df.loc[i, 'summary_answer'] = summarize(text)
    else: 
        df.loc[i, 'summary_answer'] = text

[Screenshot of the error message; image not available]

I understand the problem, but my if/else statement does not seem to work in this case.
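A likely reason the guard does not help (an assumption, since the error screenshot is not preserved): `len(text)` counts characters, not sentences, so `len(text) > 1` is true for almost any non-empty string. In gensim 3.x, `summarize` raises a `ValueError` when the input contains only one sentence, which fits a row like "let me introduce myself, first." The sketch below shows the difference; `count_sentences` is a hypothetical helper based on a rough punctuation heuristic, not gensim's own sentence splitter.

```python
import re

def count_sentences(text):
    """Rough sentence count: split on whitespace that follows '.', '!' or '?'.

    Hypothetical helper for illustration only; gensim uses its own splitter.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return len([p for p in parts if p])

text = "let me introduce myself, first."
print(len(text))              # number of characters, far greater than 1
print(count_sentences(text))  # number of sentences under this heuristic
```

A guard based on a sentence count would be closer to the intent, though catching the exception is more robust than trying to mirror the library's splitting rules.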

Your code should probably look more like this:

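from gensim.summarization import summarize

def summary_answer(text):
    """Return a summary of text, or the text itself if it cannot be summarized."""
    try:
        return summarize(text)
    except ValueError:
        # summarize() rejected the input, so keep the original text
        return text

# Apply the function to every row of the 'Answers' column
df['summary_answer'] = df['Answers'].apply(summary_answer)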

Edit: The code above was a quick fix for the original error: it returns the original text if the summarize call raises an exception. You can of course add more elaborate logic to the function if that is not enough. A simple example:

def summary_answer(text):
    try:
        if not isinstance(text, str):  # data of the wrong type
            return 'not text'
        ans = summarize(text)
        if len(ans.split()) > 3:  # summary must be longer than 3 words
            return ans
    except ValueError:
        pass
    # summarize() failed or the summary was too short: keep the original text
    return text
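
The same fallback logic can be tried without gensim installed, which may be useful because the `gensim.summarization` module was removed in gensim 4.0 and the imports in this thread need a 3.x release. The sketch below is one possible arrangement, not the only one: `make_safe_summarizer` and `fake_summarize` are hypothetical names, and the stand-in summarizer only imitates the single-sentence `ValueError`. Unlike the version above, non-string values are returned unchanged rather than replaced with 'not text'.

```python
def make_safe_summarizer(summarize_fn, min_words=3):
    """Wrap summarize_fn so every failure falls back to the original value."""
    def safe(text):
        if not isinstance(text, str):
            return text  # e.g. None or NaN: leave the value unchanged
        try:
            summary = summarize_fn(text)
        except ValueError:
            return text  # input rejected by the summarizer
        # Keep the summary only if it has more than min_words words
        return summary if len(summary.split()) > min_words else text
    return safe

def fake_summarize(text):
    """Stand-in for a real summarizer (assumption for illustration):
    rejects input with fewer than two periods, else returns the first sentence."""
    if text.count('.') < 2:
        raise ValueError("input must have more than one sentence")
    return text.split('.')[0] + '.'

safe = make_safe_summarizer(fake_summarize)
print(safe("let me introduce myself, first."))
print(safe("The cat sat on the mat. It slept."))
```

With gensim 3.x available, the wrapper would be built from the real function and applied as before: df['summary_answer'] = df['Answers'].apply(make_safe_summarizer(summarize)).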
