
Applying gensim text summarization to each row of a DataFrame in Python

I have a dataset that looks like this (not the actual values, just enough to give the idea):

id  text                                      group 
1   what is the difference and why is it ...  2
2   let me introduce myself, first.           1 

The "text" column can hold anything from a single sentence to many sentences. I want to summarize the text in each row and save the summary in a new column. I'm using gensim for the summarization.

My desired output is as follows, and please disregard the content.

id  text                                      group  text_summary
1   what is the difference and why is it ...  2      the difference between object a and b
2   let me introduce myself, first.           1      let me introduce myself, first.

Below is the code I used, but I'm getting the following error.

import gensim 
from gensim.summarization import summarize 
from gensim.summarization import keywords 

for i in range(0, df.shape[0]):
    text = df.iloc[i]['Answers']
    if len(text) > 1:
        df.loc[i, 'summary_answer'] = summarize(text)
    else: 
        df.loc[i, 'summary_answer'] = text

[Screenshot of the error message; image not included]

I understand the problem, but my if/else statement seems to not work in this case.
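One likely reason the check does not help: len(text) on a string counts characters, not sentences, so len(text) > 1 is true for nearly every non-empty row, including single-sentence ones. The sketch below illustrates the difference. The count_sentences helper is a hypothetical regex heuristic written for this illustration, not gensim's own sentence splitter.

```python
import re

text = "let me introduce myself, first."

# len() on a string counts characters, so this check passes for
# almost any non-empty text, including single-sentence rows.
passes_length_check = len(text) > 1


def count_sentences(text):
    """Rough sentence count (hypothetical helper, not gensim's splitter):
    split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


print(passes_length_check)    # the character-based check is satisfied
print(count_sentences(text))  # yet the text holds only one sentence
```

A guard based on a sentence count like this would be closer to the intent of the if/else, although catching the exception is simpler and does not depend on guessing how the library splits sentences.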

Your code should probably be more like this:

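from gensim.summarization import summarize  # gensim 3.x; the module was removed in gensim 4.0

def summary_answer(text):
    # If summarize() raises ValueError (e.g. the text is too short to
    # summarize), keep the original text instead.
    try:
        return summarize(text)
    except ValueError:
        return text

# Apply the function to every row and store the result in a new column.
df['summary_answer'] = df['Answers'].apply(summary_answer)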

Edit: The code above was a quick fix for the original error: it returns the original text if the summarize call raises a ValueError. You can of course add more elaborate logic to the function if that is not enough. A simple example:

def summary_answer(text):
    try:
        if not isinstance(text, str):  # data of the wrong type
            return 'not text'
        ans = summarize(text)
        if len(ans.split()) > 3:  # summary must be longer than 3 words
            return ans
    except ValueError:
        pass
    return text  # fall back to the original text
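To try the fallback pattern without gensim installed, here is a minimal sketch that swaps in a stand-in summarizer. fake_summarize is hypothetical: it only mimics the assumed behavior of raising ValueError on single-sentence input and is not gensim's algorithm. A plain list takes the place of the DataFrame column.

```python
def fake_summarize(text):
    """Hypothetical stand-in for a real summarizer: returns the first
    sentence, and raises ValueError when there is only one sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if len(sentences) < 2:
        raise ValueError("input must have more than one sentence")
    return sentences[0] + "."


def summary_answer(text, summarizer=fake_summarize):
    # Same shape as the answer's function: fall back to the original
    # text whenever the summarizer raises ValueError.
    try:
        return summarizer(text)
    except ValueError:
        return text


rows = [
    "let me introduce myself, first.",
    "Object a is mutable. Object b is not. They differ in that way.",
]
summaries = [summary_answer(t) for t in rows]
print(summaries)
```

The single-sentence row comes back unchanged, while the multi-sentence row is shortened. With a DataFrame, the same function can be passed to df['Answers'].apply as in the answer.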
