I have a dataset that looks like this (not the actual values, but just to get the idea of it):
id  text                                       group
1   what is the difference and why is it ...   2
2   let me introduce myself, first.            1
The length of the "text" column can be from one sentence to many sentences. What I'm trying to do is to summarize each text from the row and save the summarized text in a new column. I'm using gensim for summarization.
My desired output is as follows, and please disregard the content.
id  text                                       group  text_summary
1   what is the difference and why is it ...   2      the difference between object a and b
2   let me introduce myself, first.            1      let me introduce myself, first.
Below is the code I used, but I'm getting the following error.
import gensim
from gensim.summarization import summarize
from gensim.summarization import keywords
for i in range(0, df.shape[0]):
    text = df.iloc[i]['Answers']
    if len(text) > 1:
        df.loc[i, 'summary_answer'] = summarize(text)
    else:
        df.loc[i, 'summary_answer'] = text
I understand the problem, but my if/else statement does not seem to work in this case.
Your code should probably be more like this:
def summary_answer(text):
    try:
        return summarize(text)
    except ValueError:
        return text

df['summary_answer'] = df['Answers'].apply(summary_answer)
Edit: The above code was a quick fix for the original error. It returns the original text if the summarize call raises a ValueError. You can of course add more complicated logic to the function if this one doesn't cut it. A simple example:
def summary_answer(text):
    try:
        if not isinstance(text, str):  # data of wrong type
            return 'not text'
        ans = summarize(text)
        if len(ans.split()) > 3:  # summary must be longer than 3 words
            return ans
    except ValueError:
        pass
    return text
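As for why the if/else in the question probably doesn't help: len(text) on a string counts characters, not sentences, so len(text) > 1 is true for almost any non-empty text, and a one-sentence row still reaches summarize. Below is a rough sketch of a pre-check that counts sentences first and still keeps the try/except as a safety net. The names count_sentences, summary_or_original and first_sentence are made up for illustration. first_sentence is only a stand-in that mimics a summarizer rejecting one-sentence input, so the snippet runs without gensim. The regex split is a crude assumption about what counts as a sentence, not how gensim splits text.

```python
import re


def count_sentences(text):
    """Rough sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return len([p for p in parts if p])


def summary_or_original(text, summarizer, min_sentences=2):
    """Return summarizer(text), or text unchanged if it is too short or fails."""
    # Non-string values (e.g. NaN from pandas) are passed through untouched.
    if not isinstance(text, str) or count_sentences(text) < min_sentences:
        return text
    try:
        summary = summarizer(text)
    except ValueError:
        return text
    # Fall back to the original text if the summarizer returns an empty result.
    return summary or text


def first_sentence(text):
    """Hypothetical stand-in for a real summarizer (NOT gensim's algorithm)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    if len(parts) < 2:
        raise ValueError("input must have more than one sentence")
    return parts[0]


one = "let me introduce myself, first."
two = "A is big. B is small."

print(len(one) > 1)                              # True: this counts characters
print(count_sentences(one))                      # 1
print(summary_or_original(one, first_sentence))  # original text, unchanged
print(summary_or_original(two, first_sentence))  # A is big.
```

With the DataFrame from the question, this would presumably be used as df['summary_answer'] = df['Answers'].apply(lambda t: summary_or_original(t, summarize)), assuming summarize is imported as in the question.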