
Splitting one column's text based on another column's value in a pandas DataFrame

I have two columns in my dataframe, 'Subject' and 'Description'. I am trying to clean the Description column by splitting the data on the text from the Subject column, as it's contained in all rows of the Description.

Here's a snippet of the Subject Column:

Subject
1     Question about the program   
2  Technical issue with the site    

And the Description Column:

Description \
1  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Question about the program Hello Hello I was wondering if there 
is going to be a product review coming up soon?

2  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Technical issue with the site Reviews I received emails stating 
that I need to rewrite two of my reviews    

For example, in row 1 I would like to split the Description on 'Question about the program' and keep only the text after that string.

I have tried df['Description'] = df.apply(lambda x: x['Description'].split(x['Subject'], 1), axis=1)['Description'] but it fails with the error "TypeError: ('must be str or None, not float')" on a row whose Description doesn't contain the Subject text. How can I handle the rows that don't contain this exact text while still splitting the ones that do?

Any help would be appreciated. Thank you.

I have also tried the suggested answer, and it gives me this error: IndexError: ('list index out of range', 'occurred at index 1')

You need to split each string in df['Description'] on the value in the same row's Subject column and take the portion after the split.

df.apply(lambda x: x['Description'].split(x['Subject'])[1], axis=1)

Output:

0     Hello Hello I was wondering if there is going...
1     Reviews I received emails stating that I need...
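
Both errors reported in the question can be explained by rows that break the one-liner: a missing Subject is stored as NaN, which is a float, so str.split raises the TypeError; and when the Subject text is not found in the Description, split returns a one-element list, so [1] raises the IndexError. A minimal sketch that guards against both cases is below. The helper name text_after_subject and the sample rows are assumptions made for illustration, and leaving the Description unchanged when there is no match is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd


def text_after_subject(row):
    """Return the text that follows Subject in Description, or Description unchanged."""
    description, subject = row["Description"], row["Subject"]
    # NaN cells are floats, so check types before calling string methods;
    # an empty subject is skipped too, because partition("") raises ValueError.
    if not isinstance(description, str) or not isinstance(subject, str) or not subject:
        return description
    # partition splits on the first occurrence only; sep is "" when subject is absent.
    _, sep, after = description.partition(subject)
    return after.strip() if sep else description


# Hypothetical sample data: two normal rows, one NaN Subject, one Subject with no match.
df = pd.DataFrame({
    "Subject": [
        "Question about the program",
        "Technical issue with the site",
        np.nan,
        "Billing",
    ],
    "Description": [
        "contents of the message. Question about the program Hello Hello I was wondering",
        "contents of the message. Technical issue with the site Reviews I received emails",
        "No subject on this row",
        "Subject text does not appear here",
    ],
})

df["Description"] = df.apply(text_after_subject, axis=1)
print(df["Description"].tolist())
```

If you would rather blank out rows without a match than keep the original text, return None (or np.nan) in place of description in the final return.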

