Python replace multiple string patterns in column

Question

I have a dataframe of movies, each with a synopsis.

Title        Synopsis
Movie1       Old Macdonald had a farm         [Written by ABC rewrite] 
Movie2       Wheels on the bus                 (Source: Melon)
Movie3       Tayo the bus                      [Produced by Wills Garage]
Movie4       James and Giant Apple             (Source: Kismet)

I'd like to remove the trailing words that are not required for NLP, so that I get the dataframe below:

Title        Synopsis
Movie1       Old Macdonald had a farm         
Movie2       Wheels on the bus                
Movie3       Tayo the bus                      
Movie4       James and Giant Apple

I've tried the following code, but my Synopsis column ends up with a string like "0"Iodfosomhgooad,somh...\n1GaBauadFal...". How can I resolve this? I appreciate any help, thank you.

removelist = [('[Written by]', '') ,('(Source:)', '')]
               
for old, new in removelist:
    df['Synopsis'] = re.sub(old, new, str(df['Synopsis']))

Answer 1

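Your attempt fails for two reasons: str(df['Synopsis']) converts the whole column into a single printed string, which is then assigned to every row, and '[Written by]' is a regex character class that matches any one of those characters rather than the literal text. You can use Series.str.replace with a regex instead: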

df['Synopsis'] = df['Synopsis'].str.replace(r'\s*(?:\[[^][]*]|\([^()]*\))\s*$', '', regex=True)

Details:

\s* - zero or more whitespaces
(?:\[[^][]*]|\([^()]*\)) - either
- \[[^][]*] - a [, then zero or more chars other than [ and ], and then a ] char
- | - or
- \([^()]*\) - a (, then zero or more chars other than ( and ), and then a ) char
\s* - zero or more whitespaces
$ - end of string.
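
As a quick check of the pattern described above, here is a minimal sketch that rebuilds the sample dataframe from the question and applies the replacement. It assumes pandas is installed; the variable names pattern and cleaned are only for illustration.

```python
import pandas as pd

# Sample data copied from the question, including the trailing space on Movie1.
df = pd.DataFrame({
    "Title": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "Synopsis": [
        "Old Macdonald had a farm         [Written by ABC rewrite] ",
        "Wheels on the bus                 (Source: Melon)",
        "Tayo the bus                      [Produced by Wills Garage]",
        "James and Giant Apple             (Source: Kismet)",
    ],
})

# Trailing [...] or (...) group, plus any whitespace around it, at end of string.
pattern = r"\s*(?:\[[^][]*]|\([^()]*\))\s*$"

# regex=True is passed explicitly: newer pandas versions treat the
# pattern as a literal string by default.
df["Synopsis"] = df["Synopsis"].str.replace(pattern, "", regex=True)

cleaned = df["Synopsis"].tolist()
print(cleaned)
```

Because the pattern starts and ends with \s*, the whitespace that separated the synopsis from the bracketed note is removed as well.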

Answer 2

You can use the regex replace method directly available to strings in Pandas DataFrames.

data['Synopsis'] = data['Synopsis'].str.replace(r'\[.*\]$|\(.*\)$', '', regex=True)

match anything between [] at end of string

\[.*\]$

alternation, to combine multiple string patterns

|

match anything between () at end of string

\(.*\)$

The result for your sample is:

                         Synopsis
Title                            
Movie1  Old Macdonald had a farm 
Movie2         Wheels on the bus 
Movie3              Tayo the bus 
Movie4     James and Giant Apple
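
One caveat worth illustrating: .* is greedy, so if a synopsis contains an earlier bracketed group, the match starts at the first [ and runs to the end of the string. The sketch below uses only the standard re module and a made-up input string (it is not from the question) to compare this pattern with a negated-character-class variant.

```python
import re

# Pattern from this answer: greedy .* between the brackets.
GREEDY = r"\[.*\]$|\(.*\)$"
# Negated-class variant: the group may not contain further brackets.
BOUNDED = r"\s*(?:\[[^][]*]|\([^()]*\))\s*$"

# Hypothetical synopsis with a bracketed group in the middle.
sample = "Tayo [the] bus [Produced by Wills Garage]"

greedy_result = re.sub(GREEDY, "", sample)
bounded_result = re.sub(BOUNDED, "", sample)

print(repr(greedy_result))   # everything from the first [ is removed
print(repr(bounded_result))  # only the trailing group is removed
```

For the sample data in the question both patterns remove the same bracketed text; the only difference there is that this answer's pattern leaves the whitespace before the removed group in place, which a following .str.rstrip() would clean up.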

2 answers

solution1
1 ACCEPTED 2021-02-10 13:23:28

solution2
0 2021-02-10 13:13:30