
Python replace multiple string patterns in column

I have a dataframe of movies, each with a synopsis.

Title        Synopsis
Movie1       Old Macdonald had a farm         [Written by ABC rewrite] 
Movie2       Wheels on the bus                 (Source: Melon)
Movie3       Tayo the bus                      [Produced by Wills Garage]
Movie4       James and Giant Apple             (Source: Kismet)

I'd like to remove the trailing words that are not required for NLP, so that I get the dataframe below:

Title        Synopsis
Movie1       Old Macdonald had a farm         
Movie2       Wheels on the bus                
Movie3       Tayo the bus                      
Movie4       James and Giant Apple            

I've tried the following code, but my Synopsis column ends up containing a string like "0"Iodfosomhgooad,somh...\n1GaBauadFal...". How can I resolve this? I'd appreciate any help, thank you.

removelist = [('[Written by]', '') ,('(Source:)', '')]
               
for old, new in removelist:
    df['Synopsis'] = re.sub(old, new, str(df['Synopsis']))



You can use

df['Synopsis'] = df['Synopsis'].str.replace(r'\s*(?:\[[^][]*]|\([^()]*\))\s*$', '', regex=True)

See the regex demo.

Details:

  • \s* - zero or more whitespaces
  • (?:\[[^][]*]|\([^()]*\)) - either
    • \[[^][]*] - a [ , any zero or more chars other than [ and ] and then a ] char
    • | - or
    • \([^()]*\) - a ( , any zero or more chars other than ( and ) and then a ) char
  • \s* - zero or more whitespaces
  • $ - end of string.
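
As a quick check of the pattern described above, here is a minimal sketch that rebuilds the sample data from the question and applies the replacement. The sample frame and the `pattern` and `cleaned` names are illustrative, not part of the original answer. `regex=True` is passed explicitly because pandas 2.0 changed the default of `Series.str.replace` to `regex=False`.

```python
import pandas as pd

# Sample data modelled on the question (illustrative reconstruction).
df = pd.DataFrame({
    "Title": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "Synopsis": [
        "Old Macdonald had a farm         [Written by ABC rewrite] ",
        "Wheels on the bus                 (Source: Melon)",
        "Tayo the bus                      [Produced by Wills Garage]",
        "James and Giant Apple             (Source: Kismet)",
    ],
})

# A trailing [...] or (...) block, plus any whitespace around it.
pattern = r"\s*(?:\[[^][]*]|\([^()]*\))\s*$"

# The replacement is applied element-wise to each string in the column.
df["Synopsis"] = df["Synopsis"].str.replace(pattern, "", regex=True)

cleaned = df["Synopsis"].tolist()
print(cleaned)
```

Unlike `re.sub(old, new, str(df['Synopsis']))`, which converts the whole column into a single printed string before substituting, the `.str` accessor works on each row separately.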

You can use the regex-based replace method that pandas exposes on string columns through the .str accessor.

data['Synopsis'] = data['Synopsis'].str.replace(r'\[.*\]$|\(.*\)$', '', regex=True)

The pattern has three parts:

  • \[.*\]$ - anything between [ and ] at the end of the string
  • | - or (alternation between the two patterns)
  • \(.*\)$ - anything between ( and ) at the end of the string

The result for your sample data is:

                         Synopsis
Title                            
Movie1  Old Macdonald had a farm 
Movie2         Wheels on the bus 
Movie3              Tayo the bus 
Movie4     James and Giant Apple 
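
One caveat may be worth checking on real data: `.*` is greedy, so this pattern matches from the first opening bracket through to the end of the string, and it leaves the whitespace before the removed block in place. The sketch below uses plain `re` and a hypothetical synopsis (not from the question) to compare it with a pattern that excludes nested brackets; the variable names are illustrative.

```python
import re

# Pattern from this answer: greedy match up to the end of the string.
greedy = r"\[.*\]$|\(.*\)$"
# Bracket-excluding pattern that also consumes surrounding whitespace.
bounded = r"\s*(?:\[[^][]*]|\([^()]*\))\s*$"

# Hypothetical synopsis with a bracketed note in the middle of the text.
text = "A story [uncut] about a farm [Written by ABC]"

greedy_result = re.sub(greedy, "", text)
bounded_result = re.sub(bounded, "", text)

print(repr(greedy_result))   # 'A story '
print(repr(bounded_result))  # 'A story [uncut] about a farm'
```

If the synopses never contain brackets before the trailing block, both patterns remove the same text, and chaining `.str.rstrip()` after the replacement would drop the leftover trailing whitespace.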

