
Remove specific characters from a pandas column?

Hello, I have a dataframe in which I want to remove the specific characters 'Fwd:' from every row that starts with them. The problem is that the code I am using also strips the first letter from any row that starts with 'F'.

My dataframe looks like this:

  summary 
0 Fwd: Please look at the attached documents and take action 
1 NSN for the ones who care
2 News for all team members 
3 Fwd: Please take action on the action needed items 
4 Fix all the mistakes please 

When I use this code:

df['Clean Summary'] =  individual_receivers['summary'].map(lambda x: x.lstrip('Fwd:'))

I end up with a dataframe that looks like this:

      summary 
0 Please look at the attached documents and take action 
1 NSN for the ones who care
2 News for all team members 
3 Please take action on the action needed items 
4 ix all the mistakes please 

I don't want the last row to lose the F in 'Fix'.

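Use a regular expression; the ^ anchor restricts the match to the start of the string. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default:

df['Clean Summary'] = df['summary'].str.replace(r'^Fwd:\s*', '', regex=True)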

Here's an example:

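import pandas as pd

df = pd.DataFrame({'msg':['Fwd: o','oe','Fwd: oj'],'B':[1,2,3]})
# regex=True is needed so that ^ is treated as an anchor, not a literal character
df['clean_msg'] = df['msg'].str.replace(r'^Fwd: ', '', regex=True)
print(df)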

Output:

       msg  B clean_msg
0   Fwd: o  1         o
1       oe  2        oe
2  Fwd: oj  3        oj

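You are not only losing 'F' but also 'w', 'd', and ':'. This is how lstrip works: it treats its argument as a set of characters, not as a prefix, and removes every leading character that belongs to that set until it reaches one that does not. That is why 'Fix' becomes 'ix'.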
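
To see this character-set behaviour in isolation, here is a small plain-Python sketch. The first two strings are shortened from the question; the third is a made-up row added only to show that a string need not start with 'F' to be affected.

```python
# str.lstrip treats its argument as a set of characters, not as a prefix.
samples = [
    "Fwd: Please take action",
    "Fix all the mistakes please",
    "wd-40 order",  # hypothetical extra row, not from the question
]

# Every leading character found in {'F', 'w', 'd', ':'} is stripped.
stripped = [s.lstrip("Fwd:") for s in samples]

for before, after in zip(samples, stripped):
    print(repr(before), "->", repr(after))
```

Note that the space after the colon is not in the set, so the first string keeps its leading space.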

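Instead, use a plain string replacement inside your lambda: x.replace('Fwd:', '', 1)

The third argument, 1, ensures that only the first occurrence of the string is removed. Be aware that this removes the first 'Fwd:' wherever it appears in the string, not only at the start.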
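
As a rough sketch of the difference, the snippet below compares replace with a count of 1 against an explicit prefix check. The helper names `replace_first` and `strip_fwd_prefix`, and the third sample string, are made up for illustration; either function could be passed to .map on the column.

```python
PREFIX = "Fwd:"

def replace_first(text: str) -> str:
    # Removes the first "Fwd:" wherever it occurs in the string.
    return text.replace(PREFIX, "", 1)

def strip_fwd_prefix(text: str) -> str:
    # Hypothetical helper: removes "Fwd:" only when the string starts with it,
    # then drops the whitespace that followed the prefix.
    if text.startswith(PREFIX):
        return text[len(PREFIX):].lstrip()
    return text

samples = [
    "Fwd: Please take action",
    "Fix all the mistakes please",
    "Re: see Fwd: below",  # hypothetical row with "Fwd:" in the middle
]

for s in samples:
    print(repr(replace_first(s)), "|", repr(strip_fwd_prefix(s)))
```

The prefix check leaves the third string untouched, while replace_first removes the 'Fwd:' in the middle of it. Which behaviour is wanted depends on the data.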
