
Remove a substring from one column as a condition for removing a different substring from another column (pandas DataFrame)

I have a DataFrame (let's call it my_df) with two columns.

Here is an example:

my_df = pd.DataFrame({'first_col':['theTable','aChair','Lamp','intheCup','aBottle','theGlass'],'second_col':['itisBig','isSmall','itisBright','itisDark','isRed', 'itisWhite']})

gives:

   first_col  second_col
0  theTable   itisBig
1  aChair     isSmall
2  Lamp       itisBright
3  intheCup   itisDark
4  aBottle    isRed
5  theGlass   itisWhite

I would like to remove the letters 'the' from the beginning of each string in first_col. In addition, if and only if that condition is met, the letters 'it' should be removed from the beginning of the string in the same row of second_col.

The result should be that only rows 0 and 5 are affected, with 'the' and 'it' removed from the first and second columns respectively:

   first_col   second_col
0   Table      isBig
1   aChair     isSmall
2   Lamp       itisBright
3   intheCup   itisDark
4   aBottle    isRed
5   Glass      isWhite

Note that rows 2 and 3 are unchanged in second_col (they stay "itisBright" / "itisDark"), because first_col does not start with "the" in those rows.

So far I know how to remove "the" and "it" separately:

my_df['first_col'] = my_df['first_col'].str.replace('the','')
my_df['second_col'] = my_df['second_col'].str.replace('it','')

But this is no good, because the two replacements are independent of each other.

Does anybody know how to apply the conditions above so that these strings are removed together and dependently using pandas?

You were on the right track. You just need to build a boolean mask selecting the rows you want to modify, and then apply the replacements only to those rows.

import pandas as pd

my_df = pd.DataFrame({'first_col':['theTable','aChair','Lamp','intheCup','aBottle','theGlass'],'second_col':['itisBig','isSmall','itisBright','itisDark','isRed', 'itisWhite']})

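# Boolean mask: True for rows whose first_col starts with 'the'
changes = my_df['first_col'].str.startswith('the')

# Anchor the patterns with ^ so only the leading prefix is removed;
# a plain replace('it', '') would also turn 'itisWhite' into 'isWhe'
my_df.loc[changes, 'first_col'] = my_df.loc[changes, 'first_col'].str.replace(r'^the', '', regex=True)
my_df.loc[changes, 'second_col'] = my_df.loc[changes, 'second_col'].str.replace(r'^it', '', regex=True)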
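
If you need this for other columns or prefixes, the same mask idea can be wrapped in a small function. The following is only a sketch: the helper name strip_dependent_prefixes and its parameters are made up for illustration and are not part of pandas. It slices the prefix off by length instead of replacing, so only the leading characters are touched, and it treats missing values as "no match" via na=False.

```python
import pandas as pd


def strip_dependent_prefixes(df, first_col, first_prefix, second_col, second_prefix):
    """Hypothetical helper: strip first_prefix from first_col, and only in
    those same rows strip second_prefix from second_col."""
    out = df.copy()  # leave the caller's DataFrame untouched

    # Rows where the condition on the first column holds (NaN counts as False)
    mask = out[first_col].str.startswith(first_prefix, na=False)

    # Drop the leading prefix by slicing, so later occurrences are kept
    out.loc[mask, first_col] = out.loc[mask, first_col].str[len(first_prefix):]

    # In the same rows, drop the second prefix only if it is actually there
    both = mask & out[second_col].str.startswith(second_prefix, na=False)
    out.loc[both, second_col] = out.loc[both, second_col].str[len(second_prefix):]
    return out


my_df = pd.DataFrame({
    'first_col': ['theTable', 'aChair', 'Lamp', 'intheCup', 'aBottle', 'theGlass'],
    'second_col': ['itisBig', 'isSmall', 'itisBright', 'itisDark', 'isRed', 'itisWhite'],
})

result = strip_dependent_prefixes(my_df, 'first_col', 'the', 'second_col', 'it')
print(result)
```

Only rows 0 and 5 change here; 'intheCup' is left alone because 'the' is not at the start of the string.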
