
Python: replace a string in a specific DataFrame column

I would like to replace every value in a dataframe column that starts with the string "chaud" by the string 'Chaudière'. I would also like the first and last name after each "Chaudière" to disappear, in order to anonymize NameDevice.

My data frame is called df1 and the column name is NameDevice.

I have tried this:

   df1.loc[df1['NameDevice'].str.startswith('chaud'), 'NameDevice'] = df1['NameDevice'].str.replace("chaud","Chaudière")

When I check with df1.head(), it returns:

   IdDevice  IdDeviceType  SerialDevice  NameDevice                       IdLocation  UuidAttributeDevice  IdBox  IsUpdateDevice
0       119            48         00001  Chaudière Maud Ferrand                    4                  NaN      4               0
1       120            48         00002  Chaudière Yvan Martinod                   6                  NaN      6               0
2       121            48         00006  Chaudière Anne-Sophie Premereur           7                  NaN      7               0
3       122            48         00005  Chaudière Denis Fauser                    8                  NaN      8               0
4       123            48         00004  Chaudière Elariak Djilali                 3                  NaN      3               0

You can make the match case-insensitive by calling str.lower first and then str.startswith. To anonymise the data, split each matching value on whitespace and keep only the first word:

In [14]:
df.loc[df['NameDevice'].str.lower().str.startswith('chaud'), 'NameDevice'] = df['NameDevice'].str.split().str[0]
df

Out[14]:
   IdDevice  IdDeviceType  SerialDevice NameDevice  IdLocation  \
0       119            48             1  Chaudière           4   
1       120            48             2  Chaudière           6   
2       121            48             6  Chaudière           7   
3       122            48             5  Chaudière           8   
4       123            48             4  Chaudière           3   

   UuidAttributeDevice  IdBox  IsUpdateDevice  
0                  NaN      4               0  
1                  NaN      6               0  
2                  NaN      7               0  
3                  NaN      8               0  
4                  NaN      3               0  
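For anyone who wants to try the split approach without the original data, here is a minimal self-contained sketch. The names in the sample frame are made-up placeholders, and the "Thermostat Salon" row is a hypothetical non-matching device added only to show that other rows are left alone. Passing na=False is an extra assumption so that missing names are treated as non-matching.

```python
import pandas as pd

# Placeholder data (not the asker's rows); the last device is hypothetical
# and is only there to show that non-matching rows stay unchanged.
df = pd.DataFrame({
    "IdDevice": [119, 120, 121],
    "NameDevice": [
        "Chaudière Jean Dupont",
        "Chaudière Marie Durand",
        "Thermostat Salon",
    ],
})

# Case-insensitive test for values that start with "chaud";
# na=False makes missing names count as non-matching.
mask = df["NameDevice"].str.lower().str.startswith("chaud", na=False)

# For the matching rows, keep only the first whitespace-separated word.
df.loc[mask, "NameDevice"] = df.loc[mask, "NameDevice"].str.split().str[0]

print(df)
```

Restricting the right-hand side to the masked rows is optional, since pandas aligns on the index either way, but it avoids splitting values that will never be assigned.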

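Another method is to use str.extract so that only the word starting with Chaud is kept. Note that the capture group below includes the trailing space, so the extracted value ends with a space, and that the pattern is case-sensitive: a row starting with a lowercase "chaud" would pass the mask but receive NaN from the extract.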

In [27]:
df.loc[df['NameDevice'].str.lower().str.startswith('chaud'), 'NameDevice'] = df['NameDevice'].str.extract(r'(Chaud\w+ )', expand=False)
df

Out[27]:
   IdDevice  IdDeviceType  SerialDevice  NameDevice  IdLocation  \
0       119            48             1  Chaudière            4   
1       120            48             2  Chaudière            6   
2       121            48             6  Chaudière            7   
3       122            48             5  Chaudière            8   
4       123            48             4  Chaudière            3   

   UuidAttributeDevice  IdBox  IsUpdateDevice  
0                  NaN      4               0  
1                  NaN      6               0  
2                  NaN      7               0  
3                  NaN      8               0  
4                  NaN      3               0  
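If the goal is simply to end up with the fixed label 'Chaudière' for every matching row, regardless of how the original value is capitalised or accented, a possible variant is to assign the constant directly. The sketch below is not from the answer above; it uses made-up placeholder names and a hypothetical non-matching row, and it also shows an equivalent one-liner with str.replace and a case-insensitive regular expression.

```python
import pandas as pd

# Placeholder values (not the asker's data); the first one is lowercase and
# unaccented on purpose, and the last one is a hypothetical non-matching device.
names = ["chaudiere Jean Dupont", "Chaudière Marie Durand", "Thermostat Salon"]
df = pd.DataFrame({"NameDevice": names})

# Option 1: boolean mask plus a constant assignment.
mask = df["NameDevice"].str.lower().str.startswith("chaud", na=False)
df.loc[mask, "NameDevice"] = "Chaudière"

# Option 2: replace the whole value when it starts with "chaud" (any case).
# (?i) turns on case-insensitive matching; ^ and $ anchor the full string.
alt = pd.Series(names).str.replace(r"(?i)^chaud.*$", "Chaudière", regex=True)

print(df["NameDevice"].tolist())
print(alt.tolist())
```

Both options overwrite the whole cell, so nothing after the first word survives and no trailing space is left behind.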
