python replace string in a specific dataframe column

I would like to replace the value in a dataframe column with the string 'Chaudière' whenever that value starts with the string "chaud". I would also like the first and last name after each "Chaudière" to disappear, in order to anonymize NameDevice.

My data frame is called df1 and the column name is NameDevice.

I have tried this:

   df1.loc[df1['NameDevice'].str.startswith('chaud'), 'NameDevice'] = df1['NameDevice'].str.replace("chaud", "Chaudière")

When I check with df1.head(), it returns:

   IdDevice  IdDeviceType  SerialDevice                       NameDevice  IdLocation  UuidAttributeDevice  IdBox  IsUpdateDevice
0       119            48         00001           Chaudière Maud Ferrand           4                  NaN      4               0
1       120            48         00002          Chaudière Yvan Martinod           6                  NaN      6               0
2       121            48         00006  Chaudière Anne-Sophie Premereur           7                  NaN      7               0
3       122            48         00005           Chaudière Denis Fauser           8                  NaN      8               0
4       123            48         00004        Chaudière Elariak Djilali           3                  NaN      3               0

You can make the match case-insensitive by calling str.lower first and then str.startswith. To anonymise the data, split on whitespace and keep only the first entry:

In [14]:
df.loc[df['NameDevice'].str.lower().str.startswith('chaud'), 'NameDevice'] = df['NameDevice'].str.split().str[0]
df

Out[14]:
   IdDevice  IdDeviceType  SerialDevice NameDevice  IdLocation  \
0       119            48             1  Chaudière           4   
1       120            48             2  Chaudière           6   
2       121            48             6  Chaudière           7   
3       122            48             5  Chaudière           8   
4       123            48             4  Chaudière           3   

   UuidAttributeDevice  IdBox  IsUpdateDevice  
0                  NaN      4               0  
1                  NaN      6               0  
2                  NaN      7               0  
3                  NaN      8               0  
4                  NaN      3               0  
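The split-based approach above can be tried end to end with a small sketch like the following. The sample rows and the names in them are made up for illustration, and passing `na=False` to `str.startswith` is an added assumption so that missing values are treated as non-matches.

```python
import pandas as pd

# Hypothetical sample data; the names are invented for illustration.
df = pd.DataFrame({
    "IdDevice": [119, 120, 121],
    "NameDevice": [
        "Chaudière Alice Example",
        "chaudière Bob Sample",
        "Thermostat Salon",
    ],
})

# Case-insensitive prefix test; na=False treats missing values as non-matches.
mask = df["NameDevice"].str.lower().str.startswith("chaud", na=False)

# Keep only the first whitespace-separated token of the matching rows.
df.loc[mask, "NameDevice"] = df.loc[mask, "NameDevice"].str.split().str[0]

print(df["NameDevice"].tolist())
```

Note that this keeps the first word exactly as it was written, so a row that starts with a lowercase "chaudière" keeps that spelling, and rows that do not match are left untouched.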

Another method is to use str.extract so that only the Chaud... word is kept:

In [27]:
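# raw string avoids an invalid-escape warning for \w; note the pattern also captures the trailing space
df.loc[df['NameDevice'].str.lower().str.startswith('chaud'), 'NameDevice'] = df['NameDevice'].str.extract(r'(Chaud\w+ )', expand=False)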
df

Out[27]:
   IdDevice  IdDeviceType  SerialDevice  NameDevice  IdLocation  \
0       119            48             1  Chaudière            4   
1       120            48             2  Chaudière            6   
2       121            48             6  Chaudière            7   
3       122            48             5  Chaudière            8   
4       123            48             4  Chaudière            3   

   UuidAttributeDevice  IdBox  IsUpdateDevice  
0                  NaN      4               0  
1                  NaN      6               0  
2                  NaN      7               0  
3                  NaN      8               0  
4                  NaN      3               0  
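Two variations may be worth considering here. The extract pattern above is case-sensitive and captures a trailing space, and the question asks for one fixed label. The sketch below is only an illustration under stated assumptions: the sample rows are made up, and the `flags=re.IGNORECASE` argument, the `^(chaud\w*)` pattern and the direct constant assignment are suggestions rather than part of the answer above.

```python
import re

import pandas as pd

# Hypothetical sample data; the names are invented for illustration.
df = pd.DataFrame({
    "NameDevice": [
        "Chaudière Alice Example",
        "chaudiere Bob Sample",
        "Thermostat Salon",
    ],
})

# Case-insensitive prefix test; na=False treats missing values as non-matches.
mask = df["NameDevice"].str.lower().str.startswith("chaud", na=False)

# Variant 1: extract the leading "chaud..." word, ignoring case and
# without the trailing space. Rows that do not match yield NaN.
first_word = df["NameDevice"].str.extract(
    r"^(chaud\w*)", flags=re.IGNORECASE, expand=False
)

# Variant 2: overwrite every matching row with one fixed label.
df.loc[mask, "NameDevice"] = "Chaudière"

print(first_word.tolist())
print(df["NameDevice"].tolist())
```

Variant 2 is probably closest to what the question describes, since every matching row ends up with the same spelling regardless of how the original value was typed.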
