
Pandas: normalize names where the ID is the same but the name differs?

I am trying to clean up name values where I have the following situation.

     ID  name
1     1    Company
2     1    Company, LLC

I would like to normalize it so I have only one name like so:

     ID  name
1     1    Company
2     1    Company

Grouping by ID and calling transform('first') takes the first name in each group and broadcasts it to every row of that group, so the result has the same length as your dataframe:

df
Out[22]: 
   ID         name
0   1      Company
1   1  Company,LLC
2   2   Companybbb
3   2  Company,LLC
4   3   Companyccc
5   3  Company,LLC

df.groupby('ID')['name'].transform('first')
Out[21]: 
0       Company
1       Company
2    Companybbb
3    Companybbb
4    Companyccc
5    Companyccc
Name: name, dtype: object
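
To actually overwrite the column, the transformed Series can be assigned back. A minimal sketch, assuming the sample frame shown above (rebuilt here by hand) and that the first name seen for each ID is the one you want to keep:

```python
import pandas as pd

# Sample data copied from the output above.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "name": ["Company", "Company,LLC", "Companybbb",
             "Company,LLC", "Companyccc", "Company,LLC"],
})

# Replace every name with the first name found in its ID group.
df["name"] = df.groupby("ID")["name"].transform("first")

# Sanity check: count the distinct names left under each ID.
names_per_id = df.groupby("ID")["name"].nunique()
```

Which name wins depends on row order, so sort the frame beforehand if a different row should supply the name.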

For your example:

df.loc[df.name == 'Company, LLC', 'name'] = 'Company'

You can repeat this method to remap a sequence of values. As MattR mentioned, FuzzyWuzzy can help you find strings that are likely to refer to the same name, if you want to identify more potential matches.
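
When several variants need remapping, the repeated .loc assignments can be collapsed into a single lookup. A sketch, assuming a hand-written dictionary (the `canonical` mapping and the "Company Inc." variant are hypothetical, not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1],
    "name": ["Company", "Company, LLC", "Company Inc."],
})

# Hypothetical mapping from known variants to the name to keep.
canonical = {
    "Company, LLC": "Company",
    "Company Inc.": "Company",
}

# Series.replace swaps exact matches and leaves other values untouched.
df["name"] = df["name"].replace(canonical)
```

This only handles variants you have already listed; a fuzzy matcher would be the tool for discovering new ones.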
