I am trying to clean up name values where the same ID appears with different spellings of the name:
ID name
1 1 Company
2 1 Company, LLC
I would like to normalize it so I have only one name like so:
ID name
1 1 Company
2 1 Company
This keeps the first name in each ID group and broadcasts it to every row of that group, so the result has the same length as the original DataFrame:
df
Out[22]:
ID name
0 1 Company
1 1 Company,LLC
2 2 Companybbb
3 2 Company,LLC
4 3 Companyccc
5 3 Company,LLC
df.groupby('ID')['name'].transform('first')
Out[21]:
0 Company
1 Company
2 Companybbb
3 Companybbb
4 Companyccc
5 Companyccc
Name: name, dtype: object
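To make the session above reproducible, here is a minimal sketch that rebuilds the same sample frame and writes the transformed values back into the name column. The variable normalized_names is only there for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the console output above.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "name": ["Company", "Company,LLC", "Companybbb",
             "Company,LLC", "Companyccc", "Company,LLC"],
})

# transform('first') returns a Series aligned with df's index, so it can be
# assigned straight back to the column.
df["name"] = df.groupby("ID")["name"].transform("first")

normalized_names = df["name"].tolist()  # illustrative helper variable
print(df)
```

Note that the result depends on row order within each ID: whichever name appears first in a group becomes the group's name, so sort the frame first if a particular spelling should win.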
For your example:
df.loc[df.name == 'Company, LLC', 'name'] = 'Company'
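As a runnable sketch of that assignment, assuming the question's frame has index 1 and 2, ID 1 for both rows, and the two name spellings shown. The variants dictionary is hypothetical and only illustrates how several spellings could be remapped in one pass with Series.replace instead of one .loc line per value.

```python
import pandas as pd

# Assumed reconstruction of the question's two rows.
df = pd.DataFrame(
    {"ID": [1, 1], "name": ["Company", "Company, LLC"]},
    index=[1, 2],
)

# The boolean mask selects the rows to change; .loc writes the new value in place.
df.loc[df["name"] == "Company, LLC", "name"] = "Company"

# Hypothetical extra spellings, remapped together by exact whole-value match.
variants = {"Company LLC": "Company", "Company, L.L.C.": "Company"}
df["name"] = df["name"].replace(variants)

fixed_names = df["name"].tolist()
print(df)
```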
You can repeat this assignment to remap a sequence of values. As MattR mentioned, FuzzyWuzzy can help you find strings that are likely variants of the same name if you want to identify more potential matches.
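A rough sketch of the candidate-finding idea follows. It uses difflib from the Python standard library as a stand-in, not FuzzyWuzzy itself, so the scores will differ from what FuzzyWuzzy reports. The sample names, the canonical value, and the 0.6 cutoff are all assumptions chosen for illustration.

```python
import difflib

import pandas as pd

# Hypothetical sample names; "Other Corp" should not be treated as a variant.
names = pd.Series(["Company", "Company, LLC", "Company LLC", "Other Corp"])
canonical = "Company"

# get_close_matches returns up to n strings whose similarity ratio to
# `canonical` is at least `cutoff` (a value between 0 and 1).
candidates = difflib.get_close_matches(
    canonical, names.unique().tolist(), n=10, cutoff=0.6
)

# Map every candidate except the canonical spelling onto the canonical one.
mapping = {c: canonical for c in candidates if c != canonical}
cleaned = names.replace(mapping).tolist()
print(candidates)
print(cleaned)
```

It is safer to review the candidates by hand before applying the mapping, since a loose cutoff can merge names that belong to different companies.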