Pandas convert name where ID is the same and the name is different?

Question

I am trying to clean up name values where I have the following situation.

     ID  name
1     1    Company
2     1    Company, LLC

I would like to normalize it so I have only one name like so:

     ID  name
1     1    Company
2     1    Company

Answer 1

This takes the first name in each ID group and broadcasts it to every row of that group, so the result has the same length as your dataframe:

df
Out[22]: 
   ID         name
0   1      Company
1   1  Company,LLC
2   2   Companybbb
3   2  Company,LLC
4   3   Companyccc
5   3  Company,LLC

df.groupby('ID')['name'].transform('first')
Out[21]: 
0       Company
1       Company
2    Companybbb
3    Companybbb
4    Companyccc
5    Companyccc
Name: name, dtype: object
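Note that transform returns a new Series and does not modify df. A minimal sketch of writing the result back to the column, assuming the same sample data as above (the dataframe is rebuilt here only so the snippet runs on its own):

```python
import pandas as pd

# Sample data matching the frame shown above
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "name": ["Company", "Company,LLC", "Companybbb",
             "Company,LLC", "Companyccc", "Company,LLC"],
})

# transform('first') returns a Series aligned to df's index,
# so it can be assigned straight back to the column
df["name"] = df.groupby("ID")["name"].transform("first")

print(df)
```

Which name counts as "first" depends on row order, so sort the dataframe beforehand if a particular spelling should win. Also keep in mind that the groupby 'first' aggregation skips missing values by default, so a group whose first row is NaN gets its first non-null name.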

Answer 2

For your example:

df.loc[df.name == 'Company, LLC', 'name'] = 'Company'

You can apply the same assignment repeatedly to remap a series of values. As MattR mentioned, FuzzyWuzzy can help you find strings that are likely to refer to the same name, if you want to identify more potential matches.
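If there are many values to remap, repeating the .loc assignment gets tedious. One possible alternative, not part of the original answer, is to collect the replacements in a dict and pass it to Series.replace. The sample rows and the mapping below (including the "Acme" names) are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; only "Company, LLC" comes from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "name": ["Company", "Company, LLC", "Acme", "Acme Inc."],
})

# Hypothetical mapping from each variant spelling to its canonical name
canonical = {
    "Company, LLC": "Company",
    "Acme Inc.": "Acme",
}

# replace with a dict swaps whole values that match a key exactly;
# names not listed in the mapping are left unchanged
df["name"] = df["name"].replace(canonical)

print(df)
```

This has the same effect as running one .loc assignment per mapping entry, and the dict gives you a single place to review or extend the list of known variants.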

2 answers

solution1
2 2017-03-03 20:09:28

solution2
0 2017-03-03 20:06:24
