pandas groupby replace based on condition

Question

I have a dataset structured as below:

index country  city     Data
0     AU       Sydney   23
1     AU       Sydney   45
2     AU       Unknown  2
3     CA       Toronto  56
4     CA       Toronto  2
5     CA       Ottawa   1
6     CA       Unknown  2

I want to replace 'Unknown' in the city column with the most frequent city (the mode) for each country. The result would be:

...
2     AU       Sydney   2
...
6     CA       Toronto  2

I can get the city modes with:

city_modes = df.groupby('country')['city'].apply(lambda x: x.mode().iloc[0])

And I can replace values with:

df['column']=df.column.replace('Unknown', 'something')

But I can't work out how to combine these so that the unknowns in each country are replaced with that country's most frequent city.

Any ideas?

Answer 1

Use transform to get a Series the same size as the original DataFrame, then set the new values with numpy.where:

import numpy as np
# Most frequent city per country, broadcast back to every row of df
city_modes = df.groupby('country')['city'].transform(lambda x: x.mode().iloc[0])
df['city'] = np.where(df['city'] == 'Unknown', city_modes, df['city'])
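
For anyone who wants to run this end to end, here is a minimal sketch of the transform plus numpy.where approach. The DataFrame is rebuilt from the sample rows in the question; the column names are taken from that table, and nothing else is assumed about the real dataset.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["AU", "AU", "AU", "CA", "CA", "CA", "CA"],
    "city": ["Sydney", "Sydney", "Unknown", "Toronto", "Toronto", "Ottawa", "Unknown"],
    "Data": [23, 45, 2, 56, 2, 1, 2],
})

# Most frequent city per country, repeated on every row of that country,
# so the result has the same length and index as df.
city_modes = df.groupby("country")["city"].transform(lambda x: x.mode().iloc[0])

# Keep known cities; use the country's mode where the city is 'Unknown'.
df["city"] = np.where(df["city"] == "Unknown", city_modes, df["city"])

print(df)
```

With this sample, row 2 becomes Sydney and row 6 becomes Toronto. Note that mode() can return several values on a tie; .iloc[0] then picks the first one in sorted order.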

Or assign only to the matching rows (pandas aligns city_modes on the index):

df.loc[df['city'] == 'Unknown', 'city'] = city_modes
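
One caveat with taking the mode over the whole column: if 'Unknown' is itself the most frequent value in a country, it gets picked as the mode and nothing changes. A possible variant, sketched below, computes the mode from the known cities only and maps it back by country. The data here is hypothetical (AU is altered so that 'Unknown' dominates), and known_modes is just an illustrative name.

```python
import pandas as pd

# Hypothetical data: in AU, 'Unknown' is more frequent than any real city.
df = pd.DataFrame({
    "country": ["AU", "AU", "AU", "CA", "CA", "CA", "CA"],
    "city": ["Unknown", "Unknown", "Sydney", "Toronto", "Toronto", "Ottawa", "Unknown"],
})

unknown = df["city"] == "Unknown"

# Mode per country, computed from rows with a known city only.
known_modes = (
    df.loc[~unknown]
    .groupby("country")["city"]
    .agg(lambda x: x.mode().iloc[0])
)

# Look up each unknown row's country in known_modes and assign the result.
df.loc[unknown, "city"] = df.loc[unknown, "country"].map(known_modes)

print(df)
```

A country that has no known city at all would not appear in known_modes, so its rows would end up as NaN rather than 'Unknown'; whether that is acceptable depends on the dataset.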

solution1
2 ACCEPTED 2018-09-24 12:33:06
