简体   繁体   中英

pandas groupby replace based on condition

I have a dataset structures as below:

index country  city     Data
0     AU       Sydney   23
1     AU       Sydney   45
2     AU       Unknown  2
3     CA       Toronto  56
4     CA       Toronto  2
5     CA       Ottawa   1
6     CA       Unknown  2

I want to replace 'Unknown' in the city column with the mode of the occurences of cities per country. The result would be:

...
2     AU       Sydney  2
...
6     CA       Toronto  2

I can get the city modes with:

city_modes = df.groupby('country')['city'].apply(lambda x: x.mode().iloc[0])

And I can replace values with:

df['column']=df.column.replace('Unknown', 'something')

But i cant work out how to combine these to only replace unknowns for each country based on mode of occurrence of cities.

Any ideas?

Use transform for Series with same size as original DataFrame and set new values by numpy.where :

city_modes = df.groupby('country')['city'].transform(lambda x: x.mode().iloc[0])
df['column'] = np.where(df['column'] == 'Unknown',city_modes, df['column'])

Or:

df.loc[df['column'] == 'Unknown', 'column'] = city_modes 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM