How to replace NaN values in a column with that column's mean, grouped by the value in another column? (Pandas)

I have a DataFrame of game releases and ratings:

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,

I want to fill the NaN values in the user_score column with the mean user_score of the same genre. For example, if a game's genre is Sports and its user_score is NaN, I want to replace the missing value with the average user_score of all Sports games.

In the data below, the user_score of the second Sports game (Wii Sports Resort) has been removed so that the code can be demonstrated.

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,

Looking at the user_score of Wii Sports Resort before the update:

df.iloc[3]['user_score']

nan

Replacing each NaN with the mean user_score of its genre:

df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))

Checking the same game after the update:

df.iloc[3]['user_score']

8.0
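For anyone who wants to reproduce the steps above end to end, here is a minimal sketch. It assumes the modified sample is loaded from an inline string with io.StringIO (the csv_text name is only for illustration); the groupby/transform line is the one shown above.

```python
import io

import pandas as pd

# The modified sample from above: Wii Sports Resort has no user_score.
csv_text = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Within each genre, fill missing scores with that genre's mean.
# transform returns a Series aligned with df's index, so it can be assigned back.
df['user_score'] = df.groupby('genre')['user_score'].transform(
    lambda x: x.fillna(x.mean())
)

print(df.loc[3, 'user_score'])  # 8.0 (the only other Sports score)
print(df.loc[1, 'user_score'])  # nan (no Platform game has a score)
```

Note that a genre with no scores at all has a NaN mean, so its rows stay NaN.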

One possible solution is to build a dictionary, genre_avg, that maps each genre to its average user_score, and then fill the NaNs in user_score by looking up each row's genre in this dictionary:

genre_avg = data.groupby('genre')['user_score'].mean().to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))

In your small sample data nothing changes, because every row with a missing user_score belongs to a genre in which no game has a score, so there is nothing to average. However, if you change, for instance, the genre of Wii Sports from Sports to Platform, you will see that Super Mario Bros. gets its user_score filled with the average of the Platform genre games.
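The behaviour described here can be checked with a short sketch. It assumes the question's original sample is loaded from an inline string, and it wraps the dictionary approach in a helper, fill_by_genre_mean, which is a hypothetical name introduced only for this illustration.

```python
import io

import pandas as pd

# The original sample from the question.
csv_text = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""

data = pd.read_csv(io.StringIO(csv_text))


def fill_by_genre_mean(frame):
    """Hypothetical helper: return user_score with NaNs filled by genre mean."""
    genre_avg = frame.groupby('genre')['user_score'].mean().to_dict()
    return frame['user_score'].fillna(frame['genre'].map(genre_avg))


# Unchanged sample: Platform and Role-Playing have no scores, so both NaNs remain.
before = fill_by_genre_mean(data)
print(before.isna().sum())  # 2

# Move Wii Sports into Platform, as suggested above, and fill again.
data.loc[0, 'genre'] = 'Platform'
after = fill_by_genre_mean(data)
print(after.loc[1])  # 8.0 (Super Mario Bros. now gets the Platform mean)
```

Pokemon Red/Pokemon Blue is still NaN after the change, since Role-Playing has no rated game in this sample.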
