How do I replace NaN values with the mean of a column, grouped by the values in another column? (Pandas)

Question

I have a DataFrame of game releases and ratings:

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,

I want to fill the NaN values in the user_score column with the mean for the same genre. For example, if a game's genre is Sports and its user_score is NaN, I want to replace the null value with the average user_score of Sports games.

Answer 1

In this copy of the data, the user_score of the second Sports game (Wii Sports Resort) has been removed so that the code can be demonstrated.

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,

Looking at the user score of Wii Sports Resort:

df.iloc[3]['user_score']

nan

Replacing NaN with the mean user_score of each genre:

df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))

Checking the same game after the update:

df.iloc[3]['user_score']

8.0
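
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the sample above is loaded from an inline string (the variable names csv_text and genre_mean are only illustrative), and it uses transform('mean') followed by fillna, which should behave the same as the lambda version for this data.

```python
import io

import pandas as pd

# Sample data from above, with the user_score of Wii Sports Resort removed.
csv_text = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Per-row mean of user_score within the row's genre (NaN values are skipped).
genre_mean = df.groupby('genre')['user_score'].transform('mean')

# Fill only the missing scores; existing scores are left untouched.
df['user_score'] = df['user_score'].fillna(genre_mean)

print(df[['name', 'genre', 'user_score']])
```

Super Mario Bros. and Pokemon Red/Pokemon Blue stay NaN here, because their genres contain no other score to average.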

Answer 2

One possible solution is to build a dictionary, genre_avg, of the average user_score per genre, and then fill the NaNs in user_score from that dictionary:

genre_avg = data.groupby(['genre']).agg({'user_score': 'mean'})['user_score'].to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))

In your small sample nothing changes, because every genre that contains a NaN has no other user_score to average, so its mean is itself NaN. However, if you change the genre of Wii Sports from Sports to Platform, for instance, Super Mario Bros. will have its user_score filled with the average for the Platform genre.
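
The genre change described above can be sketched as follows. This is only an illustration on a trimmed-down, three-row version of the sample (just the name, genre and user_score columns), with Wii Sports relabelled as Platform by assumption; the dictionary is built with a plain groupby mean, which should be equivalent to the agg call.

```python
import numpy as np
import pandas as pd

# Trimmed-down sample; Wii Sports is relabelled as Platform for the demo.
data = pd.DataFrame({
    'name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii'],
    'genre': ['Platform', 'Platform', 'Racing'],
    'user_score': [8.0, np.nan, 8.3],
})

# Average user_score per genre as a plain dict: {genre: mean}.
genre_avg = data.groupby('genre')['user_score'].mean().to_dict()

# Look up each row's genre average and use it only where user_score is missing.
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))

print(data)
```

With this relabelling, Super Mario Bros. picks up the Platform average, which is just the single score from Wii Sports.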

2 answers

Answer 1
1, accepted, 2020-07-07 12:46:35

Answer 2
1, 2020-07-07 12:53:19
