How to replace NaN values with another column's mean based on value in another column? Pandas
I have a dataframe of game releases and ratings:
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
I want to fill the NaN values in the user_score column with the mean user_score of the same genre. For example, if a game's genre is Sports and its user_score is NaN, I want to replace the missing value with the average user score of Sports games.
In the data below, the user_score of the second Sports game has been removed so that the code can be demonstrated:
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
Looking at the user score of Wii Sports Resort:
df.iloc[3]['user_score']
nan
Replacing NaN with the mean user_score of each genre:
df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))
Checking the same game after the update:
df.iloc[3]['user_score']
8.0
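For anyone who wants to reproduce this without a file on disk, here is a minimal self-contained sketch. It assumes the sample rows above are loaded from an in-memory string (the `csv_text` name is only for illustration), and it uses `transform('mean')` followed by `fillna`, which should be equivalent to the lambda version for this data.

```python
import io

import pandas as pd

# The sample rows from above, with the user_score of Wii Sports Resort removed.
csv_text = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Per-row genre mean: each row gets the mean user_score of its own genre.
# Genres whose scores are all NaN get NaN here.
genre_mean = df.groupby('genre')['user_score'].transform('mean')

# Fill only the missing scores; existing scores are left untouched.
df['user_score'] = df['user_score'].fillna(genre_mean)

print(df[['name', 'genre', 'user_score']])
```

Wii Sports Resort ends up with 8.0, the score of the only other Sports game. Super Mario Bros. and Pokemon Red/Pokemon Blue stay NaN, because their genres have no score to average.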
One possible solution is to create a dictionary genre_avg of genre average ratings and then fill the NaNs in user_score according to this dictionary:
genre_avg = data.groupby(['genre']).agg({'user_score': 'mean'})['user_score'].to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))
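To make the intermediate dictionary visible, here is a small sketch. It assumes a frame reduced to the name, genre and user_score columns of the question's data (with the Wii Sports Resort score removed, as in the other answer), and it builds `genre_avg` with the shorter `groupby(...).mean()` spelling, which should give the same dictionary as the `agg` call. The names `filled_by_map` and `filled_by_transform` are only for illustration.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's data: only the columns that matter here.
data = pd.DataFrame({
    'name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii',
             'Wii Sports Resort', 'Pokemon Red/Pokemon Blue'],
    'genre': ['Sports', 'Platform', 'Racing', 'Sports', 'Role-Playing'],
    'user_score': [8.0, np.nan, 8.3, np.nan, np.nan],
})

# One entry per genre; genres with no scores at all map to NaN.
genre_avg = data.groupby('genre')['user_score'].mean().to_dict()
print(genre_avg)

# Look up each row's genre in the dictionary and use it where the score is missing.
filled_by_map = data['user_score'].fillna(data['genre'].map(genre_avg))

# For comparison: the groupby/transform approach from the other answer.
filled_by_transform = data['user_score'].fillna(
    data.groupby('genre')['user_score'].transform('mean')
)

# Both approaches agree, including on the positions that stay NaN.
print(filled_by_map.equals(filled_by_transform))
```

The dictionary version is handy if the genre averages are also needed elsewhere, for example to inspect them or to apply them to another dataframe.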
In your small sample data nothing changes, because none of the genres with a NaN has any other value to average. However, if you change the genre of Wii Sports from Sports to Platform, for instance, you will see that the user_score of Super Mario Bros. is filled with the average of the Platform genre games.
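The experiment described above can be sketched as follows. This assumes the question's original scores (Wii Sports Resort still has its 8.0) reduced to three columns, and the genre change is made only to illustrate the behaviour.

```python
import numpy as np
import pandas as pd

# The question's original scores, reduced to the relevant columns.
data = pd.DataFrame({
    'name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii',
             'Wii Sports Resort', 'Pokemon Red/Pokemon Blue'],
    'genre': ['Sports', 'Platform', 'Racing', 'Sports', 'Role-Playing'],
    'user_score': [8.0, np.nan, 8.3, 8.0, np.nan],
})

# Illustrative change: pretend Wii Sports is a Platform game,
# so the Platform genre now has one real score to average.
data.loc[data['name'] == 'Wii Sports', 'genre'] = 'Platform'

genre_avg = data.groupby(['genre']).agg({'user_score': 'mean'})['user_score'].to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))

print(data)
```

Super Mario Bros. now gets 8.0, the Platform average, while Pokemon Red/Pokemon Blue stays NaN because Role-Playing still has no score to average.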