
How to replace NaN values with another column's mean based on value in another column? Pandas

I have a DataFrame of game releases and ratings:

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,

I want to fill the NaN values in the user_score column with the mean user_score of the same genre. For example, if a game's genre is Sports and its user_score is NaN, I want to replace that null value with the average user score of Sports games.

The user_score of the second Sports game has been removed from this data so that the code can be demonstrated:

name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
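
The snippets that follow assume the sample is already loaded into df. A minimal sketch of one way to do that, assuming the CSV text above is pasted into a string (csv_text is a hypothetical name, not from the original post); with a file on disk, pd.read_csv would take the path instead:

```python
import io

import pandas as pd

# The modified sample from above; empty fields are parsed as NaN.
csv_text = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count the missing user scores per genre before filling anything.
missing_per_genre = df["user_score"].isna().groupby(df["genre"]).sum()
print(missing_per_genre)
```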

Looking at the user score of Wii Sports Resort:

df.iloc[3]['user_score']

nan

Replacing NaN with the mean user_score of each genre:

df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))

Checking the same game after the update:

df.iloc[3]['user_score']

8.0
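
Note that a genre whose scores are all NaN has no mean, so its rows stay NaN after the group-wise fill. The sketch below illustrates this on a reduced three-column copy of the sample; the second fillna with the overall mean is only an assumed fallback for illustration, not something the question asks for:

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample: only the columns needed for the fill.
df = pd.DataFrame({
    "name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "user_score": [8.0, np.nan, 8.3, np.nan, np.nan],
})

# Per-row genre mean, aligned with df's index (NaN for all-NaN genres).
genre_mean = df.groupby("genre")["user_score"].transform("mean")

# Group-wise fill: only Wii Sports Resort gets a value here.
filled = df["user_score"].fillna(genre_mean)

# Assumed fallback: fill whatever is still missing with the overall mean.
filled_with_fallback = filled.fillna(df["user_score"].mean())

print(pd.DataFrame({"genre": df["genre"], "filled": filled,
                    "with_fallback": filled_with_fallback}))
```

Whether a global fallback is appropriate depends on the analysis; leaving those rows as NaN is an equally valid choice.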

One possible solution is to create a dictionary, genre_avg, of the average user score per genre, and then fill the NaNs in user_score according to this dictionary:

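# Map each genre to its mean user_score (NaN values are skipped by mean)
genre_avg = data.groupby('genre')['user_score'].mean().to_dict()
# Fill only the missing scores, looking up each row's genre in the dictionary
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))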

In your small sample data nothing changes, because none of the games with a NaN user_score shares its genre with a game that has a score, so there is nothing to average. However, if you change the genre of Wii Sports from Sports to Platform, for instance, you will see that Super Mario Bros. has its user_score filled with the average of the Platform genre games.
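
The experiment described above can be sketched as follows, on a reduced copy of the question's data. The helper fill_by_genre and the variable names are hypothetical, introduced only for this illustration:

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's original sample (both Sports games scored).
data = pd.DataFrame({
    "name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "user_score": [8.0, np.nan, 8.3, 8.0, np.nan],
})


def fill_by_genre(frame):
    """Return user_score with NaNs replaced by the mean of the row's genre."""
    genre_avg = frame.groupby("genre")["user_score"].mean().to_dict()
    return frame["user_score"].fillna(frame["genre"].map(genre_avg))


# Original genres: Platform and Role-Playing have no scores to average.
unchanged = fill_by_genre(data)

# Move Wii Sports into Platform so that genre now has one known score.
modified = data.copy()
modified.loc[modified["name"] == "Wii Sports", "genre"] = "Platform"
changed = fill_by_genre(modified)

print(pd.DataFrame({"name": data["name"], "before": unchanged,
                    "after": changed}))
```

Pokemon Red/Pokemon Blue stays NaN in both cases, since Role-Playing still has no scored game.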
