Pandas: How to replace values of NaN in column based on another column?
How to replace NaN values with another column's mean based on value in another column? Pandas
I have a dataframe of video game releases and ratings:
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
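I want to fill the NaN values in the user_score column with the mean of the same genre. For example, if a game's genre is Sports and user_score is NaN in that row, I want to replace the null value with the average user score of Sports games.
In the data below, the user_score of the second Sports game has been removed so that we can demonstrate the code.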
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
Look at the user score for Wii Sports Resort:
df.iloc[3]['user_score']
nan
Replace each NaN with the mean user_score of its genre:
df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))
After the update, check the output for the same game:
df.iloc[3]['user_score']
8.0
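For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the sample rows above are loaded from an in-memory CSV string (the `CSV` variable and the `io.StringIO` loading are my own scaffolding, not part of the answer). It also shows a variant I believe is equivalent, `transform("mean")` followed by `fillna`, which avoids the Python-level lambda.

```python
import io
import pandas as pd

# Sample rows from the answer; Wii Sports Resort has its user_score removed.
CSV = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""

df = pd.read_csv(io.StringIO(CSV))

# Variant (an assumption on my part, not from the answer): broadcast the
# per-genre mean to every row, then fill only the missing scores.
genre_mean = df.groupby("genre")["user_score"].transform("mean")
by_mean = df["user_score"].fillna(genre_mean)

# The answer's approach: fill within each genre group using that group's mean.
df["user_score"] = df.groupby("genre")["user_score"].transform(
    lambda x: x.fillna(x.mean())
)

print(df.loc[3, "user_score"])  # 8.0, the Sports mean
print(df["user_score"].isna().sum())  # 2: Platform and Role-Playing have no scores to average
```

Genres in which every score is missing stay NaN under both approaches, because the mean of an all-NaN group is itself NaN.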
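One possible solution is to create a dictionary, genre_avg, of the average user score per genre, and then replace the NA values in user_score according to that dictionary: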
genre_avg = data.groupby(['genre']).agg({'user_score': 'mean'})['user_score'].to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))
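In your small sample data nothing changes, because none of the NaNs has any other value in its genre to average. However, if you change the genre of Wii Sports from Sports to Platform, for example, you will see that the user_score of Super Mario Bros. is filled with the mean of the Platform genre games.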
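Here is a rough sketch of that experiment with the dictionary approach, using the question's original five rows. The helper name `fill_by_genre` and the `CSV` string are hypothetical and only there to make the snippet self-contained. The dictionary is built with `groupby(...).mean().to_dict()`, which should give the same mapping as the `agg` call above.

```python
import io
import pandas as pd

# The question's original sample rows (both Sports games have a user_score).
CSV = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""


def fill_by_genre(frame):
    """Hypothetical helper: return user_score with NaNs replaced by the genre mean."""
    genre_avg = frame.groupby("genre")["user_score"].mean().to_dict()
    return frame["user_score"].fillna(frame["genre"].map(genre_avg))


data = pd.read_csv(io.StringIO(CSV))

# On the unmodified sample, both NaNs sit in genres with no scores, so nothing is filled.
unchanged = fill_by_genre(data)
print(unchanged.isna().sum())  # 2

# Move Wii Sports into the Platform genre, as suggested above, and fill again.
modified = data.copy()
modified.loc[modified["name"] == "Wii Sports", "genre"] = "Platform"
refilled = fill_by_genre(modified)
print(refilled[1])  # 8.0, Super Mario Bros. now gets the Platform mean
```

Pokemon Red/Pokemon Blue stays NaN in both runs, since it is still the only Role-Playing game and has no score of its own.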