Conditional NaN filling not changing column or making all None
I have a df with a column, Critic_Score, that has NaN values. I am trying to replace them with the average of the Critic Scores from the same platform. This question has been asked on Stack Overflow several times, and I tried 4 of the suggestions, none of which gave me the desired output. Please tell me how to fix this.
This is a subset of the df:
x[['Platform','Critic_Score']].head()
Platform Critic_Score
0 wii 76.0
1 nes NaN
2 wii 82.0
3 wii 80.0
4 gb NaN
More information on the original df:
x.head().to_dict('list')
{'Name': ['wii sports',
'super mario bros.',
'mario kart wii',
'wii sports resort',
'pokemon red/pokemon blue'],
'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
'Critic_Score': [76.0, nan, 82.0, 80.0, nan],
'User_Score': ['8', nan, '8.3', '8', nan],
'Rating': ['E', nan, 'E', 'E', nan]}
These are the statements I tried, each followed by its output:
1.
x['Critic_Score'] = x['Critic_Score'].fillna(x.groupby('Platform')['Critic_Score'].transform('mean'), inplace = True)
0 None
1 None
2 None
3 None
4 None
Name: Critic_Score, dtype: object
x.loc[x.Critic_Score.isnull(), 'Critic_Score'] = x.groupby('Platform').Critic_Score.transform('mean')
#no change in column
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
x['Critic_Score'] = x.groupby('Platform')['Critic_Score']\
.transform(lambda y: y.fillna(y.mean()))
#no change in column
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
Name: Critic_Score, dtype: float64
x['Critic_Score']=x.groupby('Platform')['Critic_Score'].apply(lambda y:y.fillna(y.mean()))
x['Critic_Score'].head()
Out[73]:
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
Name: Critic_Score, dtype: float64
x.update(
x.groupby('Platform').Critic_Score.transform('mean'),
overwrite=False)
First, groupby(...).transform('mean') creates a new Series with the same number of rows as the df, holding the platform average on every row.
Then use that to update the original df. With overwrite=False, only the NaN values in the original are replaced.
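As a rough illustration of those two steps, here is a minimal sketch on made-up data. The toy frame and the name platform_mean are assumptions for the example, not part of the original df; every platform here has at least one known score so that a mean exists.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each platform has at least one non-NaN score.
x = pd.DataFrame({
    "Platform": ["wii", "wii", "wii", "nes", "nes"],
    "Critic_Score": [76.0, np.nan, 80.0, 90.0, np.nan],
})

# Step 1: a Series aligned with x, holding the platform mean on every row.
platform_mean = x.groupby("Platform")["Critic_Score"].transform("mean")

# Step 2: update x in place; overwrite=False touches only the NaN cells.
x.update(platform_mean, overwrite=False)

print(x)
```

Note that update modifies x in place and returns None, so its result should not be assigned back to anything.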
Bear in mind that your sample has only one row for nes and one for gb, both with a NaN score, so there is nothing to average for those platforms.
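The sketch below reproduces this on the five sample rows from the question. It also shows the likely reason the first attempt produced all None: fillna(..., inplace=True) returns None, so assigning its result back overwrites the column. The final fallback to the overall mean is purely an assumption about what might be wanted for platforms with no known scores; the names platform_mean, filled and filled_with_fallback are made up for the example.

```python
import numpy as np
import pandas as pd

# The five sample rows shown in the question.
x = pd.DataFrame({
    "Platform": ["wii", "nes", "wii", "wii", "gb"],
    "Critic_Score": [76.0, np.nan, 82.0, 80.0, np.nan],
})

# nes and gb have no known score, so their group mean is itself NaN.
platform_mean = x.groupby("Platform")["Critic_Score"].transform("mean")
print(platform_mean)

# Assign the returned Series; do not combine assignment with inplace=True,
# because fillna(..., inplace=True) returns None.
filled = x["Critic_Score"].fillna(platform_mean)
print(filled)  # rows 1 and 4 are still NaN

# Assumed fallback (not from the original answer): use the overall mean
# for platforms that have no scores at all.
filled_with_fallback = filled.fillna(x["Critic_Score"].mean())
print(filled_with_fallback)
```

On the full df, platforms that do have some scores would be filled by the first fillna; only platforms with no scores at all would reach the fallback.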