
Conditional NaN filling not changing column or making all None

I have a df with a column, Critic_Score, that has NaN values. I am trying to replace them with the average of the Critic_Score values from the same platform. This question has been asked on Stack Overflow several times, and I tried 4 of the suggestions, but none gave me the desired output. Please tell me how to fix this.

This is a subset of the df:

x[['Platform','Critic_Score']].head()

Platform    Critic_Score
0   wii 76.0
1   nes NaN
2   wii 82.0
3   wii 80.0
4   gb  NaN

More information on the original df:

x.head().to_dict('list')
{'Name': ['wii sports',
  'super mario bros.',
  'mario kart wii',
  'wii sports resort',
  'pokemon red/pokemon blue'],
 'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
 'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
 'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
 'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
 'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
 'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
 'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
 'Critic_Score': [76.0, nan, 82.0, 80.0, nan],
 'User_Score': ['8', nan, '8.3', '8', nan],
 'Rating': ['E', nan, 'E', 'E', nan]}

These are the statements I tried, each followed by its output:

1.

x['Critic_Score'] = x['Critic_Score'].fillna(x.groupby('Platform')['Critic_Score'].transform('mean'), inplace = True)

0    None
1    None
2    None
3    None
4    None
Name: Critic_Score, dtype: object
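
A note on the all-None output above: the pandas documentation describes the return value of fillna as None when inplace=True is passed, so assigning that result back to the column would overwrite it with None. A minimal sketch of the same statement without inplace, on a small made-up frame (the name `games` and its values are illustrative assumptions, not the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: each platform has at least one non-NaN score.
games = pd.DataFrame({
    "Platform": ["wii", "wii", "wii", "nes", "nes"],
    "Critic_Score": [76.0, np.nan, 82.0, 60.0, np.nan],
})

# Per-row platform average, aligned to the original index.
means = games.groupby("Platform")["Critic_Score"].transform("mean")

# No inplace=True here: fillna returns a new Series, which is assigned back.
games["Critic_Score"] = games["Critic_Score"].fillna(means)

print(games)
```

Either assign the returned Series (as here) or use inplace=True without assigning; combining the two is what produces the None column.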
2.

x.loc[x.Critic_Score.isnull(), 'Critic_Score'] = x.groupby('Platform').Critic_Score.transform('mean')
# no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
3.

x['Critic_Score'] = x.groupby('Platform')['Critic_Score']\
    .transform(lambda y: y.fillna(y.mean()))
# no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64
4.

x['Critic_Score'] = x.groupby('Platform')['Critic_Score'].apply(lambda y: y.fillna(y.mean()))

x['Critic_Score'].head()

Out[73]:
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64
x.update(
    x.groupby('Platform').Critic_Score.transform('mean'),
    overwrite=False)
  • First, create a new Series with the same number of rows as the df, holding the platform average on every row.

  • Then use it to update the original, filling only the values that are NaN.
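
The two steps can be sketched on a small made-up frame. The name `demo` and its values are assumptions for illustration; only groupby(...).transform('mean') and DataFrame.update(..., overwrite=False) come from the answer itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every platform has at least one known score.
demo = pd.DataFrame({
    "Platform": ["wii", "wii", "wii", "gb", "gb"],
    "Critic_Score": [76.0, np.nan, 80.0, np.nan, 90.0],
})

# Step 1: a Series with one value per row, holding that row's platform mean.
# It keeps the name "Critic_Score", which update() uses as the target column.
platform_mean = demo.groupby("Platform")["Critic_Score"].transform("mean")

# Step 2: update in place; overwrite=False only fills values that are NaN.
demo.update(platform_mean, overwrite=False)

print(demo)
```

Note that update modifies the frame in place and returns None, so its result should not be assigned back.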

Bear in mind that your sample has only one nes row and one gb row, both with a NaN score, so there is nothing to average for those platforms.
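
This can be checked directly on the five sample rows from the question. The sketch below rebuilds only the Platform and Critic_Score columns; the final fallback to the overall mean is an assumption added for illustration, not something the question or answer proposes.

```python
import numpy as np
import pandas as pd

# The five rows shown in the question (two columns only).
sample = pd.DataFrame({
    "Platform": ["wii", "nes", "wii", "wii", "gb"],
    "Critic_Score": [76.0, np.nan, 82.0, 80.0, np.nan],
})

# The mean of a group whose scores are all NaN is itself NaN.
platform_mean = sample.groupby("Platform")["Critic_Score"].transform("mean")
print(platform_mean)

# Filling NaN with NaN leaves the nes and gb rows unchanged.
filled = sample["Critic_Score"].fillna(platform_mean)
print(filled)

# Hypothetical fallback: use the overall mean where no platform mean exists.
fallback = filled.fillna(sample["Critic_Score"].mean())
print(fallback)
```

On the full df the platform means may well exist for nes and gb; the unchanged output only shows up when a platform has no scored rows in the data being grouped.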
