How to fill the missing values of a column with the mean of a specific class of another column?
I am sharing part of my large dataframe here to ask my question. In the Age column there are two missing values, in the first two rows. The way I intend to fill them is based on the following steps:
I know how to do step 1, using
data.groupby('Group')['Age'].mean()
but I don't know how to proceed to the end of step 4.
Thanks.
Use:
df['Age'] = (df['Age'].fillna(df.groupby('Group')['Age'].transform('mean'))
.astype(int))
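As a minimal, self-contained sketch of the same idea (the sample values below are made up for illustration, and only the Group and Age column names come from the question), transform('mean') returns one value per row, aligned with the original index, so it can be passed straight to fillna:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question
df = pd.DataFrame({
    "Group": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# Mean of each row's group, broadcast back to the original row order
# (missing values are skipped when the mean is computed)
group_mean = df.groupby("Group")["Age"].transform("mean")

# Only the missing entries are replaced; existing ages are left untouched
df["Age"] = df["Age"].fillna(group_mean)
```

Note that the .astype(int) step in the answer above truncates a fractional mean toward zero, and it raises an error if any value is still missing afterwards (for example, when a group has no known ages at all), so it may be worth dropping or replacing with round() depending on the data.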
I'm guessing you're looking for something like this:
df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(),x))
Assuming your data looks like this (I didn't copy the whole dataframe):
Name Age
0 a NaN
1 a NaN
2 b 15.0
3 d 50.0
4 d 45.0
5 a 8.0
6 a 7.0
7 a 8.0
you would run:
df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(),x))
and get:
Name Age
0 a 7.666667 ---> The mean of group 'a'
1 a 7.666667
2 b 15.000000
3 d 50.000000
4 d 45.000000
5 a 8.000000
6 a 7.000000
7 a 8.000000
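For anyone who wants to reproduce the table above, here is a runnable sketch. It rebuilds the sample frame by hand and swaps the np.where call for Series.fillna inside the lambda, which is a variant I am assuming is equivalent for this data, not the exact code from the answer:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table shown in the answer
df = pd.DataFrame({
    "Name": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# Within each Name group, fill the gaps with that group's own mean
filled = df.groupby("Name")["Age"].transform(lambda x: x.fillna(x.mean()))

df["Age"] = filled
print(df)
```

The two missing rows of group 'a' should come out as the mean of 8, 7 and 8, while every other row keeps its original value.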