How to fill missing values in a column with the mean of a specific category from another column?

Question

I'm sharing part of my large DataFrame here to ask my question. In the Age column, the first two rows have missing values. I intend to fill them using the following steps:

1. Calculate the mean age for each group. (Assume the mean value of Age in Group A is X.)
2. Iterate through the Age column to detect the null values (which belong to the first two rows).
3. Return the Group value of each null Age (which is 'A').
4. Fill those null Age values with the mean age of their corresponding group (the first two rows belong to A, so fill their null Age values with X).
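The four steps can be written almost literally as a loop. This is a minimal sketch on made-up data (the real DataFrame is not shown in the question, so the Group/Age values here are hypothetical); it is easy to follow but will be slower than a vectorized fill on a big frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the asker's real DataFrame is not shown,
# so these Group/Age values are invented for illustration.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "A"],
    "Age":   [np.nan, np.nan, 30.0, 20.0, 40.0, 22.0],
})

# Step 1: mean Age per group (mean() skips NaN)
group_means = df.groupby("Group")["Age"].mean()

# Steps 2-4: walk the rows whose Age is null, look up each row's Group,
# and write that group's mean into the empty cell
for idx in df[df["Age"].isna()].index:
    group = df.at[idx, "Group"]
    df.at[idx, "Age"] = group_means[group]

print(df)
```

Here Group A's mean is (20 + 22) / 2 = 21, so the first two rows become 21.0.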

I know how to do step 1: I can use data.groupby('Group')['Age'].mean(). But I don't know how to carry on from there through step 4.
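Starting from the step-1 result, one possible way to finish is Series.map, which turns each row's Group label into that group's mean; fillna then replaces only the cells where Age is missing. A sketch with hypothetical values, reusing the question's variable name `data` and its column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names follow the question ('Group', 'Age')
data = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "A"],
    "Age":   [np.nan, np.nan, 30.0, 20.0, 40.0, 22.0],
})

# Step 1, as in the question: a Series indexed by group label
means = data.groupby("Group")["Age"].mean()

# Steps 2-4 at once: map every row's Group to its group mean,
# and let fillna() use that value only where Age is NaN
data["Age"] = data["Age"].fillna(data["Group"].map(means))

print(data)
```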

Thanks.

Answer 1

Use:

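# transform('mean') gives each row its own group's mean Age; fillna uses it only where Age is NaN
df['Age'] = (df['Age'].fillna(df.groupby('Group')['Age'].transform('mean'))
                      .astype(int))  # astype(int) truncates decimals, e.g. 21.5 -> 21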
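To see why this works: unlike mean(), which returns one value per group, transform('mean') returns a Series aligned to the original index, so it can be passed straight to fillna. A small demonstration on hypothetical data with the question's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's column names
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "A"],
    "Age":   [np.nan, np.nan, 30.0, 20.0, 40.0, 22.0],
})

# One value per row: each row carries the mean Age of its own group
group_mean = df.groupby("Group")["Age"].transform("mean")
print(group_mean.tolist())  # [21.0, 21.0, 35.0, 21.0, 35.0, 21.0]

# Fill the gaps, then cast as in the answer (astype(int) truncates decimals)
df["Age"] = df["Age"].fillna(group_mean).astype(int)
print(df["Age"].tolist())   # [21, 21, 30, 20, 40, 22]
```

If some group has no non-missing Age at all, its mean is NaN, fillna leaves those cells as NaN, and astype(int) raises an error; in that case drop the cast or handle the leftover NaNs separately.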

Answer 2

I'm guessing you're looking for something like this:

import numpy as np  # needed for np.where and np.isnan
df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(), x))

Assuming your data looks like this (I didn't copy the whole DataFrame):

    Name    Age
0   a   NaN
1   a   NaN
2   b   15.0
3   d   50.0
4   d   45.0
5   a   8.0
6   a   7.0
7   a   8.0

you would run:

df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(),x))

and get:

    Name    Age
0   a   7.666667   ---> The mean of group 'a'
1   a   7.666667
2   b   15.000000
3   d   50.000000
4   d   45.000000
5   a   8.000000
6   a   7.000000
7   a   8.000000
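The result above can be reproduced on the sample data. Because mean() skips NaN, group 'a' averages only 8, 7 and 8: (8 + 7 + 8) / 3 = 23/3 ≈ 7.666667. The sketch below runs the answer's np.where lambda and, for comparison, an equivalent per-group fillna spelling (the second form is an alternative, not part of the original answer):

```python
import numpy as np
import pandas as pd

# The sample data from this answer
df = pd.DataFrame({
    "Name": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age":  [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# The answer's approach: within each group, keep existing ages and
# substitute the group mean where the age is NaN
via_where = df.groupby(["Name"])["Age"].transform(
    lambda x: np.where(np.isnan(x), x.mean(), x)
)

# Alternative pandas-only spelling of the same per-group fill
via_fillna = df.groupby("Name")["Age"].transform(lambda x: x.fillna(x.mean()))

print(via_fillna)
```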


Answer 1: score 2, 2020-03-01 20:54:46
Answer 2: score 1, 2020-03-01 20:50:56
