How to fill the missing values of a column with the mean of a specific class of another column?

Question

I am sharing part of my large dataframe here to ask my question. The Age column has two missing values, in the first two rows. I intend to fill them using the following steps:

1. Calculate the mean of Age for each group. (Assume the mean value of Age in Group A is X.)
2. Iterate through the Age column to detect the null values (which are in the first two rows).
3. Return the Group value of each null Age value (which is 'A').
4. Fill those null Age values with the mean age of their corresponding group. (The first two rows belong to A, so fill their null Age values with X.)

I know how to do step 1: I can use data.groupby('Group')['Age'].mean(). But I don't know how to proceed from there through step 4.

Thanks.

Answer 1

Use:

df['Age'] = (df['Age'].fillna(df.groupby('Group')['Age'].transform('mean'))
                      .astype(int))
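To see what this does on concrete numbers, here is a minimal sketch. The sample values below are made up for illustration, since the question's actual dataframe is not reproduced here. Note that astype(int) truncates the filled mean (22.5 becomes 22), and it would raise an error if any NaN were left over, for example when a whole group has no known ages.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's real dataframe is not shown.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "A", "A"],
    "Age": [np.nan, np.nan, 30.0, 40.0, 20.0, 25.0],
})

# For every row, the mean Age of that row's own group (NaN is skipped).
group_mean = df.groupby("Group")["Age"].transform("mean")

# Fill only the missing ages; existing values are left untouched.
filled = df["Age"].fillna(group_mean)

# astype(int) truncates toward zero, so the filled 22.5 becomes 22.
df["Age"] = filled.astype(int)
print(df)
```

If you want to keep the fractional mean, drop the astype(int) call and leave the column as float.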

Answer 2

I'm guessing you're looking for something like this:

df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(), x))

Assuming your data looks like this (I didn't copy the whole dataframe):

    Name    Age
0   a   NaN
1   a   NaN
2   b   15.0
3   d   50.0
4   d   45.0
5   a   8.0
6   a   7.0
7   a   8.0

you would run:

import numpy as np  # needed for np.where and np.isnan
df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(), x))

and get:

    Name    Age
0   a   7.666667   ---> The mean of group 'a'
1   a   7.666667
2   b   15.000000
3   d   50.000000
4   d   45.000000
5   a   8.000000
6   a   7.000000
7   a   8.000000
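As a self-contained variant of the above, the same fill can probably be written with fillna inside the transform instead of np.where. This is only a sketch that rebuilds the sample table from this answer; it should give the same values as the output shown, and a group whose ages are all NaN would simply stay NaN.

```python
import numpy as np
import pandas as pd

# The sample data from this answer, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "Name": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# Within each Name group, replace NaN with that group's mean Age.
df["Age"] = df.groupby("Name")["Age"].transform(lambda x: x.fillna(x.mean()))
print(df)
```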

2 answers

solution1
2 2020-03-01 20:54:46

solution2
1 2020-03-01 20:50:56
