
How to fill the missing values of a column with the mean of a specific class of another column?

I am sharing part of a large dataframe to ask my question. The Age column has two missing values, both in the first two rows. I intend to fill them with the following steps:

  1. Calculate the mean of Age for each group. (Assume the mean Age in Group A is X.)
  2. Iterate through the Age column to detect the null values (which are in the first two rows).
  3. Return the Group value for each null Age (which is 'A').
  4. Fill each null Age with the mean Age of its corresponding group. (The first two rows belong to A, so their null Age values are filled with X.)

I know how to do step 1 with data.groupby('Group')['Age'].mean(), but I don't know how to carry out steps 2 through 4.

Thanks.


Use:

df['Age'] = (df['Age'].fillna(df.groupby('Group')['Age'].transform('mean'))
                      .astype(int))
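As a minimal sketch of how the line above behaves, the following runs it on a small made-up dataframe. The sample values are assumptions for illustration, not the asker's data. `transform('mean')` returns a Series aligned with the original rows, holding each row's group mean (computed with NaN skipped), so `fillna` can pick the right value per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "A", "A"],
    "Age": [np.nan, np.nan, 15.0, 25.0, 8.0, 10.0],
})

# One value per row: the mean Age of that row's group, ignoring NaN.
group_mean = df.groupby("Group")["Age"].transform("mean")

# Only the missing entries are replaced; existing ages are left untouched.
filled = df["Age"].fillna(group_mean)

print(filled.tolist())
```

Two caveats about the trailing `.astype(int)`: it truncates a fractional mean (7.67 would become 7), and it raises an error if any NaN remains, which happens when a group has no known Age at all. Dropping the cast keeps the column as float.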

I'm guessing you're looking for something like this:

import numpy as np

df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(),x))

Assuming your data looks like this (I didn't copy the whole dataframe, and the grouping column here is called Name rather than Group):

    Name    Age
0   a   NaN
1   a   NaN
2   b   15.0
3   d   50.0
4   d   45.0
5   a   8.0
6   a   7.0
7   a   8.0

you would run:

df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(),x))

and get:

    Name    Age
0   a   7.666667   ---> The mean of group 'a'
1   a   7.666667
2   b   15.000000
3   d   50.000000
4   d   45.000000
5   a   8.000000
6   a   7.000000
7   a   8.000000
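The four steps from the question can also be followed more literally. This is a sketch on the sample data above, not the only way to do it: compute the per-group means once, build a boolean mask of the missing rows, then look up each missing row's group in the means with `Series.map`. The variable names `means` and `missing` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "Name": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# Step 1: mean Age per group (NaN values are skipped).
means = df.groupby("Name")["Age"].mean()

# Steps 2 and 3: find the rows with a missing Age; their Name is the group key.
missing = df["Age"].isna()

# Step 4: map each missing row's group to that group's mean and assign it.
df.loc[missing, "Age"] = df.loc[missing, "Name"].map(means)

print(df)
```

No explicit loop is needed: the mask and `map` handle all missing rows at once, which tends to matter on a large dataframe.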

