Pandas - fillna with mean for specific categories

Question

I'd like to fill the missing values in a column with the mean, but computed only over rows of the same category as the missing value.

import numpy as np
import pandas as pd

data = {'Class': ['Superlight', 'Aero', 'Aero', 'Superlight', 'Superlight', 'Superlight', 'Aero', 'Aero'],
        'Weight': [5.6, 8.6, np.nan, 5.9, 5.65, np.nan, 8.1, 8.4]}
df = pd.DataFrame(data)


        Class  Weight
0  Superlight    5.60
1        Aero    8.60
2        Aero     NaN
3  Superlight    5.90
4  Superlight    5.65
5  Superlight     NaN
6        Aero    8.10
7        Aero    8.40

I know I can do:

df.Weight.fillna(df.Weight.mean())

But that will fill in the missing values with the mean of the whole column.

The following would replace the null values with the mean for the Aero category. That is better, but still no good, as I'd have to do it for each category/class separately:

df.Weight.fillna(df[df.Class == 'Aero'].Weight.mean())

Is it possible to abstract this so that it automatically takes the Class of the current row, computes the mean of the values in that category, and fills the missing value with it, without hardcoding the Class values? Hope that makes sense.

Answer 1

Use groupby + transform to get the per-class mean for every row, then pass that to fillna:

df['Weight'].fillna(df.groupby("Class")['Weight'].transform("mean"))

0    5.600000
1    8.600000
2    8.366667
3    5.900000
4    5.650000
5    5.716667
6    8.100000
7    8.400000
Name: Weight, dtype: float64
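
Note that fillna returns a new Series rather than changing df, so the result has to be assigned back if you want to keep it. A minimal self-contained sketch of that, using the data from the question (the variable name class_mean is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Class': ['Superlight', 'Aero', 'Aero', 'Superlight',
              'Superlight', 'Superlight', 'Aero', 'Aero'],
    'Weight': [5.6, 8.6, np.nan, 5.9, 5.65, np.nan, 8.1, 8.4],
})

# Per-class mean, broadcast back to one value per original row.
# NaN values are skipped when each group's mean is computed.
class_mean = df.groupby('Class')['Weight'].transform('mean')

# fillna only touches the missing entries; assign back to keep the result.
df['Weight'] = df['Weight'].fillna(class_mean)
print(df)
```

Because transform returns a Series with the same index as df, fillna can line the means up with the missing rows without any hardcoded Class values.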

Answer 2

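Maybe you can try groupby with apply on each group (passing group_keys=False so that newer pandas versions keep the original index instead of adding Class as an index level):

df.groupby('Class', group_keys=False)['Weight'].apply(lambda g: g.fillna(g.mean()))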
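
To check that both answers give the same numbers, here is a rough sketch on the question's data. It assumes group_keys=False is passed to groupby so the apply result keeps the original row labels, and it reindexes before comparing in case the row order differs between pandas versions. The names via_transform, via_apply and same are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Class': ['Superlight', 'Aero', 'Aero', 'Superlight',
              'Superlight', 'Superlight', 'Aero', 'Aero'],
    'Weight': [5.6, 8.6, np.nan, 5.9, 5.65, np.nan, 8.1, 8.4],
})

# Answer 1: per-class mean aligned to every row, then fillna.
via_transform = df['Weight'].fillna(
    df.groupby('Class')['Weight'].transform('mean')
)

# Answer 2: fill each group with its own mean.
# group_keys=False asks pandas not to prepend the Class label to the index.
via_apply = (
    df.groupby('Class', group_keys=False)['Weight']
      .apply(lambda g: g.fillna(g.mean()))
)

# Align on the original index before comparing the two results.
via_apply = via_apply.reindex(df.index)
same = np.allclose(via_transform, via_apply)
print(same)
```

The transform version avoids a Python-level lambda per group, which tends to matter on larger frames.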

2 answers

solution1
6 ACCEPTED 2020-09-24 16:07:53

solution2
2 2020-09-24 16:11:09
