I'm sharing part of a larger DataFrame to ask my question. The Age column has two missing values, in the first two rows. I want to fill them using the following steps:
I know how to do step 1: I can use data.groupby('Group')['Age'].mean(),
but I don't know how to proceed through the end of step 4.
Thanks.
Use:
df['Age'] = (df['Age'].fillna(df.groupby('Group')['Age'].transform('mean'))
                      .astype(int))
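To make the one-liner above concrete, here is a minimal sketch on made-up data. Only the column names Group and Age come from the question; the values are hypothetical. It assumes every group has at least one non-missing age, because otherwise a NaN would remain and astype(int) would raise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Group": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15, 50, 45, 8, 7, 8],
})

# transform('mean') returns one value per row (the mean of that row's group),
# aligned to df's index, so it can be passed straight to fillna.
group_mean = df.groupby("Group")["Age"].transform("mean")

# Fill only the missing ages, then convert to integers.
df["Age"] = df["Age"].fillna(group_mean).astype(int)
print(df)
```

Note that astype(int) truncates rather than rounds, so a group mean of 7.67 becomes 7. If rounding is wanted, calling .round() before .astype(int) is one option.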
I'm guessing you're looking for something like this:
import numpy as np

df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(), x))
Assuming your data looks like this (I didn't copy the whole dataframe):
  Name   Age
0    a   NaN
1    a   NaN
2    b  15.0
3    d  50.0
4    d  45.0
5    a   8.0
6    a   7.0
7    a   8.0
you would run:
df['Age'] = df.groupby(['Name'])['Age'].transform(lambda x: np.where(np.isnan(x), x.mean(), x))
and get:
  Name        Age
0    a   7.666667  ---> The mean of group 'a'
1    a   7.666667
2    b  15.000000
3    d  50.000000
4    d  45.000000
5    a   8.000000
6    a   7.000000
7    a   8.000000
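As a cross-check, the sketch below rebuilds the sample frame shown above and compares the np.where version with a variant that calls fillna inside the lambda. The fillna variant is an alternative suggested here, not part of the original answer, and the names via_where and via_fillna are purely illustrative.

```python
import numpy as np
import pandas as pd

# The sample frame from the answer above.
df = pd.DataFrame({
    "Name": ["a", "a", "b", "d", "d", "a", "a", "a"],
    "Age": [np.nan, np.nan, 15.0, 50.0, 45.0, 8.0, 7.0, 8.0],
})

# Version from the answer: replace NaN with the group mean via np.where.
via_where = df.groupby("Name")["Age"].transform(
    lambda x: np.where(np.isnan(x), x.mean(), x)
)

# Alternative (assumption, not from the answer): let pandas fill each group.
via_fillna = df.groupby("Name")["Age"].transform(lambda x: x.fillna(x.mean()))

# Both approaches should agree row for row.
print(np.allclose(via_where, via_fillna))
```

On this data both give the same filled column. The fillna form keeps each group as a pandas Series throughout, which some readers may find easier to follow.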