How to fill NaN based on groupby transform without losing the column grouped by?
I have a dataset containing heights, weights, etc., and I intend to fill the NaN values with the mean value for that gender.
Example dataset:
gender height weight
1 M 5 NaN
2 F 4 NaN
3 F NaN 40
4 M NaN 50
df = df.groupby("gender").transform(lambda x: x.fillna(x.mean()))
Current output:
height weight
1 5 50
2 4 40
3 4 40
4 5 50
Expected output:
gender height weight
1 M 5 50
2 F 4 40
3 F 4 40
4 M 5 50
Unfortunately, this drops the gender column, which I need later on.
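The column disappears because transform returns only the value columns, aligned on the original index; the grouping key itself is not part of the result. A minimal sketch, rebuilding the example frame by hand and assuming the key is spelled 'gender' as in the data, that assigns the transformed values back to just those columns so gender survives:

```python
import numpy as np
import pandas as pd

# Example frame from the question (index 1..4 as shown there)
df = pd.DataFrame(
    {
        "gender": ["M", "F", "F", "M"],
        "height": [5, 4, np.nan, np.nan],
        "weight": [np.nan, np.nan, 40, 50],
    },
    index=[1, 2, 3, 4],
)

value_cols = ["height", "weight"]

# transform returns one row per original row, aligned on df's index,
# but only for the selected value columns (the grouping key is excluded)
filled = df.groupby("gender")[value_cols].transform(lambda x: x.fillna(x.mean()))

# Assigning back to just those columns leaves 'gender' untouched
df[value_cols] = filled
print(df)
```

Because the result shares the original index, the assignment lines up row by row without any merge.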
How about looping through the two columns you want to fill and performing GroupBy.transform on each, grouping by 'gender':
for col in ['height', 'weight']:
    # Replace NaNs in this column with the mean of the row's gender group
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))
print(df)
gender height weight
0 M 5.0 50.0
1 F 4.0 40.0
2 F 4.0 40.0
3 M 5.0 50.0
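As an alternative to the lambda (a variant, not the answer's own code), the group mean can be computed with the built-in 'mean' string aggregation and passed to Series.fillna, which only replaces the missing entries. A sketch on the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
})

for col in ["height", "weight"]:
    # Per-row mean of the row's gender group, via the built-in 'mean' aggregation
    group_mean = df.groupby("gender")[col].transform("mean")
    # Only the missing entries are replaced by their group's mean
    df[col] = df[col].fillna(group_mean)

print(df)
```

Passing the string 'mean' lets pandas use its built-in groupby aggregation rather than calling a Python function once per group.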
If you want to fill all the non-object (e.g. numerical) columns that contain missing values, you can collect them in a list and apply the same approach:
# Non-object (e.g. numeric) columns that have at least one NaN
features_to_impute = [
    x for x in df.columns if df[x].dtypes != 'O' and df[x].isnull().mean() > 0
]
for col in features_to_impute:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))
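A possible variation on the list comprehension, sketched under the assumption that only numeric dtypes should be imputed: select_dtypes restricts the candidates to numeric columns (the != 'O' test would also pick up, for example, boolean or datetime columns), and all selected columns are filled in one transform call instead of a loop. The age column is hypothetical, added only to show that a numeric column without NaNs is skipped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
    "age": [30, 25, 27, 41],  # hypothetical extra column with no NaNs
})

# Numeric columns that contain at least one NaN
numeric = df.select_dtypes(include="number")
has_nan = numeric.isna().any()
features_to_impute = has_nan[has_nan].index.tolist()

# Group means for every selected column, aligned to df's index
group_means = df.groupby("gender")[features_to_impute].transform("mean")

# fillna with a DataFrame fills each NaN from the matching row/column of group_means
df[features_to_impute] = df[features_to_impute].fillna(group_means)
print(df)
```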
Instead of looping over the columns with transform, you can reach your expected output with GroupBy.apply:
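# numeric_only=True skips the string 'gender' column (needed on pandas >= 2.0); group_keys=False keeps the original index
df = df.groupby('gender', group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))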