How to fill NaN based on groupby transform without losing the column grouped by?

I have a dataset containing heights, weights, etc., and I intend to fill the NaN values with the mean value for that gender.

Example dataset:

    gender    height    weight
1     M          5       NaN
2     F          4       NaN
3     F         NaN        40
4     M         NaN        50

df = df.groupby("gender").transform(lambda x: x.fillna(x.mean()))

Current output:

     height    weight
1       5        50
2       4        40
3       4        40
4       5        50

Expected output:

    gender    height    weight
1     M          5        50
2     F          4        40
3     F          4        40
4     M          5        50

Unfortunately, this drops the gender column, which is important later on.

How about looping over the two columns you want to fill and calling GroupBy.transform on each one, grouping by 'gender':

for col in ['height','weight']:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))

print(df)

  gender  height  weight
0      M     5.0    50.0
1      F     4.0    40.0
2      F     4.0    40.0
3      M     5.0    50.0
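
For reference, here is a self-contained sketch of the same idea that rebuilds the example dataset and fills both columns in one assignment instead of a loop. The column list `cols` is just an illustrative name, and the data are copied from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataset from the question.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
})

cols = ["height", "weight"]

# Selecting the columns after groupby means only those columns are
# overwritten; "gender" stays in df because df itself is never replaced.
df[cols] = df.groupby("gender")[cols].transform(lambda s: s.fillna(s.mean()))

print(df)
```

Because the result of transform is assigned back into the existing frame, the grouping column is never lost.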

If you want to fill all the numerical columns, you can collect them in a list and apply the same approach:

# Non-object columns that contain at least one NaN
features_to_impute = [
    x for x in df.columns if df[x].dtypes != 'O' and df[x].isnull().mean() > 0
]

for col in features_to_impute:
    df[col] = df.groupby('gender')[col].transform(lambda x: x.fillna(x.mean()))
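
The same selection can also be sketched with select_dtypes, wrapped in a small helper. The function name `fill_numeric_by_group` and the extra `age` column are hypothetical and only serve to show that columns without NaN are left alone.

```python
import numpy as np
import pandas as pd


def fill_numeric_by_group(df, key):
    """Hypothetical helper: fill NaN in numeric columns with the group mean."""
    out = df.copy()
    # Numeric columns (excluding the grouping key) that contain any NaN.
    num_cols = [
        c for c in out.select_dtypes(include="number").columns
        if c != key and out[c].isna().any()
    ]
    if num_cols:
        out[num_cols] = out.groupby(key)[num_cols].transform(
            lambda s: s.fillna(s.mean())
        )
    return out


df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
    "age": [30, 25, 41, 38],  # illustrative column with no missing values
})

result = fill_numeric_by_group(df, "gender")
print(result)
```

If every value of a column is missing within a group, that group's mean is itself NaN, so those cells stay NaN.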

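Instead of using transform, you can also reach your expected output with GroupBy.apply. In pandas 2.x the mean needs numeric_only=True, because each group passed to apply still contains the string column gender, and group_keys=False keeps the original index. How apply treats the grouping column depends on the pandas version, so check that gender is still present in the result:

df = df.groupby('gender', group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))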
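
Another option, sketched here as an alternative rather than as part of the answer above, is to compute the per-group means with transform('mean') and pass them to DataFrame.fillna, which aligns on index and column labels. The names `num_cols`, `group_means` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [5, 4, np.nan, np.nan],
    "weight": [np.nan, np.nan, 40, 50],
})

num_cols = ["height", "weight"]

# One row per original row, holding the mean of that row's gender group.
group_means = df.groupby("gender")[num_cols].transform("mean")

# fillna with a DataFrame only fills cells whose index and column labels
# match, so "gender" is passed through unchanged.
filled = df.fillna(group_means)

print(filled)
```

This avoids a Python-level lambda and never removes the grouping column, since the original frame is the one being filled.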
