How to fill missing values in a column based on another column

I have a dataframe called shoes:

Brand   Comment
Ugg       NaN
Prada     NaN
Clarks    NaN
Ugg       NaN
Clark     NaN
Prada     Made from horse leather
Prada     Made from pig leather
Prada     NaN
Ugg       Made from Australian cow leather
...

and another dataframe, df_mode, obtained by taking the mode of the non-null comments for each shoe brand in the shoes dataframe:

Brand  Comment
Ugg    Made from sheep 
Prada  Made from pig leather
Clarks Made from Cow leather
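
For reference, a per-brand mode table like df_mode could be built roughly as follows. This is only a sketch: the sample rows are hypothetical (the full shoes data is elided in the question), and picking the first value when several comments tie for the mode is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same two columns as the question's `shoes` frame.
shoes = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Clarks", "Ugg", "Prada",
              "Prada", "Prada", "Clarks", "Ugg"],
    "Comment": [np.nan, np.nan, np.nan, "Made from sheep",
                "Made from horse leather", "Made from pig leather",
                "Made from pig leather", "Made from Cow leather",
                "Made from sheep"],
})

# Drop rows without a comment, then take the most frequent comment per brand.
# Series.mode() returns every tied value in sorted order; .iloc[0] keeps the first.
df_mode = (
    shoes.dropna(subset=["Comment"])
         .groupby("Brand", sort=False)["Comment"]
         .agg(lambda s: s.mode().iloc[0])
         .reset_index()
)
print(df_mode)
```

A brand whose comments are all missing drops out of df_mode entirely, so its rows would stay NaN after any lookup-based fill.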

How can I fill the missing comments for each shoe brand in the shoes dataframe with that brand's mode from the df_mode dataframe?

This is basically what I'm trying to achieve:

Brand   Comment
Ugg       Made from sheep
Prada     Made from pig leather
Clarks    Made from Cow leather
Ugg       Made from sheep
Clark     Made from Cow leather
Prada     Made from horse leather
Prada     Made from pig leather
Prada     Made from pig leather
Ugg       Made from Australian cow leather

Using np.where:

import numpy as np
shoes['Comment'] = np.where(shoes['Comment'].isnull(), shoes['Brand'].map(dict(zip(df_mode['Brand'], df_mode['Comment']))), shoes['Comment'])

Using loc and map:

shoes.loc[shoes.Comment.isna(), 'Comment'] = shoes.Brand.map(df_mode.set_index('Brand')['Comment'])
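
The lookup-based fills above can be checked on a small sample. The sketch below uses hypothetical rows shaped like the question's data and assumes brand spellings match exactly between the two frames (a row spelled 'Clark' would not match 'Clarks' and would stay NaN). The fillna variant is an additional equivalent spelling, not one taken from the answers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's frames.
shoes = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Clarks", "Ugg", "Prada", "Prada"],
    "Comment": [np.nan, np.nan, np.nan, "Made from Australian cow leather",
                "Made from horse leather", np.nan],
})
df_mode = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Clarks"],
    "Comment": ["Made from sheep", "Made from pig leather",
                "Made from Cow leather"],
})

# Brand -> mode comment lookup, indexed by brand name.
lookup = df_mode.set_index("Brand")["Comment"]

# Variant A: np.where keeps existing comments and substitutes the mapped mode.
via_where = pd.Series(
    np.where(shoes["Comment"].isna(), shoes["Brand"].map(lookup), shoes["Comment"]),
    index=shoes.index,
)

# Variant B: fillna with the mapped Series; only NaN positions are touched.
via_fillna = shoes["Comment"].fillna(shoes["Brand"].map(lookup))

shoes["Comment"] = via_fillna
print(shoes)
```

Both variants leave existing comments untouched and only fill the gaps, so on this sample they produce identical columns.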

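You can first group by the Brand column, then fill the missing values within each group. Note that this fills each gap with the nearest non-null comment for the same brand rather than with the mode from df_mode. Here is the implementation: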

shoes['Comment'] = shoes.groupby('Brand', sort=False)['Comment'].transform(lambda x: x.ffill().bfill())
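
To see how a group-wise forward/backward fill can differ from a mode-based fill, here is a small sketch on hypothetical Prada rows. It uses transform so the result keeps the original row index; that choice is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing comment followed by three known ones.
shoes = pd.DataFrame({
    "Brand": ["Prada", "Prada", "Prada", "Prada"],
    "Comment": [np.nan, "Made from horse leather",
                "Made from pig leather", "Made from pig leather"],
})

# Forward-fill, then back-fill, within each brand.
filled = shoes.groupby("Brand", sort=False)["Comment"].transform(
    lambda x: x.ffill().bfill()
)

# The most frequent comment for the brand, for comparison.
brand_mode = shoes["Comment"].mode().iloc[0]

print(filled.iloc[0])  # taken from the nearest non-null neighbour
print(brand_mode)      # the most frequent comment
```

Here the missing first row picks up the horse-leather comment from its neighbour, while the mode for the brand is the pig-leather comment, so the two strategies are not interchangeable when the goal is a mode fill.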
