
ffill on dataframe groupby object not filling all NaN data

I have a DataFrame with NaNs. Below is what it looks like when sorted with outdf.sort_values(['id','eff_date']).

         id       color_set      shape_set  eff_date  type
527      35  MONO COLOR SET  REC SHAPE SET  20190318   Add
35       53  MONO COLOR SET  TRI SHAPE SET  20150320   Add
102      53  MONO COLOR SET  REC SHAPE SET  20150521   Add
103      53  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
368      53  MONO COLOR SET  REC SHAPE SET  20170320   Add
56       61  MONO COLOR SET  TRI SHAPE SET  20150320   Add
104      61  MONO COLOR SET  REC SHAPE SET  20150521   Add
105      61  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
388      61             NaN            NaN  20170320   Add
486      61             NaN            NaN  20180319   Add
576      61             NaN            NaN  20190318   Add
556      67  MONO COLOR SET  REC SHAPE SET  20190318   Add
78       72  MONO COLOR SET  TRI SHAPE SET  20150320   Add
106      72  MONO COLOR SET  REC SHAPE SET  20150521   Add
107      72  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
391      72             NaN            NaN  20170320   Add
496      72             NaN            NaN  20180319   Add
592      72             NaN            NaN  20190318   Add

I'm trying to ffill only on matching id and type with the following code:

outdf[['id','color_set','shape_set']] = outdf.groupby(['id','type'])[['color_set','shape_set']].ffill()

However, this code does not seem to match on type. As shown below, for id 61, index 388 was filled from index 105 instead of index 104. The code also only works for some of the NaNs, as it missed id 72 completely. Below is the result of the code above.

         id       color_set      shape_set  eff_date  type
527      35  MONO COLOR SET  REC SHAPE SET  20190318   Add
35       53  MONO COLOR SET  TRI SHAPE SET  20150320   Add
102      53  MONO COLOR SET  REC SHAPE SET  20150521   Add
103      53  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
368      53  MONO COLOR SET  REC SHAPE SET  20170320   Add
56       61  MONO COLOR SET  TRI SHAPE SET  20150320   Add
104      61  MONO COLOR SET  REC SHAPE SET  20150521   Add
105      61  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
388      61  MONO COLOR SET  TRI SHAPE SET  20170320   Add
486      61  MONO COLOR SET  TRI SHAPE SET  20180319   Add
576      61  MONO COLOR SET  TRI SHAPE SET  20190318   Add
556      67  MONO COLOR SET  REC SHAPE SET  20190318   Add
78       72  MONO COLOR SET  TRI SHAPE SET  20150320   Add
106      72  MONO COLOR SET  REC SHAPE SET  20150521   Add
107      72  MONO COLOR SET  TRI SHAPE SET  20150521  Drop
391      72             NaN            NaN  20170320   Add
496      72             NaN            NaN  20180319   Add
592      72             NaN            NaN  20190318   Add

Any help on how to fill these NaNs by matching id and type is greatly appreciated. Note: if the first occurrence of an id is NaN, I would like to keep it as NaN, as I will need to look up the value from a different data set.

What I would do is:

outdf.groupby("id").ffill().bfill() 
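Note that the trailing bfill() in the line above runs on the whole frame rather than per group, so it would also fill a leading NaN from the rows that follow it, which the question asks to avoid. A minimal sketch of an alternative, assuming a recent pandas version: sort first, forward fill within each (id, type) group, and write back only the two value columns. The sample frame below is hypothetical, modeled on the id 61 rows from the question plus one invented id (80) whose first row is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modeled on the id 61 rows in the question,
# plus an invented id 80 whose first (and only) row is NaN.
outdf = pd.DataFrame(
    {
        "id": [61, 61, 61, 61, 61, 80],
        "color_set": ["MONO COLOR SET"] * 3 + [np.nan] * 3,
        "shape_set": ["TRI SHAPE SET", "REC SHAPE SET", "TRI SHAPE SET",
                      np.nan, np.nan, np.nan],
        "eff_date": [20150320, 20150521, 20150521, 20170320, 20180319, 20190318],
        "type": ["Add", "Add", "Drop", "Add", "Add", "Add"],
    },
    index=[56, 104, 105, 388, 486, 600],
)

value_cols = ["color_set", "shape_set"]

# Sort so that "previous row" means the previous eff_date within each id.
outdf = outdf.sort_values(["id", "eff_date"])

# Forward fill inside each (id, type) group; no bfill, so a leading NaN stays NaN.
filled = outdf.groupby(["id", "type"])[value_cols].ffill()

# Assign only the value columns; selecting them explicitly guards against
# pandas versions that also return the grouping keys.
outdf[value_cols] = filled[value_cols]

print(outdf)
```

With this sample, index 388 and 486 take their values from index 104 (the previous Add row for id 61), and the single row for id 80 remains NaN so it can be looked up elsewhere.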
