ffill on DataFrame groupby object not filling all NaN data
I have a DataFrame with NaNs. Below is what it looks like when sorted with outdf.sort_values(['id','eff_date']).
id color_set shape_set eff_date type
527 35 MONO COLOR SET REC SHAPE SET 20190318 Add
35 53 MONO COLOR SET TRI SHAPE SET 20150320 Add
102 53 MONO COLOR SET REC SHAPE SET 20150521 Add
103 53 MONO COLOR SET TRI SHAPE SET 20150521 Drop
368 53 MONO COLOR SET REC SHAPE SET 20170320 Add
56 61 MONO COLOR SET TRI SHAPE SET 20150320 Add
104 61 MONO COLOR SET REC SHAPE SET 20150521 Add
105 61 MONO COLOR SET TRI SHAPE SET 20150521 Drop
388 61 NaN NaN 20170320 Add
486 61 NaN NaN 20180319 Add
576 61 NaN NaN 20190318 Add
556 67 MONO COLOR SET REC SHAPE SET 20190318 Add
78 72 MONO COLOR SET TRI SHAPE SET 20150320 Add
106 72 MONO COLOR SET REC SHAPE SET 20150521 Add
107 72 MONO COLOR SET TRI SHAPE SET 20150521 Drop
391 72 NaN NaN 20170320 Add
496 72 NaN NaN 20180319 Add
592 72 NaN NaN 20190318 Add
I'm trying to ffill only on matching id and type with the following code:
outdf[['id','color_set','shape_set']] = outdf.groupby(['id','type'])[['color_set','shape_set']].ffill()
However, this code does not seem to be matching on type. As shown below, for id 61, index 388 was filled from index 105 instead of index 104. The code also works for only some of the NaNs, as it missed id 72 completely. Below is the result of the code above.
id color_set shape_set eff_date type
527 35 MONO COLOR SET REC SHAPE SET 20190318 Add
35 53 MONO COLOR SET TRI SHAPE SET 20150320 Add
102 53 MONO COLOR SET REC SHAPE SET 20150521 Add
103 53 MONO COLOR SET TRI SHAPE SET 20150521 Drop
368 53 MONO COLOR SET REC SHAPE SET 20170320 Add
56 61 MONO COLOR SET TRI SHAPE SET 20150320 Add
104 61 MONO COLOR SET REC SHAPE SET 20150521 Add
105 61 MONO COLOR SET TRI SHAPE SET 20150521 Drop
388 61 MONO COLOR SET TRI SHAPE SET 20170320 Add
486 61 MONO COLOR SET TRI SHAPE SET 20180319 Add
576 61 MONO COLOR SET TRI SHAPE SET 20190318 Add
556 67 MONO COLOR SET REC SHAPE SET 20190318 Add
78 72 MONO COLOR SET TRI SHAPE SET 20150320 Add
106 72 MONO COLOR SET REC SHAPE SET 20150521 Add
107 72 MONO COLOR SET TRI SHAPE SET 20150521 Drop
391 72 NaN NaN 20170320 Add
496 72 NaN NaN 20180319 Add
592 72 NaN NaN 20190318 Add
Any help on how to fill these NaNs by matching id and type is greatly appreciated. Note: if the first occurrence of an id is NaN, I would like to keep it as NaN, as I will need to look up the value from a different data set.
What I would do is:
outdf.groupby("id").ffill().bfill()
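Two things may be worth checking against the requirements in the question. First, the .bfill() chained after the grouped ffill() runs on the whole frame rather than per group, so a leading NaN can be filled from the next id instead of staying NaN. Second, the question's assignment lists three target columns on the left while the right-hand side selects only two, which may be part of the mismatch. Below is a minimal sketch that groups on both id and type, assigns back only the two filled columns, and skips bfill. The sample frame is a cut-down, partly made-up version of the data (the NaN in index 78 is an assumption added to show the "keep leading NaN" case), and the name cols is just for illustration.

```python
import numpy as np
import pandas as pd

# Cut-down sample; the NaN in index 78 is invented to illustrate a leading NaN.
outdf = pd.DataFrame(
    {
        "id": [61, 61, 61, 61, 72, 72],
        "color_set": ["MONO COLOR SET", "MONO COLOR SET", "MONO COLOR SET",
                      np.nan, np.nan, "MONO COLOR SET"],
        "shape_set": ["TRI SHAPE SET", "REC SHAPE SET", "TRI SHAPE SET",
                      np.nan, np.nan, "REC SHAPE SET"],
        "eff_date": [20150320, 20150521, 20150521, 20170320, 20150320, 20150521],
        "type": ["Add", "Add", "Drop", "Add", "Add", "Add"],
    },
    index=[56, 104, 105, 388, 78, 106],
)

# ffill follows row order, so sort first.
outdf = outdf.sort_values(["id", "eff_date"])

# Fill only within each (id, type) group and assign back to the same columns.
cols = ["color_set", "shape_set"]
outdf[cols] = outdf.groupby(["id", "type"])[cols].ffill()

print(outdf)
```

With this sample, index 388 should pick up its values from index 104 (same id and type) rather than from index 105, and index 78 stays NaN because nothing precedes it in its group.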