Fill missing dates by group in pandas

I need to fill the missing dates down by group. Here is the code to create the data frame. I want to carry each date in the `fill` column downward only until the next date appears in that column, and never past the point where the group `name` changes.

    import numpy as np
    import pandas as pd

    NaN = np.nan  # the lists below use the bare name NaN

    data = {'tdate': [20080815,20080915,20081226,20090110,20090131,20080807,20080831,
                      20080918,20081023,20081114,20081207,20090117,20090203,20090219,
                      20090305,20090318,20090501],
            'name': ['A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
            'fill': [NaN,NaN,20080915,NaN,NaN,NaN,NaN,NaN,NaN,20081023,
                     NaN,NaN,NaN,NaN,20090219,NaN,NaN]}

    df = pd.DataFrame(data, columns=['tdate', 'name', 'fill'])
    df

Current data frame

    tdate   name    fill
0    20080815    A   NaN
1    20080915    A   NaN
2    20081226    A   20080915
3    20090110    A   NaN
4    20090131    A   NaN
5    20080807    B   NaN
6    20080831    B   NaN
7    20080918    B   NaN
8    20081023    B   NaN
9    20081114    B   20081023
10   20081207    B   NaN
11   20090117    B   NaN
12   20090203    B   NaN
13   20090219    B   NaN
14   20090305    B   20090219
15   20090318    B   NaN
16   20090501    B   NaN

Desired output

    tdate   name    fill
0    20080815    A   NaN
1    20080915    A   NaN
2    20081226    A   20080915
3    20090110    A   20080915
4    20090131    A   20080915
5    20080807    B   NaN
6    20080831    B   NaN
7    20080918    B   NaN
8    20081023    B   NaN
9    20081114    B   NaN
10   20081207    B   20081023
11   20090117    B   20081023
12   20090203    B   20081023
13   20090219    B   20081023
14   20090305    B   20081023
15   20090318    B   20090219
16   20090501    B   20090219

Here is my code

df.groupby(df["name"])["fill"].fill()

You were pretty close. You just need to forward-fill with `ffill()` rather than calling `fill()`:

df.groupby('name')["fill"].ffill()
Out[42]: 
0          NaN
1          NaN
2     20080915
3     20080915
4     20080915
5          NaN
6          NaN
7          NaN
8          NaN
9     20081023
10    20081023
11    20081023
12    20081023
13    20081023
14    20090219
15    20090219
16    20090219
dtype: float64
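
Note that `ffill()` returns a new Series and leaves `df` unchanged, so the result has to be assigned somewhere to be kept. A minimal sketch of that, using a small made-up frame rather than the data from the question (the column name `fill_ffilled` is hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Small illustrative frame; not the data from the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B"],
    "fill": [np.nan, 20080915, np.nan, np.nan, 20081023, np.nan],
})

# Forward-fill within each group and store the result in a new column.
# The result is aligned on the original index, so plain assignment works.
df["fill_ffilled"] = df.groupby("name")["fill"].ffill()
print(df)
```

Assigning to `df["fill"]` instead would overwrite the original column in place of adding a new one.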

Or equivalently, on older pandas versions (recent releases deprecate the `method` argument of `fillna`):

df.groupby('name')["fill"].fillna(method='ffill')
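
To see why the `groupby` matters here, the following sketch compares a plain forward-fill with the grouped one on a tiny made-up frame (the variable names `ungrouped` and `grouped` are only for illustration). Without the grouping, the last value of group A leaks into the first rows of group B:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; not the data from the question.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "fill": [20080915, np.nan, np.nan, 20081023],
})

# Plain forward-fill ignores group boundaries.
ungrouped = df["fill"].ffill()

# Grouped forward-fill restarts at the first row of each group.
grouped = df.groupby("name")["fill"].ffill()

print(pd.DataFrame({"name": df["name"], "ungrouped": ungrouped, "grouped": grouped}))
```

In the `ungrouped` column, row 2 (the first row of B) picks up the value from A, while in the `grouped` column it stays NaN.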
