
PANDAS (populating datetime and ffill() the data in a dataframe in pandas)

I have a dataframe as below. It only has business-day data (i.e. weekends are excluded):

effective_date,ent_id 
2020-02-03,349114.0
2020-02-03,1559910.0
2020-02-03,23431626.0
2020-02-03,15747736.0
2020-02-04,21349114.0
2020-02-04,15559910.0
2020-02-04,323431626.0
2020-02-04,5747736.0
2020-02-05,76349114.0
2020-02-05,5459910.0
2020-02-05,89431626.0 
2020-02-05,37747736.0 
2020-02-06,10349114.0
2020-02-06,26559910.0
2020-02-06,35431626.0
2020-02-06,88747736.0
2020-02-07,34913414.0 
2020-02-07,15591910.0 
2020-02-07,234318626.0
2020-02-07,1574436.0
2020-02-10,139114.0 
2020-02-10,4359910.0
2020-02-10,43431626.0
2020-02-10,10947736.0

I have to perform two actions on this dataframe: 1) populate the missing dates in the time series, i.e. 2020-02-08 and 2020-02-09, and 2) forward fill the entire block of data from the preceding date.
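As a quick check of which calendar days are actually missing, here is a minimal sketch. It assumes only the six distinct dates from the sample above; the names `observed`, `full_range`, and `missing` are illustrative and not part of the original question.

```python
import pandas as pd

# The distinct dates present in the sample data (business days only).
observed = pd.to_datetime(
    ["2020-02-03", "2020-02-04", "2020-02-05",
     "2020-02-06", "2020-02-07", "2020-02-10"]
)

# Every calendar day between the first and last observed date.
full_range = pd.date_range(observed.min(), observed.max(), freq="D")

# Calendar days that have no rows in the data.
missing = full_range.difference(observed)
print(missing.strftime("%Y-%m-%d").tolist())
```

For the sample data this lists the weekend of 2020-02-08 and 2020-02-09, which are the two dates that need to be populated.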

So my output dataframe should look like this:

effective_date,ent_id
2020-02-03,349114.0
2020-02-03,1559910.0
2020-02-03,23431626.0
2020-02-03,15747736.0
2020-02-04,21349114.0
2020-02-04,15559910.0
2020-02-04,323431626.0
2020-02-04,5747736.0
2020-02-05,76349114.0
2020-02-05,5459910.0
2020-02-05,89431626.0
2020-02-05,37747736.0
2020-02-06,10349114.0
2020-02-06,26559910.0
2020-02-06,35431626.0
2020-02-06,88747736.0
2020-02-07,34913414.0
2020-02-07,15591910.0
2020-02-07,234318626.0
2020-02-07,1574436.0
2020-02-08,34913414.0
2020-02-08,15591910.0
2020-02-08,234318626.0
2020-02-08,1574436.0
2020-02-09,34913414.0
2020-02-09,15591910.0
2020-02-09,234318626.0
2020-02-09,1574436.0
2020-02-10,139114.0
2020-02-10,4359910.0
2020-02-10,43431626.0
2020-02-10,10947736.0

Use GroupBy.cumcount to build a per-date row counter, and combine it with effective_date into a MultiIndex via DataFrame.set_index. Reshape with DataFrame.unstack so that each date becomes a single row, add all missing datetimes with DataFrame.asfreq (forward filling through method='ffill'), then reshape back with DataFrame.stack. Finally, the first DataFrame.reset_index removes the counter level and the second converts effective_date back to a column:

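import pandas as pd

# df is the dataframe from the question
df['effective_date'] = pd.to_datetime(df['effective_date'])

# cumcount numbers the rows within each date, so (date, counter) is a unique index
df1 = (df.set_index(['effective_date', df.groupby('effective_date').cumcount()])
         .unstack()                        # one row per date, one column per counter
         .asfreq('D', method='ffill')      # add missing calendar days, copying the previous row
         .stack()                          # back to one row per (date, counter)
         .reset_index(level=1, drop=True)  # drop the counter level
         .reset_index())                   # effective_date back to a column

print(df1)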
   effective_date       ent_id
0      2020-02-03     349114.0
1      2020-02-03    1559910.0
2      2020-02-03   23431626.0
3      2020-02-03   15747736.0
4      2020-02-04   21349114.0
5      2020-02-04   15559910.0
6      2020-02-04  323431626.0
7      2020-02-04    5747736.0
8      2020-02-05   76349114.0
9      2020-02-05    5459910.0
10     2020-02-05   89431626.0
11     2020-02-05   37747736.0
12     2020-02-06   10349114.0
13     2020-02-06   26559910.0
14     2020-02-06   35431626.0
15     2020-02-06   88747736.0
16     2020-02-07   34913414.0
17     2020-02-07   15591910.0
18     2020-02-07  234318626.0
19     2020-02-07    1574436.0
20     2020-02-08   34913414.0
21     2020-02-08   15591910.0
22     2020-02-08  234318626.0
23     2020-02-08    1574436.0
24     2020-02-09   34913414.0
25     2020-02-09   15591910.0
26     2020-02-09  234318626.0
27     2020-02-09    1574436.0
28     2020-02-10     139114.0
29     2020-02-10    4359910.0
30     2020-02-10   43431626.0
31     2020-02-10   10947736.0
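
For comparison, the same result can probably be reached without reshaping, by mapping every calendar day to the most recent date that has data and then merging. This is only a sketch of a different technique, not the method used in the answer above; the small `df` below is a cut-down stand-in for the question's data, and the names `all_days`, `source`, `calendar`, `source_date`, and `filled` are made up for illustration.

```python
import pandas as pd

# Cut-down stand-in for the question's data: a Friday and the following Monday.
df = pd.DataFrame({
    "effective_date": pd.to_datetime(
        ["2020-02-07", "2020-02-07", "2020-02-10", "2020-02-10"]
    ),
    "ent_id": [34913414.0, 15591910.0, 139114.0, 4359910.0],
})

# Every calendar day between the first and last observed date.
all_days = pd.date_range(
    df["effective_date"].min(), df["effective_date"].max(), freq="D"
)

# For each calendar day, look up the latest date that actually has rows.
observed = pd.DatetimeIndex(df["effective_date"].unique()).sort_values()
source = pd.Series(observed, index=observed).reindex(all_days, method="ffill")

# Lookup table: calendar day -> date whose block of rows should be copied.
calendar = pd.DataFrame({
    "effective_date": all_days,
    "source_date": source.to_numpy(),
})

# Attach the whole block of rows from the source date to each calendar day.
filled = (
    calendar.merge(
        df.rename(columns={"effective_date": "source_date"}),
        on="source_date",
        how="left",
    )
    .drop(columns="source_date")
)

print(filled)
```

Because whole blocks are copied by date rather than by row position, this variant does not depend on a counter level, at the cost of an extra merge.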

