Pandas: populating missing dates and forward filling (ffill()) the data in a dataframe
I have a dataframe as shown below; it contains data for business days only (i.e. weekends are excluded).
effective_date,ent_id
2020-02-03,349114.0
2020-02-03,1559910.0
2020-02-03,23431626.0
2020-02-03,15747736.0
2020-02-04,21349114.0
2020-02-04,15559910.0
2020-02-04,323431626.0
2020-02-04,5747736.0
2020-02-05,76349114.0
2020-02-05,5459910.0
2020-02-05,89431626.0
2020-02-05,37747736.0
2020-02-06,10349114.0
2020-02-06,26559910.0
2020-02-06,35431626.0
2020-02-06,88747736.0
2020-02-07,34913414.0
2020-02-07,15591910.0
2020-02-07,234318626.0
2020-02-07,1574436.0
2020-02-10,139114.0
2020-02-10,4359910.0
2020-02-10,43431626.0
2020-02-10,10947736.0
I have to perform two actions on this dataframe: 1) populate the dates missing from the time series, i.e. 2020-02-08 and 2020-02-09, and 2) forward fill the entire block of data from the preceding date into those dates.
So my output dataframe should look like this:
effective_date,ent_id
2020-02-03,349114.0
2020-02-03,1559910.0
2020-02-03,23431626.0
2020-02-03,15747736.0
2020-02-04,21349114.0
2020-02-04,15559910.0
2020-02-04,323431626.0
2020-02-04,5747736.0
2020-02-05,76349114.0
2020-02-05,5459910.0
2020-02-05,89431626.0
2020-02-05,37747736.0
2020-02-06,10349114.0
2020-02-06,26559910.0
2020-02-06,35431626.0
2020-02-06,88747736.0
2020-02-07,34913414.0
2020-02-07,15591910.0
2020-02-07,234318626.0
2020-02-07,1574436.0
2020-02-08,34913414.0
2020-02-08,15591910.0
2020-02-08,234318626.0
2020-02-08,1574436.0
2020-02-09,34913414.0
2020-02-09,15591910.0
2020-02-09,234318626.0
2020-02-09,1574436.0
2020-02-10,139114.0
2020-02-10,4359910.0
2020-02-10,43431626.0
2020-02-10,10947736.0
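As a quick check of which calendar days are missing, the observed dates can be compared with a full daily range. This is only a minimal sketch: it uses an abbreviated copy of the dates from the data above, and the variable names (dates, full_range, missing) are illustrative rather than part of the original post.

```python
import pandas as pd

# Abbreviated sample: the distinct dates that occur in the question's data.
dates = pd.to_datetime(
    ["2020-02-03", "2020-02-04", "2020-02-05",
     "2020-02-06", "2020-02-07", "2020-02-10"]
)

# Every calendar day between the first and the last observed date.
full_range = pd.date_range(dates.min(), dates.max(), freq="D")

# Days in the full range that never appear in the data.
missing = full_range.difference(dates)
print(missing.strftime("%Y-%m-%d").tolist())
# ['2020-02-08', '2020-02-09']
```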
Use GroupBy.cumcount to create a counter within each date and combine it with effective_date into a MultiIndex with DataFrame.set_index. Reshape with DataFrame.unstack, add all missing datetimes with DataFrame.asfreq (method='ffill' copies the values from the preceding date), then reshape back with DataFrame.stack. Finally, remove the counter level with the first DataFrame.reset_index and convert effective_date back to a column with the second:
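import pandas as pd

# df is the dataframe from the question; parse the dates so asfreq() gets a DatetimeIndex
df['effective_date'] = pd.to_datetime(df['effective_date'])
df1 = (df.set_index(['effective_date', df.groupby('effective_date').cumcount()])
         .unstack()                        # one row per date, one column per counter value
         .asfreq('D', method='ffill')      # add missing days, filled from the preceding date
         .stack()                          # back to one row per (date, counter)
         .reset_index(level=1, drop=True)  # drop the counter level
         .reset_index())                   # effective_date becomes a column again
print(df1)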
effective_date ent_id
0 2020-02-03 349114.0
1 2020-02-03 1559910.0
2 2020-02-03 23431626.0
3 2020-02-03 15747736.0
4 2020-02-04 21349114.0
5 2020-02-04 15559910.0
6 2020-02-04 323431626.0
7 2020-02-04 5747736.0
8 2020-02-05 76349114.0
9 2020-02-05 5459910.0
10 2020-02-05 89431626.0
11 2020-02-05 37747736.0
12 2020-02-06 10349114.0
13 2020-02-06 26559910.0
14 2020-02-06 35431626.0
15 2020-02-06 88747736.0
16 2020-02-07 34913414.0
17 2020-02-07 15591910.0
18 2020-02-07 234318626.0
19 2020-02-07 1574436.0
20 2020-02-08 34913414.0
21 2020-02-08 15591910.0
22 2020-02-08 234318626.0
23 2020-02-08 1574436.0
24 2020-02-09 34913414.0
25 2020-02-09 15591910.0
26 2020-02-09 234318626.0
27 2020-02-09 1574436.0
28 2020-02-10 139114.0
29 2020-02-10 4359910.0
30 2020-02-10 43431626.0
31 2020-02-10 10947736.0
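For comparison, the same result can be approached without the positional counter, by mapping each calendar day to the most recent date that has data and merging on that mapping. This is a sketch of an alternative and not the method used in the answer above. It runs on a shortened sample, and the names available, all_days, source, calendar and filled are made up for illustration. It may be convenient when dates carry different numbers of rows.

```python
import pandas as pd

# Shortened sample: two rows per date, with a weekend gap before 2020-02-10.
df = pd.DataFrame({
    "effective_date": ["2020-02-06", "2020-02-06", "2020-02-07",
                       "2020-02-07", "2020-02-10", "2020-02-10"],
    "ent_id": [10349114.0, 26559910.0, 34913414.0,
               15591910.0, 139114.0, 4359910.0],
})
df["effective_date"] = pd.to_datetime(df["effective_date"])

# Dates that actually occur (sorted) and the full daily calendar spanning them.
available = pd.DatetimeIndex(df["effective_date"].unique()).sort_values()
all_days = pd.date_range(available.min(), available.max(), freq="D")

# For each calendar day, look up the most recent date that has data.
source = pd.Series(available, index=available).reindex(all_days, method="ffill")
calendar = pd.DataFrame({"effective_date": all_days,
                         "source_date": source.to_numpy()})

# Attach every row of the source date to each calendar day,
# then drop the helper column.
filled = (calendar
          .merge(df.rename(columns={"effective_date": "source_date"}),
                 on="source_date")
          .drop(columns="source_date"))
print(filled)
```

Here 2020-02-08 and 2020-02-09 both map to 2020-02-07, so each of them receives a copy of that date's block of rows.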