Creating a DataFrame with a row for each date from a date range in another DataFrame
Below is a script for a simplified version of the df in question:
import pandas as pd

plan_dates = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                           'start_date': ['2021-01-01', '2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05'],
                           'end_date': ['2021-01-04', '2021-01-03', '2021-01-03', '2021-01-06', '2021-01-08']})
plan_dates
id start_date end_date
0 1 2021-01-01 2021-01-04
1 2 2021-01-01 2021-01-03
2 3 2021-01-03 2021-01-03
3 4 2021-01-04 2021-01-06
4 5 2021-01-05 2021-01-08
I would like to create a new DataFrame with one row per id for each day on which the plan is active.
Intended DataFrame:
id active_days
0 1 2021-01-01
1 1 2021-01-02
2 1 2021-01-03
3 1 2021-01-04
4 2 2021-01-01
5 2 2021-01-02
6 2 2021-01-03
7 3 2021-01-03
8 4 2021-01-04
9 4 2021-01-05
10 4 2021-01-06
11 5 2021-01-05
12 5 2021-01-06
13 5 2021-01-07
14 5 2021-01-08
Any help would be greatly appreciated.
Use:
# the first part is the same as https://stackoverflow.com/a/66869805/2901002
plan_dates['start_date'] = pd.to_datetime(plan_dates['start_date'])
# add one day so that the end date itself is counted as an active day
plan_dates['end_date'] = pd.to_datetime(plan_dates['end_date']) + pd.Timedelta(1, unit='d')
# number of active days for each row
s = plan_dates['end_date'].sub(plan_dates['start_date']).dt.days
# repeat each row once per active day
df = plan_dates.loc[plan_dates.index.repeat(s)].copy()
# running counter 0, 1, 2, ... within each original row
counter = df.groupby(level=0).cumcount()
df['start_date'] = df['start_date'].add(pd.to_timedelta(counter, unit='d'))
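To see what the repeat and cumcount steps do, here is a minimal sketch on a hypothetical two-row frame. The names demo, n_days, repeated and offset are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the repeat/cumcount step.
demo = pd.DataFrame({'id': [1, 2],
                     'start_date': pd.to_datetime(['2021-01-01', '2021-01-03']),
                     'n_days': [3, 1]})

# Repeat each index label n_days times, giving index labels 0, 0, 0, 1.
repeated = demo.loc[demo.index.repeat(demo['n_days'])].copy()

# Number the copies within each original row: 0, 1, 2 for id 1 and 0 for id 2.
offset = repeated.groupby(level=0).cumcount()

# Shift each copy's start date forward by its offset in days.
repeated['active_days'] = repeated['start_date'] + pd.to_timedelta(offset, unit='d')
print(repeated[['id', 'active_days']])
```

Each repeated row keeps its original index label, which is why grouping by level=0 restarts the counter for every id.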
Then remove the end_date column, rename start_date to active_days, and create a default index:
df = (df.drop('end_date', axis=1)
.rename(columns={'start_date':'active_days'})
.reset_index(drop=True))
print(df)
id active_days
0 1 2021-01-01
1 1 2021-01-02
2 1 2021-01-03
3 1 2021-01-04
4 2 2021-01-01
5 2 2021-01-02
6 2 2021-01-03
7 3 2021-01-03
8 4 2021-01-04
9 4 2021-01-05
10 4 2021-01-06
11 5 2021-01-05
12 5 2021-01-06
13 5 2021-01-07
14 5 2021-01-08
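As a possible alternative (not part of the answer above), the same result can probably be reached with pd.date_range and DataFrame.explode. This is a sketch that assumes the original string columns from the question; the names ranges and alt are made up for illustration. It is shorter to read, but building one date range per row in a Python loop may be slower on large frames than the repeat-based approach.

```python
import pandas as pd

plan_dates = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                           'start_date': ['2021-01-01', '2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05'],
                           'end_date': ['2021-01-04', '2021-01-03', '2021-01-03', '2021-01-06', '2021-01-08']})

# One daily date range per row; pd.date_range includes both endpoints.
ranges = [pd.date_range(start, end)
          for start, end in zip(plan_dates['start_date'], plan_dates['end_date'])]

# Attach the ranges as a column, then expand each range into one row per day.
alt = (plan_dates[['id']]
       .assign(active_days=ranges)
       .explode('active_days')
       .reset_index(drop=True))

# explode leaves an object column, so convert it back to datetime64.
alt['active_days'] = pd.to_datetime(alt['active_days'])
print(alt)
```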