
Pandas - Split dataset based on overlapping time periods

I have reporting time periods that start on Mondays, end on Sundays, and run for 5 weeks. For example:

11/20/2017 - 12/24/2017 = t1 
11/27/2017 - 12/31/2017 = t2

I have a dataframe that covers 6 of these periods (starting 11/20/2017), and I'm trying to split it into 6 dataframes, one per time period, using the LeaveDate column. My data looks like this:

Barcode LeaveDate  
ABC123  2017-11-22 
ABC124  2017-12-04  
ABC125  2017-12-15

As the dataframe is separated, some of the barcodes will fall into multiple periods; that's OK. I know I can do:

df['period'] = df['LeaveDate'].dt.to_period('W-SUN')
df['week'] = df['period'].dt.week

to get single weeks, but I don't know how to define a "multi-week" period. The other problem is that a barcode can fall under multiple periods, so it needs to be output to multiple dataframes. Any ideas? Thanks!

There might be a more succinct solution, but this should work (it gives you a dictionary of DataFrames, one for each period):

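import pandas as pd

df = pd.DataFrame([['ABC123', '2017-11-22'],
                   ['ABC124', '2017-12-04'],
                   ['ABC125', '2017-12-15']],
                  columns=['Barcode', 'LeaveDate'])
# Parse the dates so comparisons are chronological rather than string-based.
df['LeaveDate'] = pd.to_datetime(df['LeaveDate'])

# (start, end) of each reporting period, both inclusive.
periods = [('2017-11-20', '2017-12-24'), ('2017-11-27', '2017-12-31')]

# Map each period to the rows whose LeaveDate falls inside it.
results = {}
for period in periods:
    period_df = df[(df['LeaveDate'] >= period[0]) & (df['LeaveDate'] <= period[1])]
    results[period] = period_df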
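
If you'd rather not type out all six periods by hand, here is a minimal sketch that builds them from the first Monday. It assumes the periods start on consecutive Mondays and each ends on the Sunday 34 days later, as in the question's t1 and t2. The names `starts`, `length` and the `t1`..`t6` keys are just illustrative choices, not anything from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "Barcode": ["ABC123", "ABC124", "ABC125"],
    "LeaveDate": pd.to_datetime(["2017-11-22", "2017-12-04", "2017-12-15"]),
})

# Six period starts, one week apart, beginning Monday 2017-11-20.
starts = pd.date_range("2017-11-20", periods=6, freq="7D")
# Five full weeks, Monday through Sunday: the end is 34 days after the start.
length = pd.Timedelta(days=34)

results = {}
for i, start in enumerate(starts, start=1):
    end = start + length
    # between() includes both endpoints by default.
    mask = df["LeaveDate"].between(start, end)
    results[f"t{i}"] = df[mask]
```

Because each row is tested against every period independently, a barcode that falls into several overlapping periods simply shows up in several of the resulting dataframes. With the sample data, `results["t1"]` holds all three barcodes and `results["t4"]` holds only ABC125.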


 