Chopping up a Pandas data frame of irregular time intervals into day boundaries
I have a data frame that looks like the following:
import pandas as pd
x = pd.DataFrame({'start_time': ['2012-01 23:00', '2012-02 02:00', '2012-02 05:00'], 'end_time': ['2012-02 02:00', '2012-02 05:00', '2012-02 9:00'], 'count': [3, 5, 1]})
'''
start_time,end_time,count
2012-01 23:00,2012-02 02:00,3
2012-02 01:00,2012-02 05:00,5
'''
For example, the first row might represent the fact that there were 3 sales between 11p on Jan 1 and 2a on Jan 2.
These time intervals cross day boundaries, but I want to be able to get a rough estimate of how many sales there were per day. So in the example above, I want the row representing 3 sales between 11p and 2a to be divided into two rows, one for each day.
Is there an easy way to do this?
I couldn't think of a nice way to vectorize the answer, but here's a hack that captures the basic logic. There's surely a cleaner way to do this, but maybe this is all you need.
import pandas as pd
from collections import Counter

x = pd.DataFrame({'start_time': ['2012-01-01 23:00', '2012-01-03 02:00', '2012-01-04 22:00'],
                  'end_time': ['2012-01-02 02:00', '2012-01-03 05:00', '2012-01-05 02:00'],
                  'count': [3, 5, 1]})
x['start_time'] = pd.to_datetime(x['start_time'])
x['end_time'] = pd.to_datetime(x['end_time'])

# Truncate a Timestamp to midnight of its day.
# (pd.datetime is no longer available in recent pandas, so normalize() is used.)
strip_time = lambda ts: ts.normalize()

c = Counter()
for _, row in x.iterrows():
    start_day, end_day = strip_time(row['start_time']), strip_time(row['end_time'])
    if start_day == end_day:
        # The interval lies within a single day: the whole count belongs to it.
        c[start_day] += row['count']
    else:
        # The interval crosses one midnight: split the count in proportion to the
        # time spent on each side. Spans of more than two days are not handled.
        delta_t = row['end_time'] - row['start_time']
        c[start_day] += row['count'] * ((end_day - row['start_time']) / delta_t)
        c[end_day] += row['count'] * ((row['end_time'] - end_day) / delta_t)
s = pd.Series(c)
# s:
2012-01-01 1.0
2012-01-02 2.0
2012-01-03 5.0
2012-01-04 0.5
2012-01-05 0.5
dtype: float64
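If some intervals can span more than two calendar days, the same proportional idea can be generalized. The following is only a sketch and not part of the original answer: `split_by_day` is a hypothetical helper name, and it assumes every row has `end_time` strictly later than `start_time` and that the count is spread uniformly over the interval.

```python
import pandas as pd

def split_by_day(df):
    """Cut each [start_time, end_time] interval at every midnight it crosses
    and give each day a share of `count` proportional to the time spent in it."""
    pieces = []
    for start, end, count in zip(df['start_time'], df['end_time'], df['count']):
        total = end - start
        # All midnights after `start` and up to `end` (empty if none are crossed).
        cuts = pd.date_range(start.normalize() + pd.Timedelta(days=1), end, freq='D')
        edges = [start, *cuts, end]
        for lo, hi in zip(edges[:-1], edges[1:]):
            if hi > lo:  # skip the zero-length piece when `end` is exactly midnight
                pieces.append((lo.normalize(), count * ((hi - lo) / total)))
    out = pd.DataFrame(pieces, columns=['day', 'count'])
    # Several intervals can touch the same day, so sum their shares.
    return out.groupby('day')['count'].sum()

x = pd.DataFrame({'start_time': pd.to_datetime(['2012-01-01 23:00', '2012-01-03 02:00', '2012-01-04 22:00']),
                  'end_time': pd.to_datetime(['2012-01-02 02:00', '2012-01-03 05:00', '2012-01-05 02:00']),
                  'count': [3, 5, 1]})
daily = split_by_day(x)
```

On the sample data this should give the same per-day totals as the loop above. It still iterates row by row, so it is not vectorized either; it only removes the two-day limitation.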