Splitting a Pandas DataFrame with irregular time intervals at day boundaries

Question

I have a data frame that looks like the following:

import pandas as pd
x = pd.DataFrame({'start_time': ['2012-01 23:00', '2012-02 02:00', '2012-02 05:00'], 'end_time': ['2012-02 02:00', '2012-02 05:00', '2012-02 9:00'], 'count': [3, 5, 1]})

'''
start_time,end_time,count
2012-01 23:00,2012-02 02:00,3
2012-02 01:00,2012-02 05:00,5
'''

For example, the first row might represent the fact that there were 3 sales between Jan 1 11p and Jan 2 2a.

These time intervals cross day boundaries, but I want to be able to get a rough estimate of how many sales there were per day. So in the example above, I want the row representing 3 sales between 11p-2a to be divided into two rows:

One row from 11p-midnight, with 1 sale (because there were originally 3 hours for 3 sales, and now there's only 1 hour, so 1/3 * 3 = 1).
Another row from midnight-2a, with 2 sales.
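
The arithmetic behind that split can be checked directly. This is only a minimal sketch: it assumes sales are spread evenly over the interval, and the variable names (`start`, `end`, `midnight`, `before`, `after`) are illustrative rather than part of the original data frame.

```python
import pandas as pd

# Assumed example interval: 3 sales between 11p on Jan 1 and 2a on Jan 2.
start = pd.Timestamp('2012-01-01 23:00')
end = pd.Timestamp('2012-01-02 02:00')
count = 3

# The midnight that separates the two days.
midnight = end.normalize()

# Share of the count on each side of midnight (assumes a uniform sales rate).
before = count * (midnight - start) / (end - start)
after = count * (end - midnight) / (end - start)

print(before, after)  # 1.0 2.0
```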

Is there an easy way to do this?

Answer 1

I couldn't think of a nice way to vectorize the answer, but here's a hack that captures the basic logic. There's surely a way to generate something cleaner than this, but maybe this is all you need.

import pandas as pd
from collections import Counter

x = pd.DataFrame({'start_time': ['2012-01-01 23:00', '2012-01-03 02:00', '2012-01-04 22:00'],
                  'end_time': ['2012-01-02 02:00', '2012-01-03 05:00', '2012-01-05 2:00'],
                  'count': [3, 5, 1]})
x['start_time'] = pd.to_datetime(x['start_time'])
x['end_time'] = pd.to_datetime(x['end_time'])

# Truncate a Timestamp to midnight of the same day.
# (pd.datetime is gone from recent pandas, so use Timestamp.normalize.)
strip_time = lambda ts: ts.normalize()

c = Counter()
for _, row in x.iterrows():
    start_day = strip_time(row['start_time'])
    end_day = strip_time(row['end_time'])
    # Compare full dates, not just day-of-month, so intervals in
    # different months are not mistaken for same-day intervals.
    if start_day == end_day:
        c[start_day] += row['count']
    else:
        # Interval crosses one midnight (this hack assumes at most one):
        # split the count in proportion to the time spent on each day.
        delta_t = row['end_time'] - row['start_time']
        c[start_day] += row['count'] * (end_day - row['start_time'])/delta_t
        c[end_day] += row['count'] * (row['end_time'] - end_day)/delta_t

s = pd.Series(c)

# s:
2012-01-01    1.0
2012-01-02    2.0
2012-01-03    5.0
2012-01-04    0.5
2012-01-05    0.5
dtype: float64
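
The loop above only handles intervals that cross a single midnight. If an interval can span several days, one possible extension is to cut it at every midnight it contains. The following is a sketch under stated assumptions, not a tested replacement: `split_by_day` is a hypothetical helper name, the frame is assumed to have datetime columns `start_time` and `end_time` with `end_time` later than `start_time`, and sales are assumed to be spread evenly within each interval.

```python
import pandas as pd

def split_by_day(df):
    """Spread each row's count over calendar days in proportion to time.

    Hypothetical helper: assumes datetime columns 'start_time' and
    'end_time' (end after start) and a numeric 'count' column.
    """
    totals = {}
    for start, end, count in zip(df['start_time'], df['end_time'], df['count']):
        duration = end - start
        # Every midnight after `start` and up to `end` is a cut point.
        cuts = pd.date_range(start.normalize() + pd.Timedelta(days=1), end, freq='D')
        edges = [start, *cuts, end]
        for left, right in zip(edges[:-1], edges[1:]):
            if right > left:  # skip the empty piece when `end` is exactly midnight
                day = left.normalize()
                totals[day] = totals.get(day, 0.0) + count * ((right - left) / duration)
    return pd.Series(totals).sort_index()

x = pd.DataFrame({'start_time': ['2012-01-01 23:00', '2012-01-03 02:00', '2012-01-04 22:00'],
                  'end_time': ['2012-01-02 02:00', '2012-01-03 05:00', '2012-01-05 02:00'],
                  'count': [3, 5, 1]})
x['start_time'] = pd.to_datetime(x['start_time'])
x['end_time'] = pd.to_datetime(x['end_time'])

result = split_by_day(x)
print(result)
```

On the example frame this should give the same per-day totals as the loop above, and the sum over all days stays equal to the sum of the original counts.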

Answer 1
1 2015-08-27 03:28:11
