Pandas: Standardize irregular time intervals

I'm wondering whether Pandas has built-in functionality to take irregular time intervals (each roughly an hour long) and convert them to standardized hourly intervals. Code example and non-working attempt:

import pandas as pd

df = pd.DataFrame({'start': ['2018-09-04 01:12', '2018-09-04 02:11'], 
                   'end'  : ['2018-09-04 02:10','2018-09-04 03:20'], 
                   'val'  : [500, 600]})[['start','end','val']]

df[['start','end']] = df[['start','end']].apply(pd.to_datetime)

This gives us:

           start               end  val
2018-09-04 01:12  2018-09-04 02:10  500
2018-09-04 02:11  2018-09-04 03:20  600

and:

df = df.resample('1H', on = 'start', ).reset_index()

would ideally (but doesn't) yield:

           start               end     val
2018-09-04 01:00  2018-09-04 01:59  406.78
2018-09-04 02:00  2018-09-04 02:59  513.22
2018-09-04 03:00  2018-09-04 03:59  180.00

I could code some hack to make this work, but figured Pandas would have some simple function that does this.

This is not a sufficiently common allotment to warrant its own method. You're doing a straightforward linear apportioning of each input interval, broken at the hour. Counting the end minute as part of the interval (which is what your expected output implies), the first interval covers 59 minutes, so the "value" of each minute is 500/59 (about 8.47). The second covers 70 minutes, so it's 600/70 per minute (about 8.57).
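To check those per-minute rates, here is a small sketch. It assumes the end minute counts as part of the interval, which is what gives 59 minutes for the first row; minutes and rate are illustrative names, not anything Pandas provides.

```python
import pandas as pd

df = pd.DataFrame({'start': ['2018-09-04 01:12', '2018-09-04 02:11'],
                   'end':   ['2018-09-04 02:10', '2018-09-04 03:20'],
                   'val':   [500, 600]})
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)

# Length of each interval in minutes, counting the end minute (assumption).
minutes = (df['end'] - df['start']).dt.total_seconds() / 60 + 1

# Value carried by each minute of the interval.
rate = df['val'] / minutes

print(minutes.tolist())
print(rate.round(2).tolist())
```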

You can do this with a relatively simple control structure, although the individual breakdown is a little "wordy". As you create the new rows, use shift() to address both the current and previous rows of the input data frame. You need to keep track of the breakpoint (top of the hour) for each row and do that linear computation on both sides of it. Your arithmetic looks something like this:

TIME          VALUE
1:00 - 2:00   (1:12 - 1:00) * 0 + (2:00 - 1:12) * 500/59
2:00 - 3:00   (2:11 - 2:00) * 500/59 + (3:00 - 2:11) * 600/70
3:00 - 4:00   (3:21 - 3:00) * 600/70 + (4:00 - 3:21) * 0

(The last interval runs through the 3:20 minute, hence 3:21 as its boundary.)

Can you turn those details into the code you need?
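One possible way to turn those details into code, offered as a rough sketch and not as a built-in Pandas feature: instead of the shift-based bookkeeping described above, this version spreads each row's value evenly over its minutes and then sums per hour. It assumes minute resolution and that the end minute belongs to the interval; the names pieces, per_minute, hourly and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'start': ['2018-09-04 01:12', '2018-09-04 02:11'],
                   'end':   ['2018-09-04 02:10', '2018-09-04 03:20'],
                   'val':   [500, 600]})
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)

pieces = []
for row in df.itertuples(index=False):
    # One timestamp per minute of the interval, both endpoints included.
    minutes = pd.date_range(row.start, row.end, freq='min')
    # Linear apportioning: every minute gets an equal share of the value.
    pieces.append(pd.Series(row.val / len(minutes), index=minutes))

per_minute = pd.concat(pieces)

# Sum the per-minute shares into bins that start at the top of each hour.
hourly = per_minute.resample('60min').sum().round(2)

# Rebuild the start / end / val layout used in the question.
result = hourly.rename('val').rename_axis('start').reset_index()
result.insert(1, 'end', result['start'] + pd.Timedelta(minutes=59))

print(result)
```

With the sample data this gives 406.78, 513.22 and 180.00 for the hours starting at 01:00, 02:00 and 03:00. Expanding to minutes is simple but costs memory on long intervals, in which case the closed-form arithmetic above may be the better fit.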
