Pandas Grouper: calculate time elapsed between events
I am trying to find the time elapsed between two events using Grouper, but I was unable to do so. Please help me out. Below are the input and the expected output.
Input
ID Status Datetime
A Online 24/09/2017 7:00:00 AM
A Offline 24/09/2017 7:30:00 AM
A Offline 24/09/2017 8:30:00 AM
A Online 24/09/2017 9:30:00 AM
A Offline 24/09/2017 10:00:00 AM
B Offline 24/09/2017 6:00:00 AM
B Online 24/09/2017 7:30:00 AM
B Online 24/09/2017 9:10:00 AM
B Offline 24/09/2017 9:30:00 AM
B Online 24/09/2017 9:40:00 AM
B Offline 24/09/2017 10:00:00 AM
Expected output
ID Hour_start Hour_end Online_time
A 24/09/2017 7:00:00 AM 24/09/2017 8:00:00 AM 1800
A 24/09/2017 8:00:00 AM 24/09/2017 9:00:00 AM 0
A 24/09/2017 9:00:00 AM 24/09/2017 10:00:00 AM 1800
B 24/09/2017 6:00:00 AM 24/09/2017 7:00:00 AM 0
B 24/09/2017 7:00:00 AM 24/09/2017 8:00:00 AM 1800
B 24/09/2017 8:00:00 AM 24/09/2017 9:00:00 AM 3600
B 24/09/2017 9:00:00 AM 24/09/2017 10:00:00 AM 3000
My attempt using pandas Grouper:
df_output = df.groupby(['ID',pd.Grouper(key='Datetime', freq='H'),'status'])['event_time'].diff().dt.seconds.fillna(0)
But this doesn't take into account whether the Status column is Online or Offline.
Please help me out. Thanks in advance.
I assume that the Datetime column in your source DataFrame is of datetime64 type.
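If the column still holds strings such as 24/09/2017 7:00:00 AM, a minimal sketch of the conversion follows. The explicit format string (day first, 12-hour clock with AM/PM) is my assumption based on the sample rows, not something stated in the question.

```python
import pandas as pd

# Sample rows copied from the question's input table.
rows = [
    ("A", "Online", "24/09/2017 7:00:00 AM"),
    ("A", "Offline", "24/09/2017 7:30:00 AM"),
    ("A", "Offline", "24/09/2017 8:30:00 AM"),
    ("A", "Online", "24/09/2017 9:30:00 AM"),
    ("A", "Offline", "24/09/2017 10:00:00 AM"),
    ("B", "Offline", "24/09/2017 6:00:00 AM"),
    ("B", "Online", "24/09/2017 7:30:00 AM"),
    ("B", "Online", "24/09/2017 9:10:00 AM"),
    ("B", "Offline", "24/09/2017 9:30:00 AM"),
    ("B", "Online", "24/09/2017 9:40:00 AM"),
    ("B", "Offline", "24/09/2017 10:00:00 AM"),
]
df = pd.DataFrame(rows, columns=["ID", "Status", "Datetime"])

# Assumed format: day/month/year, 12-hour clock with AM/PM.
# Passing it explicitly avoids pandas guessing month-first.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d/%m/%Y %I:%M:%S %p")

print(df["Datetime"].dtype)
```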
My solution is based on 2-level grouping: first by ID and then (after some intermediate operations) by hour.
Define 2 functions:
onTimeById, to compute Online time for each ID (the "external" grouping level):
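import pandas as pd

def onTimeById(grp):
    # Keep only rows where Status changes (drop repeated Online/Offline events)
    wrk = grp[grp.Status != grp.Status.shift()]
    wrk = wrk.set_index('Datetime').Status
    # Insert full-hour timestamps and forward-fill the current status into them.
    # The range starts at the first event, so this assumes (as in the sample)
    # that the first event of each ID falls on a full hour.
    hours = pd.date_range(wrk.index.min(), wrk.index.max(), freq='h')
    wrk = wrk.reindex(wrk.index.union(hours)).ffill()
    # Online seconds for each hour
    res = wrk.groupby(pd.Grouper(freq='h')).apply(onTimeByHour)
    # The last hourly group only closes the previous hour, so drop it
    rv = res.iloc[:-1].reset_index().rename(
        columns={'index': 'Hour_start', 'Status': 'Online_time'})
    rv.insert(1, 'Hour_end', res.index[1:])
    return rv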
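To see what the first steps of onTimeById produce, here is a sketch that replays them on group "A" alone. The rows are re-typed in ISO form so the snippet is self-contained; everything else mirrors the function above.

```python
import pandas as pd

# Group "A" from the question, already parsed to datetime64.
grp = pd.DataFrame({
    "Status": ["Online", "Offline", "Offline", "Online", "Offline"],
    "Datetime": pd.to_datetime([
        "2017-09-24 07:00", "2017-09-24 07:30", "2017-09-24 08:30",
        "2017-09-24 09:30", "2017-09-24 10:00",
    ]),
})

# Step 1: keep only status changes (the 8:30 Offline row is a repeat).
wrk = grp[grp.Status != grp.Status.shift()]
wrk = wrk.set_index("Datetime").Status

# Step 2: add full-hour timestamps and forward-fill the status into them.
hours = pd.date_range(wrk.index.min(), wrk.index.max(), freq="h")
wrk = wrk.reindex(wrk.index.union(hours)).ffill()
# wrk: 07:00 Online, 07:30 Offline, 08:00 Offline,
#      09:00 Offline, 09:30 Online, 10:00 Offline

# Step 3: the hourly groups that onTimeByHour receives; the 10:00 group
# is the one later dropped by res.iloc[:-1].
for hour, g in wrk.groupby(pd.Grouper(freq="h")):
    print(hour.strftime("%H:%M"), list(g))
```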
onTimeByHour, to compute Online time for each hour (the "internal" grouping level):
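def onTimeByHour(grp2):
    # grp2: status values within one hour, indexed by timestamp;
    # its first row is the full-hour boundary added in onTimeById
    if grp2.size > 1:
        # Time elapsed since the previous row (NaT for the first row)
        dd = grp2.index.to_series().diff()
        # Repeats were removed, so an interval ending in 'Offline' was spent Online
        rv = dd[grp2 == 'Offline'].sum().seconds
        # If the hour ends Online, add the time from the last event to the end of the hour
        if grp2.iloc[-1] == 'Online':
            rv += 3600 - dd.sum().seconds
        return rv
    # A single row means the status did not change during this hour
    return 0 if grp2.iloc[0] == 'Offline' else 3600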
Then run:
res = df.groupby('ID').apply(onTimeById).reset_index(level=0).reset_index(drop=True)
For your source data, the result is:
ID Hour_start Hour_end Online_time
0 A 2017-09-24 07:00:00 2017-09-24 08:00:00 1800
1 A 2017-09-24 08:00:00 2017-09-24 09:00:00 0
2 A 2017-09-24 09:00:00 2017-09-24 10:00:00 1800
3 B 2017-09-24 06:00:00 2017-09-24 07:00:00 0
4 B 2017-09-24 07:00:00 2017-09-24 08:00:00 1800
5 B 2017-09-24 08:00:00 2017-09-24 09:00:00 3600
6 B 2017-09-24 09:00:00 2017-09-24 10:00:00 3000
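If you want to cross-check these numbers, a different technique (interval overlap, not the grouping approach above) gives the same result for this sample: build the Online intervals per ID and clip each one to every hourly bin. online_seconds_per_hour is a hypothetical helper name, and binning from the floor of the first event to the ceiling of the last is my assumption, so the final hour may be handled differently from onTimeById when the last event is not on a full hour.

```python
import pandas as pd


def online_seconds_per_hour(df):
    """Hypothetical helper: Online seconds per ID and hour, via interval overlap."""
    rows = []
    for id_, g in df.sort_values("Datetime").groupby("ID"):
        # Drop repeated statuses so each remaining row marks a state change.
        g = g[g["Status"] != g["Status"].shift()]
        times, states = list(g["Datetime"]), list(g["Status"])
        # Each change opens an interval that lasts until the next change.
        online = [(s, e) for s, e, st in zip(times[:-1], times[1:], states[:-1])
                  if st == "Online"]
        hours = pd.date_range(times[0].floor("h"), times[-1].ceil("h"), freq="h")
        for h_start, h_end in zip(hours[:-1], hours[1:]):
            # Overlap of each Online interval with [h_start, h_end), never negative.
            secs = sum(max((min(e, h_end) - max(s, h_start)).total_seconds(), 0.0)
                       for s, e in online)
            rows.append((id_, h_start, h_end, int(secs)))
    return pd.DataFrame(rows, columns=["ID", "Hour_start", "Hour_end", "Online_time"])


# The question's sample data, re-typed in ISO form.
df = pd.DataFrame({
    "ID": list("AAAAABBBBBB"),
    "Status": ["Online", "Offline", "Offline", "Online", "Offline",
               "Offline", "Online", "Online", "Offline", "Online", "Offline"],
    "Datetime": pd.to_datetime([
        "2017-09-24 07:00", "2017-09-24 07:30", "2017-09-24 08:30",
        "2017-09-24 09:30", "2017-09-24 10:00",
        "2017-09-24 06:00", "2017-09-24 07:30", "2017-09-24 09:10",
        "2017-09-24 09:30", "2017-09-24 09:40", "2017-09-24 10:00",
    ]),
})
print(online_seconds_per_hour(df))
```

The Python-level loop keeps the logic easy to follow; for many IDs or long time spans a vectorized variant would likely be faster.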
To understand in detail how this solution works, save group "A" in a variable, e.g. by running:
gr = df.groupby('ID')
grp = gr.get_group('A')
Then execute the statements of onTimeById one at a time and inspect the results.
Apply the same approach to trace how onTimeByHour works.