Pandas Grouper: calculate time elapsed between events

I am trying to compute the time elapsed between two events using Grouper, but have been unable to do so. Below are the input and the expected output.

Input

ID   Status           Datetime
A    Online     24/09/2017  7:00:00 AM
A    Offline    24/09/2017  7:30:00 AM     
A    Offline    24/09/2017  8:30:00 AM
A    Online     24/09/2017  9:30:00 AM
A    Offline    24/09/2017  10:00:00 AM
B    Offline    24/09/2017  6:00:00 AM
B    Online     24/09/2017  7:30:00 AM     
B    Online     24/09/2017  9:10:00 AM
B    Offline    24/09/2017  9:30:00 AM
B    Online     24/09/2017  9:40:00 AM
B    Offline    24/09/2017  10:00:00 AM

Output

ID        Hour_start                  Hour_end              Online_time
A    24/09/2017  7:00:00 AM     24/09/2017  8:00:00 AM          1800
A    24/09/2017  8:00:00 AM     24/09/2017  9:00:00 AM           0
A    24/09/2017  9:00:00 AM     24/09/2017  10:00:00 AM         1800
B    24/09/2017  6:00:00 AM     24/09/2017  7:00:00 AM           0
B    24/09/2017  7:00:00 AM     24/09/2017  8:00:00 AM          1800
B    24/09/2017  8:00:00 AM     24/09/2017  9:00:00 AM          3600
B    24/09/2017  9:00:00 AM     24/09/2017  10:00:00 AM         3000

Using Pandas Grouper

df_output = df.groupby(['ID',pd.Grouper(key='Datetime', freq='H'),'status'])['event_time'].diff().dt.seconds.fillna(0)

But this doesn't take the Online and Offline values of the Status column into account.

Please help me out. TIA

I assume that the Datetime column in your source DataFrame is of datetime64 type.
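If the column still holds strings like those shown in the question, it can be converted first. The following is a minimal sketch, assuming the data arrives as comma-separated text (the `raw` string and the CSV layout are only for illustration) and that the dates are day-first with a 12-hour clock:

```python
import io
import pandas as pd

# Sample rows copied from the question; the CSV layout is an assumption.
raw = """ID,Status,Datetime
A,Online,24/09/2017 7:00:00 AM
A,Offline,24/09/2017 7:30:00 AM
A,Offline,24/09/2017 8:30:00 AM
A,Online,24/09/2017 9:30:00 AM
A,Offline,24/09/2017 10:00:00 AM
B,Offline,24/09/2017 6:00:00 AM
B,Online,24/09/2017 7:30:00 AM
B,Online,24/09/2017 9:10:00 AM
B,Offline,24/09/2017 9:30:00 AM
B,Online,24/09/2017 9:40:00 AM
B,Offline,24/09/2017 10:00:00 AM
"""
df = pd.read_csv(io.StringIO(raw))

# Day/month/year with a 12-hour clock and an AM/PM marker.
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%d/%m/%Y %I:%M:%S %p')
print(df.dtypes)
```

Passing an explicit `format` avoids the day/month ambiguity that automatic parsing could introduce for dates such as 24/09/2017.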

My solution is based on 2-level grouping: first by ID and then (after some intermediate operations) by hour.

Define 2 functions:

  1. onTimeById, to compute Online time for each ID (the "external" grouping level):

     def onTimeById(grp):
         # Keep only the rows where Status changes
         wrk = grp[grp.Status != grp.Status.shift()]
         wrk = wrk.set_index('Datetime').Status
         # Add full-hour timestamps and forward-fill the status
         wrk = wrk.reindex(wrk.index.union(pd.date_range(wrk.index.min(),
             wrk.index.max(), freq='H'))).ffill()
         res = wrk.groupby(pd.Grouper(freq='H')).apply(onTimeByHour)
         rv = res.iloc[:-1].reset_index().rename(columns={
             'index': 'Hour_start', 'Status': 'Online_time'})
         rv.insert(1, 'Hour_end', res.index[1:])
         return rv

  2. onTimeByHour, to compute Online time for each hour (the "internal" grouping level):

     def onTimeByHour(grp2):
         if grp2.size > 1:
             dd = grp2.index.to_series().diff()
             rv = dd[grp2 == 'Offline'].sum().seconds
             if grp2.iloc[-1] == 'Online':
                 rv += 3600 - dd.sum().seconds
             return rv
         return 0 if grp2.iloc[0] == 'Offline' else 3600

Then run:

res = df.groupby('ID').apply(onTimeById).reset_index(level=0).reset_index(drop=True)

The result, for your source data, is:

  ID          Hour_start            Hour_end  Online_time
0  A 2017-09-24 07:00:00 2017-09-24 08:00:00         1800
1  A 2017-09-24 08:00:00 2017-09-24 09:00:00            0
2  A 2017-09-24 09:00:00 2017-09-24 10:00:00         1800
3  B 2017-09-24 06:00:00 2017-09-24 07:00:00            0
4  B 2017-09-24 07:00:00 2017-09-24 08:00:00         1800
5  B 2017-09-24 08:00:00 2017-09-24 09:00:00         3600
6  B 2017-09-24 09:00:00 2017-09-24 10:00:00         3000
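These numbers can be cross-checked independently of the grouping logic. The sketch below is not part of the solution above; it uses a hypothetical helper, `online_seconds`, and assumes that each status lasts until the next event of the same ID. It clips every Online interval to an hour window and sums the overlap, here for ID "B":

```python
import pandas as pd

def online_seconds(events, hour_start):
    """Seconds spent Online inside [hour_start, hour_start + 1 hour).

    `events` is a time-ordered list of (timestamp, status) tuples for one ID;
    each status is assumed to last until the next event.
    """
    hour_end = hour_start + pd.Timedelta(hours=1)
    total = pd.Timedelta(0)
    for (start, status), (end, _) in zip(events, events[1:]):
        if status != 'Online':
            continue
        # Clip the Online interval to the hour window.
        lo, hi = max(start, hour_start), min(end, hour_end)
        if hi > lo:
            total += hi - lo
    return int(total.total_seconds())

ts = pd.Timestamp
# Events of ID "B", copied from the question.
events_b = [
    (ts('2017-09-24 06:00'), 'Offline'),
    (ts('2017-09-24 07:30'), 'Online'),
    (ts('2017-09-24 09:10'), 'Online'),
    (ts('2017-09-24 09:30'), 'Offline'),
    (ts('2017-09-24 09:40'), 'Online'),
    (ts('2017-09-24 10:00'), 'Offline'),
]
hours = pd.date_range('2017-09-24 06:00', periods=4, freq=pd.offsets.Hour())
check_b = [online_seconds(events_b, h) for h in hours]
print(check_b)  # [0, 1800, 3600, 3000]
```

The four values agree with rows 3 to 6 of the result table.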

To understand in detail how this solution works, save group "A" in a variable, e.g. by running:

gr = df.groupby('ID')
grp = gr.get_group('A')

Then execute each instruction from onTimeById and see the results.
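As an illustration, the first three instructions of onTimeById can be stepped through as follows. This is a sketch that rebuilds group "A" by hand so it runs on its own; the names `changes`, `hours` and `hour` are introduced only for readability, and `pd.offsets.Hour()` is used in place of the `'H'` frequency string with the same meaning:

```python
import pandas as pd

# Group "A" from the question, rebuilt by hand so the sketch is self-contained.
grp = pd.DataFrame({
    'ID': ['A'] * 5,
    'Status': ['Online', 'Offline', 'Offline', 'Online', 'Offline'],
    'Datetime': pd.to_datetime([
        '2017-09-24 07:00', '2017-09-24 07:30', '2017-09-24 08:30',
        '2017-09-24 09:30', '2017-09-24 10:00']),
})

hour = pd.offsets.Hour()  # hourly frequency

# Step 1: keep only rows where Status differs from the previous row
# (the repeated Offline at 08:30 is dropped).
changes = grp[grp.Status != grp.Status.shift()]

# Step 2: a Series of Status values indexed by timestamp.
wrk = changes.set_index('Datetime').Status

# Step 3: add every full-hour boundary and carry the last known status forward.
hours = pd.date_range(wrk.index.min(), wrk.index.max(), freq=hour)
wrk = wrk.reindex(wrk.index.union(hours)).ffill()
print(wrk)
```

After step 3 every hour starts with a row of its own, so each hourly group knows the status it begins with.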

Apply the same approach to trace how onTimeByHour works.
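For example, the 09:00 to 10:00 hour of group "A" can be traced like this. The sketch assumes the forward-filled Series for that hour looks as constructed below (Offline at the hour boundary, Online from 09:30); the variable `online` stands in for `rv` in the function:

```python
import pandas as pd

# Hour 09:00-10:00 of group "A" after the forward fill (built by hand here).
grp2 = pd.Series(
    ['Offline', 'Online'],
    index=pd.to_datetime(['2017-09-24 09:00', '2017-09-24 09:30']),
)

# Gap between each row and the previous one (NaT for the first row).
dd = grp2.index.to_series().diff()

# Statuses alternate within the hour, so a gap ending at an 'Offline' row
# is time spent Online just before it. Here there is no such gap.
online = dd[grp2 == 'Offline'].sum().seconds

# The hour ends Online, so add the time from the last row to the hour's end.
if grp2.iloc[-1] == 'Online':
    online += 3600 - dd.sum().seconds

print(online)  # 1800
```

The value matches row 2 of the result table (ID "A", 09:00 to 10:00).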
