Time difference per person between consecutive rows
I have some data which (broadly speaking) consists of the following fields:
Person TaskID Start_time End_time
Alpha 1 'Wed, 18 Oct 2017 10:10:03 GMT' 'Wed, 18 Oct 2017 10:10:36 GMT'
Alpha 2 'Wed, 18 Oct 2017 10:11:16 GMT' 'Wed, 18 Oct 2017 10:11:28 GMT'
Beta 1 'Wed, 18 Oct 2017 10:12:03 GMT' 'Wed, 18 Oct 2017 10:12:49 GMT'
Alpha 3 'Wed, 18 Oct 2017 10:12:03 GMT' 'Wed, 18 Oct 2017 10:13:13 GMT'
Gamma 1 'Fri, 27 Oct 2017 22:57:12 GMT' 'Sat, 28 Oct 2017 02:00:54 GMT'
Beta 2 'Wed, 18 Oct 2017 10:13:40 GMT' 'Wed, 18 Oct 2017 10:14:03 GMT'
For this data, my required output is something like:
Person TaskID Time_between_attempts
Alpha 1 NULL ['Wed, 18 Oct 2017 10:10:03 GMT' - NULL]
Alpha 2 0:00:40 ['Wed, 18 Oct 2017 10:11:16 GMT' -'Wed, 18 Oct 2017 10:10:36 GMT']
Beta 1 NULL ['Wed, 18 Oct 2017 10:12:03 GMT' - NULL]
Alpha 3 0:00:35 ['Wed, 18 Oct 2017 10:12:03 GMT' -'Wed, 18 Oct 2017 10:11:28 GMT']
Gamma 1 NULL ['Fri, 27 Oct 2017 22:57:12 GMT' - NULL]
Beta 2 0:00:51 ['Wed, 18 Oct 2017 10:13:40 GMT' -'Wed, 18 Oct 2017 10:12:49 GMT']
My requirements are as follows:
a. For a given person (Alpha, Beta or Gamma), the first occurrence of the variable 'time_between_attempts' would be zero/NULL; in the example I have shown it as NULL.
b. The second (and every subsequent) time the same person appears, 'time_between_attempts' will be non-NULL (non-zero). It is calculated as the start time of the current task minus the end time of that person's previous task.
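The two rules above can be sketched outside the database. This is a minimal Python example, assuming the timestamps are the 'Wed, 18 Oct 2017 10:10:03 GMT' strings from the sample (without the surrounding quotes); the function name time_between_attempts is hypothetical. It walks each person's tasks in start-time order and subtracts the previous task's end time from the current task's start time.

```python
from datetime import datetime

FMT = "%a, %d %b %Y %H:%M:%S GMT"  # matches the sample strings

# (person, task_id, start_time, end_time), copied from the sample data
rows = [
    ("Alpha", 1, "Wed, 18 Oct 2017 10:10:03 GMT", "Wed, 18 Oct 2017 10:10:36 GMT"),
    ("Alpha", 2, "Wed, 18 Oct 2017 10:11:16 GMT", "Wed, 18 Oct 2017 10:11:28 GMT"),
    ("Beta", 1, "Wed, 18 Oct 2017 10:12:03 GMT", "Wed, 18 Oct 2017 10:12:49 GMT"),
    ("Alpha", 3, "Wed, 18 Oct 2017 10:12:03 GMT", "Wed, 18 Oct 2017 10:13:13 GMT"),
    ("Gamma", 1, "Fri, 27 Oct 2017 22:57:12 GMT", "Sat, 28 Oct 2017 02:00:54 GMT"),
    ("Beta", 2, "Wed, 18 Oct 2017 10:13:40 GMT", "Wed, 18 Oct 2017 10:14:03 GMT"),
]

def time_between_attempts(rows):
    """Return (person, task_id, gap) per row; gap is None for a person's first task."""
    parsed = [(p, t, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
              for p, t, s, e in rows]
    prev_end = {}  # person -> end time of that person's latest task seen so far
    gaps = {}
    # Rule a/b: visit each person's tasks in start-time order.
    for p, t, s, e in sorted(parsed, key=lambda r: (r[0], r[2])):
        gaps[(p, t)] = None if p not in prev_end else s - prev_end[p]
        prev_end[p] = e
    # Report in the original row order.
    return [(p, t, gaps[(p, t)]) for p, t, _, _ in parsed]

for person, task, gap in time_between_attempts(rows):
    print(person, task, gap)
```

Sorting by start time rather than by TaskID means the same logic still works when the IDs are non-sequential strings.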
I have the following question in this regard:
Please note that TaskID is written as an integer only for simplicity. In the original data, TaskID is more complicated and consists of non-sequential strings such as:
'q:1392763916495:441',
'q:1392763916495:436'
Any advice on this would be greatly appreciated.
This answers the original version of the question.
You can use lag() and timestampdiff() for the calculation. Assuming your values are real date/time or timestamp values, you can easily calculate the gap in seconds:
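-- Requires MySQL 8.0+ (window functions such as lag())
select t.*,
       -- timestampdiff(unit, a, b) returns b - a: here start_time - previous end_time
       timestampdiff(second,
                     lag(end_time) over (partition by person order by start_time),
                     start_time
                    ) as time_between_attempts  -- NULL for each person's first task
from t;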
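The lag() approach can be tried locally with Python's built-in sqlite3 module. This is a swapped-in setup, not MySQL: it assumes SQLite 3.25 or later (needed for window functions), and because SQLite has no timestampdiff(), the sketch subtracts strftime('%s', ...) epoch seconds instead. It also assumes the times are already stored as ISO-8601 text; the table t and the function seconds_between_attempts are illustrative.

```python
import sqlite3

ROWS = [  # sample data converted to ISO-8601 text; TaskID kept as a string
    ("Alpha", "1", "2017-10-18 10:10:03", "2017-10-18 10:10:36"),
    ("Alpha", "2", "2017-10-18 10:11:16", "2017-10-18 10:11:28"),
    ("Beta", "1", "2017-10-18 10:12:03", "2017-10-18 10:12:49"),
    ("Alpha", "3", "2017-10-18 10:12:03", "2017-10-18 10:13:13"),
    ("Gamma", "1", "2017-10-27 22:57:12", "2017-10-28 02:00:54"),
    ("Beta", "2", "2017-10-18 10:13:40", "2017-10-18 10:14:03"),
]

QUERY = """
WITH ordered AS (
    SELECT person, task_id, start_time,
           -- end time of the same person's previous task (NULL for the first one)
           LAG(end_time) OVER (PARTITION BY person ORDER BY start_time) AS prev_end
    FROM t
)
SELECT person, task_id,
       CAST(strftime('%s', start_time) AS INTEGER)
         - CAST(strftime('%s', prev_end) AS INTEGER) AS seconds_between
FROM ordered
ORDER BY person, start_time
"""

def seconds_between_attempts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (person TEXT, task_id TEXT, start_time TEXT, end_time TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in seconds_between_attempts(ROWS):
    print(row)
```

The previous row is found by start_time order within each person, so the TaskID values never take part in the calculation.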
If the values are stored as strings, fix the data. In the meantime, you can use str_to_date() inside the function, e.g. str_to_date(start_time, '%a, %d %b %Y %H:%i:%s GMT').
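For comparison, the same parsing step in Python (a sketch, not part of the original answer) uses datetime.strptime(). The format mirrors the MySQL one, except that Python spells minutes and seconds %M and %S where MySQL uses %i and %s. The helper name to_datetime is hypothetical, and %a/%b assume an English (C) locale.

```python
from datetime import datetime

# MySQL STR_TO_DATE format: '%a, %d %b %Y %H:%i:%s GMT'
# Python strptime format:   minutes are %M and seconds are %S here
PY_FMT = "%a, %d %b %Y %H:%M:%S GMT"

def to_datetime(text: str) -> datetime:
    """Parse one stored string into a naive datetime; ' GMT' is matched literally."""
    return datetime.strptime(text, PY_FMT)

start = to_datetime("Wed, 18 Oct 2017 10:11:16 GMT")
prev_end = to_datetime("Wed, 18 Oct 2017 10:10:36 GMT")
print(start, (start - prev_end).total_seconds())  # 2017-10-18 10:11:16 40.0
```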
To get this as a time value:
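select t.*,
       -- sec_to_time() turns the gap in seconds into a TIME value such as 00:00:40
       sec_to_time(timestampdiff(second,
                                 lag(end_time) over (partition by person order by start_time),
                                 start_time
                                )
                  ) as time_between_attempts  -- NULL for each person's first task
from t;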
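If the gap is computed in seconds and formatted in application code instead, Python's timedelta gives the H:MM:SS form used in the question's expected output. A minimal sketch; the helper name seconds_to_hms is hypothetical.

```python
from datetime import timedelta

def seconds_to_hms(seconds):
    """Format a gap in seconds as H:MM:SS; None (a person's first task) stays None."""
    if seconds is None:
        return None
    return str(timedelta(seconds=seconds))

print([seconds_to_hms(s) for s in (None, 40, 35, 51)])
# [None, '0:00:40', '0:00:35', '0:00:51']
```

Gaps of a day or more render as e.g. "1 day, 0:00:05", and a negative gap (overlapping tasks) renders with a "-1 day" prefix, so those cases may need their own formatting.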
Using a self-join instead:
SELECT a.person,
       a.taskid,
       -- parse the text columns; the trailing ' GMT' is matched as a literal in the format
       TIMEDIFF(STR_TO_DATE(a.Start_time, '%a, %d %b %Y %H:%i:%s GMT'),
                STR_TO_DATE(b.End_time,   '%a, %d %b %Y %H:%i:%s GMT')) AS Time_between_attempts,
       a.Start_time,
       b.End_time
FROM test a
LEFT JOIN test b
       ON a.person = b.person
      AND a.taskid = b.taskid + 1   -- b = previous task (assumes integer, consecutive TaskIDs)
ORDER BY 1, 2;
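Because the real TaskIDs are non-sequential strings, a.taskid = b.taskid + 1 cannot find the previous task. One variant (not from the original answer) is to self-join on a per-person ROW_NUMBER() ordered by start time. The sketch below runs it in Python's sqlite3 (SQLite 3.25+ assumed, epoch-second subtraction in place of TIMEDIFF); the string IDs assigned to each row are hypothetical and chosen to be out of time order on purpose.

```python
import sqlite3

# Hypothetical string TaskIDs that do not increase with time
ROWS = [
    ("Alpha", "q:1392763916495:441", "2017-10-18 10:10:03", "2017-10-18 10:10:36"),
    ("Alpha", "q:1392763916495:436", "2017-10-18 10:11:16", "2017-10-18 10:11:28"),
    ("Beta", "q:1392763916495:502", "2017-10-18 10:12:03", "2017-10-18 10:12:49"),
    ("Alpha", "q:1392763916495:317", "2017-10-18 10:12:03", "2017-10-18 10:13:13"),
    ("Gamma", "q:1392763916495:610", "2017-10-27 22:57:12", "2017-10-28 02:00:54"),
    ("Beta", "q:1392763916495:498", "2017-10-18 10:13:40", "2017-10-18 10:14:03"),
]

QUERY = """
WITH numbered AS (
    -- 1, 2, 3, ... per person in start-time order, whatever the TaskID looks like
    SELECT person, taskid, start_time, end_time,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY start_time) AS rn
    FROM test
)
SELECT a.person, a.taskid,
       CAST(strftime('%s', a.start_time) AS INTEGER)
         - CAST(strftime('%s', b.end_time) AS INTEGER) AS seconds_between
FROM numbered a
LEFT JOIN numbered b
       ON a.person = b.person
      AND a.rn = b.rn + 1          -- b is the same person's previous task
ORDER BY a.person, a.rn
"""

def self_join_gaps(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (person TEXT, taskid TEXT, start_time TEXT, end_time TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in self_join_gaps(ROWS):
    print(row)
```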
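But this will ignore the time zone. It also relies on TaskID being a consecutive integer (a.taskid = b.taskid + 1), which does not hold for the string IDs in the real data.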
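To keep the zone information instead of discarding it, one option outside MySQL is Python's email.utils.parsedate_to_datetime(), which parses this RFC 2822 style date format and returns a timezone-aware datetime. The sample data is all GMT, so the '+0200' timestamp below is a hypothetical value, added only to show that aware subtraction compares the actual instants rather than the wall-clock times.

```python
from email.utils import parsedate_to_datetime

prev_end = parsedate_to_datetime("Wed, 18 Oct 2017 10:10:36 GMT")
# Hypothetical: 12:11:16 at UTC+2 is the same instant as 10:11:16 GMT.
start = parsedate_to_datetime("Wed, 18 Oct 2017 12:11:16 +0200")

print((start - prev_end).total_seconds())  # 40.0
```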