I have a table table_a
:
event_id event_start process_id process_start name country
A1 2020-07-01 21:19:01 B1 2020-07-01 21:20:05 google US
A1 2020-07-01 21:19:01 B2 2020-07-01 21:21:01 google US
A1 2020-07-01 21:19:01 B3 2020-07-01 21:23:04 google US
A4 2020-07-01 14:59:12 C1 2020-07-01 15:01:14 bing UK
A5 2020-07-01 12:39:14 D1 2020-07-01 12:49:13 bing CA
A6 2020-07-01 11:49:46 E1 2020-07-01 11:52:59 facebook US
In this table I have event_id
which can be the same, if event_id
is the same, then event_start
will be the same too. process_id
is unique, process_start can have duplicate. I am trying to calculate the minute difference between event_start
and process_start
for each event_id
, the problem is that event_start
has the same time but the process_start
can have different timestamps. I would like to take 2 times from the process_start
if they have more than one. First would be the earliest (min) process_start
and the latest (max) process_start
so that my desired output would look like so:
event_id event_start process_id process_start name country earliest_diff_minute latest_diff_minute
A1 2020-07-01 21:19:01 B1 2020-07-01 21:20:05 google US 1 (2020-07-01 21:20:05 - 2020-07-01 21:19:01) 3 (2020-07-01 21:23:05 - 2020-07-01 21:19:04)
A1 2020-07-01 21:19:01 B3 2020-07-01 21:23:04 google US 1 (2020-07-01 21:20:05 - 2020-07-01 21:19:01) 3 (2020-07-01 21:23:05 - 2020-07-01 21:19:04)
A4 2020-07-01 14:59:12 C1 2020-07-01 15:01:14 bing UK 2 ( 2020-07-01 15:01:14 - 2020-07-01 14:59:12) 2 ( 2020-07-01 15:01:14 - 2020-07-01 14:59:12)
A5 2020-07-01 12:39:14 D1 2020-07-01 12:49:13 bing CA 10 10
A6 2020-07-01 11:49:46 E1 2020-07-01 11:52:59 facebook US 3 3
So if the process_id
is unique, the min and max time difference will be the same. If more than 1, both max and min values are recorded while everything in between is discarded.
I assume that duplicates is by name
and country
. You can just use window functions, particularly min()
and max()
to get the earliest and latest process dates for each grouping:
select a.*,
datediff('m', event_start, max(process_start) over (partition by event_id, name, country)),
datediff('m', event_start, min(process_start) over (partition by event_id, name, country))
from table_a a
I think there are various ways to achieve your goal, this is the first that I thought
SELECT event_id,
event_start,
process_id,
process_start,
name,
country,
datediff('m', event_start, first_process_start) as earliest_diff_minute,
datediff('m', last_process_start, event_start) as latest_diff_minute
FROM (
SELECT event_id,
event_start,
process_id,
process_start,
name,
country
first_value(process_start)
over (partition by event_id
order by process_start
rows between unbounded preceding and unbounded following) as first_process_start,
last_value(process_start)
over (partition by event_id
order by process_start
rows between unbounded preceding and unbounded following) as last_process_start
FROM my_schema.my_table) as a
WHERE process_start = first_process_start
OR process_start = last_process_start
Basically in the sub query for each row you retrieve the first and last process_start. Then you retrieve only the rows that have the process_start equal to one of each and after you can calculate the date difference.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.