Merge every 2 consecutive records into 1
I have a pre-processed table in which I want to merge every pair of records into one record containing data from the fields of both records.
|-------------------|-----|----|
|Timestamp          |Event|User|
|-------------------|-----|----|
|17/03/2020 03:22:00|Start|1   |
|17/03/2020 03:22:05|End  |1   |
|17/03/2020 03:22:10|Start|2   |
|17/03/2020 03:22:15|End  |2   |
|17/03/2020 03:23:00|Start|1   |
|17/03/2020 03:23:22|End  |1   |
|-------------------|-----|----|
The query should return:
|-------------------|-------------------|----|
|StartTimestamp     |EndTimestamp       |User|
|-------------------|-------------------|----|
|17/03/2020 03:22:00|17/03/2020 03:22:05|1   |
|17/03/2020 03:22:10|17/03/2020 03:22:15|2   |
|17/03/2020 03:23:00|17/03/2020 03:23:22|1   |
|-------------------|-------------------|----|
You can safely assume that every 2 records form a correct pair (the events are Start and End respectively, and the User is the same), since the table is pre-filtered.
EDIT: Sorry, I forgot to mention that a single user is allowed to have multiple pairs. I've adjusted the example table above to show that.
As suggested, this should do what you want:
SELECT
    MIN(Timestamp) AS StartTimestamp,
    MAX(Timestamp) AS EndTimestamp,
    User
FROM
    mytable
GROUP BY User;
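To see why the edit below was needed, here is a minimal sketch that runs this query on the question's sample data using Python's built-in sqlite3 module as a stand-in database. The in-memory table, the ISO-8601 timestamps (chosen so that text ordering is chronological) and the ORDER BY are assumptions added for the demo. User 1's two sessions collapse into a single row running from 03:22:00 to 03:23:22.

```python
import sqlite3

# Question's sample rows as (Timestamp, Event, User). ISO-8601 timestamps are
# an assumption so that SQLite's text ordering is chronological.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]


def group_by_user(rows):
    """Run the MIN/MAX ... GROUP BY User query against an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (Timestamp TEXT, Event TEXT, User INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT MIN(Timestamp) AS StartTimestamp,
               MAX(Timestamp) AS EndTimestamp,
               User
        FROM mytable
        GROUP BY User
        ORDER BY User  -- added only to make the output order deterministic
        """
    ).fetchall()
    con.close()
    return result


for row in group_by_user(ROWS):
    print(row)
```

Only two rows come back for three pairs, which is exactly the case the follow-up query addresses.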
EDIT: As a user ID can appear multiple times, in multiple groups, see the following query:
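-- Note: this query uses the column names time, event and userid
-- (the question's table calls them Timestamp, Event and User).
WITH cte AS (
    SELECT mt.*, ROW_NUMBER() OVER (ORDER BY time) AS rn FROM mytable mt
)
SELECT
    t1.userid,
    t1.time AS StartTimestamp,
    t2.time AS EndTimestamp
FROM cte t1
JOIN cte t2 ON t1.rn + 1 = t2.rn
WHERE t1.event = 'Start'
ORDER BY t1.rn;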
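A minimal sketch of this consecutive-row self-join, again using Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer). The table layout follows this answer's column names; the sample rows and the trailing ORDER BY are assumptions for the demo. Because the pairing relies purely on row order, it only holds while, as the question guarantees, every Start is immediately followed by its End.

```python
import sqlite3

# Question's sample rows as (time, event, userid); ISO-8601 timestamps are an
# assumption so that SQLite's text ordering is chronological.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]

QUERY = """
WITH cte AS (
    SELECT mt.*, ROW_NUMBER() OVER (ORDER BY time) AS rn
    FROM mytable mt
)
SELECT t1.userid,
       t1.time AS StartTimestamp,
       t2.time AS EndTimestamp
FROM cte t1
JOIN cte t2 ON t1.rn + 1 = t2.rn   -- pair each row with the row right after it
WHERE t1.event = 'Start'           -- keep only pairs that begin with a Start
ORDER BY t1.rn
"""


def consecutive_pairs(rows):
    """Load rows into an in-memory table and pair consecutive records."""
    con = sqlite3.connect(":memory:")
    # Column names follow this answer (time, event, userid), not the question.
    con.execute("CREATE TABLE mytable (time TEXT, event TEXT, userid INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in consecutive_pairs(ROWS):
    print(row)
```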
You can use row_number() and do conditional aggregation:
-- "user" is a reserved word in some databases (e.g. SQL Server, PostgreSQL);
-- quote it there, for example as [user] or "user".
select user,
       min(case when event = 'Start' then timestamp end) as starttimestamp,
       min(case when event = 'End' then timestamp end) as endtimestamp
from (select t.*,
             row_number() over (partition by user, event order by timestamp) as seq
      from mytable t
     ) t
group by user, seq;
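The same data run through this conditional-aggregation query, as a hedged sqlite3 sketch (SQLite 3.25+ assumed; the sample rows, ISO timestamps and ORDER BY are additions for the demo). ROW_NUMBER() numbers each user's Start rows and End rows separately, so the n-th Start and the n-th End of a user share a seq value and collapse into one group.

```python
import sqlite3

# Question's sample rows as (timestamp, event, user); ISO-8601 timestamps are
# an assumption so that SQLite's text ordering is chronological.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]

QUERY = """
SELECT user,
       MIN(CASE WHEN event = 'Start' THEN timestamp END) AS starttimestamp,
       MIN(CASE WHEN event = 'End' THEN timestamp END) AS endtimestamp
FROM (SELECT t.*,
             -- the n-th Start and the n-th End of a user get the same seq
             ROW_NUMBER() OVER (PARTITION BY user, event ORDER BY timestamp) AS seq
      FROM mytable t) t
GROUP BY user, seq
ORDER BY starttimestamp
"""


def conditional_pairs(rows):
    """Load rows into an in-memory table and run the aggregation query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, user INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in conditional_pairs(ROWS):
    print(row)
```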
I would suggest using lead() or a cumulative min():
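-- Quote user (e.g. [user] or "user") in databases where it is a reserved word.
-- With each user's rows ordered newest first, the running min over the End
-- timestamps is the earliest End at or after the current row.
select t.*
from (select t.*,
             min(case when event = 'End' then timestamp end)
                 over (partition by user order by timestamp desc) as end_time
      from t
     ) t
where event = 'Start';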
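A sketch of both options in sqlite3 (SQLite 3.25+ assumed; sample rows, ISO timestamps and ORDER BY added for the demo). The cumulative-min query is the one above; the LEAD() query is a hypothetical illustration of the lead() option, which the answer mentions but does not show. It takes the next event of the same user after each Start, which is the matching End only because the table is pre-filtered.

```python
import sqlite3

# Question's sample rows as (timestamp, event, user); ISO-8601 timestamps are
# an assumption so that SQLite's text ordering is chronological.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]

# The answer's cumulative-min approach.
CUMULATIVE_MIN = """
SELECT user, timestamp AS start_time, end_time
FROM (SELECT t.*,
             MIN(CASE WHEN event = 'End' THEN timestamp END)
                 OVER (PARTITION BY user ORDER BY timestamp DESC) AS end_time
      FROM mytable t) t
WHERE event = 'Start'
ORDER BY start_time
"""

# Hypothetical lead() variant: the row that follows a Start for the same
# user is taken as its End.
LEAD = """
SELECT user, timestamp AS start_time, next_time AS end_time
FROM (SELECT t.*,
             LEAD(timestamp) OVER (PARTITION BY user ORDER BY timestamp) AS next_time
      FROM mytable t) t
WHERE event = 'Start'
ORDER BY start_time
"""


def run(query, rows):
    """Load rows into an in-memory table and return the query's result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, user INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(CUMULATIVE_MIN, ROWS))
print(run(LEAD, ROWS))
```

On this data both queries return the same three pairs; they would differ if an End were missing, since the cumulative min keeps looking forward for any later End while lead() only looks one row ahead.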
Number the rows per user and event to get event numbers, then join event starts to event ends.
with s as
(
select
[user], timestamp,
row_number() over (partition by [user] order by timestamp) as event_number
from mytable
where event = 'Start'
)
, e as
(
select
[user], timestamp,
row_number() over (partition by [user] order by timestamp) as event_number
from mytable
where event = 'End'
)
select s.[user], s.timestamp as start_time, e.timestamp as end_time
from s
join e on e.[user] = s.[user] and e.event_number = s.event_number
order by start_time;
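Running this query in a minimal sqlite3 sketch (SQLite also accepts the SQL Server style [user] quoting; version 3.25+ is assumed for row_number()). The sample rows and ISO timestamps are assumptions for the demo; the result reproduces the expected output from the question.

```python
import sqlite3

# Question's sample rows as (timestamp, event, user); ISO-8601 timestamps are
# an assumption so that SQLite's text ordering is chronological.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]

QUERY = """
with s as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'Start'
)
, e as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'End'
)
select s.[user], s.timestamp as start_time, e.timestamp as end_time
from s
join e on e.[user] = s.[user] and e.event_number = s.event_number
order by start_time
"""


def pair_events(rows):
    """Pair each user's n-th Start with the same user's n-th End."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, [user] INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in pair_events(ROWS):
    print(row)
```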
Use a left outer join if you want to show events that have started but not ended yet.
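A hedged sqlite3 sketch of that left-join variant. The demo data adds a hypothetical second Start for user 2 at 03:24:00 with no End; instead of being dropped, it comes back with a NULL end time (None in Python).

```python
import sqlite3

# Question's sample rows plus one hypothetical unfinished event for user 2.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:10", "Start", 2),
    ("2020-03-17 03:22:15", "End", 2),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
    ("2020-03-17 03:24:00", "Start", 2),  # no matching End yet
]

QUERY = """
with s as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'Start'
)
, e as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'End'
)
select s.[user], s.timestamp as start_time, e.timestamp as end_time
from s
left join e on e.[user] = s.[user] and e.event_number = s.event_number
order by start_time
"""


def pair_events_left(rows):
    """Like the inner-join version, but keep Starts that have no End yet."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, [user] INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in pair_events_left(ROWS):
    print(row)
```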
This query also allows for parallel events (i.e. a user starts an event, then another user starts an event before the first user ends theirs).
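A small sqlite3 sketch of the parallel case, using hypothetical interleaved rows in which user 2 starts before user 1 ends. Because events are numbered per user, each Start still meets its own End.

```python
import sqlite3

# Hypothetical interleaved events: user 2 starts before user 1 has ended.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:22:02", "Start", 2),
    ("2020-03-17 03:22:05", "End", 1),
    ("2020-03-17 03:22:07", "End", 2),
]

QUERY = """
with s as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'Start'
)
, e as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'End'
)
select s.[user], s.timestamp as start_time, e.timestamp as end_time
from s
join e on e.[user] = s.[user] and e.event_number = s.event_number
order by start_time
"""


def pair_events(rows):
    """Pair each user's n-th Start with the same user's n-th End."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, [user] INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in pair_events(ROWS):
    print(row)
```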
What the query doesn't account for are missing events, e.g. a user starts an event, but when they end it, the end is not recorded in the table. If the user then starts a new event and ends it, my query will pair the second event's end with the first event's start.
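The failure mode can be reproduced in sqlite3 with a hypothetical sequence in which the End of user 1's first event was never recorded. In this sketch the first Start is paired with the second event's End, and the second Start is dropped by the inner join.

```python
import sqlite3

# Hypothetical data: the End of user 1's first event is missing.
ROWS = [
    ("2020-03-17 03:22:00", "Start", 1),
    ("2020-03-17 03:23:00", "Start", 1),
    ("2020-03-17 03:23:22", "End", 1),
]

QUERY = """
with s as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'Start'
)
, e as
(
  select [user], timestamp,
         row_number() over (partition by [user] order by timestamp) as event_number
  from mytable
  where event = 'End'
)
select s.[user], s.timestamp as start_time, e.timestamp as end_time
from s
join e on e.[user] = s.[user] and e.event_number = s.event_number
order by start_time
"""


def pair_events(rows):
    """Pair each user's n-th Start with the same user's n-th End."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (timestamp TEXT, event TEXT, [user] INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


# Start #1 (03:22:00) is matched with End #1 (03:23:22), which really belongs
# to the second event; Start #2 has no End #2 and disappears.
print(pair_events(ROWS))
```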