
Pairing Sequential Events on PostgreSQL

We log the main flows of actions our users perform in our iPad app to a table. Each flow has a start (tagged Started) and an end tagged either Cancelled or Finished, and there shouldn't be any overlapping events.

A set of Started, Cancelled and Finished events for a user looks like this:

user_id             timestamp                   event_text      event_num
info@cafe-test.de   2016-10-30 00:08:00.966+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:08:15.58+00   Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:08:15.581+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:34:44.134+00  Flow Finished   1
info@cafe-test.de   2016-10-30 00:42:26.102+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:42:49.276+00  Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:59:47.337+00  Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:59:47.337+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:59:47.928+00  Flow Cancelled  2
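
To reproduce the problem, the sample rows above can be loaded into a temporary table. This is only a sketch: the column types are assumptions (they are not stated here), and the temporary table shadows any real `tracks` table for the duration of the session.

```sql
-- Assumed column types; the real tracks table may differ.
create temp table tracks (
    user_id     text,
    "timestamp" timestamptz,
    event_text  text,
    event_num   int          -- 0 = Started, 1 = Finished, 2 = Cancelled
);

insert into tracks (user_id, "timestamp", event_text, event_num) values
    ('info@cafe-test.de', '2016-10-30 00:08:00.966+00', 'Flow Started',   0),
    ('info@cafe-test.de', '2016-10-30 00:08:15.58+00',  'Flow Cancelled', 2),
    ('info@cafe-test.de', '2016-10-30 00:08:15.581+00', 'Flow Started',   0),
    ('info@cafe-test.de', '2016-10-30 00:34:44.134+00', 'Flow Finished',  1),
    ('info@cafe-test.de', '2016-10-30 00:42:26.102+00', 'Flow Started',   0),
    ('info@cafe-test.de', '2016-10-30 00:42:49.276+00', 'Flow Cancelled', 2),
    ('info@cafe-test.de', '2016-10-30 00:42:49.277+00', 'Flow Started',   0),
    ('info@cafe-test.de', '2016-10-30 00:59:47.337+00', 'Flow Cancelled', 2),
    ('info@cafe-test.de', '2016-10-30 00:59:47.337+00', 'Flow Started',   0),
    ('info@cafe-test.de', '2016-10-30 00:59:47.928+00', 'Flow Cancelled', 2);
```

Note the two rows sharing the timestamp 00:59:47.337: the Cancelled event closes one flow and the Started event opens the next.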

We want to calculate how long cancelled and finished flows last on average. For this we need to pair each Started event with its Cancelled or Finished event. The following code does that, but it cannot work around the following data quality issue:

  • When a customer wants to start a new flow (let's call it Flow2) before ending the ongoing flow (Flow1), we fire a Cancelled event for Flow1 at the same moment as the Started event for Flow2, so the timestamp of Flow1 Cancelled equals that of Flow2 Started. However, when we use window functions to order the events and lead/lag between them, events that actually belong to different flows get matched. Using this code:

     WITH track_scf AS (
         SELECT user_id,
                timestamp,
                event_text,
                CASE WHEN event_text LIKE '%Started%'   THEN 0
                     WHEN event_text LIKE '%Cancelled%' THEN 2
                     ELSE 1
                END AS event_num
         FROM tracks
         ORDER BY 2, 4 DESC
     )
     SELECT user_id,
            CASE WHEN event_num = 0 THEN timestamp END AS start,
            CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
                 THEN LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
            END AS end,
            CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
                 THEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
            END AS action
     FROM track_scf

We get this result:

user_id             start                       end                         action
info@cafe-test.de   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  NULL                        NULL
info@cafe-test.de   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de   NULL                        2016-10-30 00:59:47.928+00  2

But we should get this:

user_id             start                       end                         action
info@cafe-test.de   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.928+00  2

How do I need to alter the code so that the pairing is correct?

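-- Break ties on "timestamp" with event_num desc, so that a Cancelled (2) or
-- Finished (1) event sorts before a Started (0) event logged at the same instant.
-- Each Started row then looks ahead to the event that closes its own flow.
-- Assumes tracks exposes event_num as in the sample data.
select      user_id
           ,"start"
           ,"end"
           ,"action"

from       (select      user_id
                       ,"timestamp"               as "start"
                       ,lead (event_num)   over w as "action"
                       ,lead ("timestamp") over w as "end"
                       ,event_num

            from        tracks t

            window      w as (partition by user_id order by "timestamp",event_num desc)
            ) t

where       t.event_num = 0   -- keep only the rows that begin a flow
;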
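
Since the stated goal is the average duration of cancelled and finished flows, the pairing above could be extended along the following lines. This is a sketch rather than a tested solution: the CTE names `events` and `paired` and the output column names are made up for illustration, and `event_num` is derived from `event_text` the same way the question's query does, so it does not matter whether `tracks` stores that column.

```sql
with events as (
    -- Same encoding as in the question: 0 = Started, 1 = Finished, 2 = Cancelled
    select  user_id
           ,"timestamp"
           ,case when event_text like '%Started%'   then 0
                 when event_text like '%Cancelled%' then 2
                 else 1
            end as event_num
    from    tracks
),
paired as (
    -- Pair every event with the next one; ties put closing events first
    select  user_id
           ,"timestamp"                as started_at
           ,lead ("timestamp") over w  as ended_at
           ,lead (event_num)   over w  as action
           ,event_num
    from    events
    window  w as (partition by user_id order by "timestamp", event_num desc)
)
select  case action when 1 then 'Finished' else 'Cancelled' end as outcome
       ,count(*)                    as flows
       ,avg (ended_at - started_at) as avg_duration   -- an interval
from    paired
where   event_num = 0          -- rows that begin a flow
  and   action in (1, 2)       -- skip starts with no closing event
group by action
order by action;
```

The `action in (1, 2)` filter drops a Started row whose next event is missing or is another Started, so unfinished flows do not distort the averages.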

