Compare timestamps in two consecutive rows that have different values in column A and the same value in column B in BigQuery

Question

Guys, I have a BigQuery result that shows the time (column local_time) at which riders (column rider_id) log in to or out of an app (column event). The event column has two distinct values, "authentication_complete" and "logout".

event_date  rider_id    event                    local_time
20200329    100695      authentication_complete  20:07:09
20200329    100884      authentication_complete  12:00:51
20200329    100967      logout                   10:53:17
20200329    100967      authentication_complete  10:55:24
20200329    100967      logout                   11:03:28
20200329    100967      authentication_complete  11:03:47
20200329    101252      authentication_complete  7:55:21
20200329    101940      authentication_complete  8:58:44
20200329    101940      authentication_complete  17:19:57
20200329    102015      authentication_complete  14:20:27
20200329    102015      logout                   22:47:50
20200329    102015      authentication_complete  22:48:34

What I want to achieve: for each rider who ever logged out, I want one column with the time they logged out and another column with the time of the "authentication_complete" event that comes right after that logout for the same rider. That way I can see how long each rider was out of the app. The query result I want looks like this:

event_date  rider_id    time_of_logout  authentication_complete_right_after_the_logout
20200329    100967      10:53:17        10:55:24
20200329    100967      11:03:28        11:03:47
20200329    102015      22:47:50        22:48:34

This was a very messy data set. I have managed to clean it up this far, but I am stuck at this step. I was looking into functions like lag(), but the data has 180,000 rows, a rider_id can have multiple "logout" events, and the same rider_id can have several consecutive "authentication_complete" events, which makes it extra confusing. I would really appreciate any help. Thanks!

Answer 1

I think you want lead():

select event_date, rider_id, local_time as logout_date,
       authentication_date
from (select t.*,
             -- time of the next event for the same rider on the same day
             lead(local_time) over (partition by event_date, rider_id
                                    order by local_time) as authentication_date
      from t
     ) t
where event = 'logout';

This assumes that the next event is indeed an authentication, as in your sample data. You don't specify what to do if this is not the case.
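To check the lead() query without a BigQuery project, here is a minimal sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module. SQLite is only a stand-in here (my assumption, not something from the question); it needs SQLite 3.25 or later for window functions, and the times are stored as zero-padded text so that text order matches time order.

```python
import sqlite3

# Sample rows from the question. Times are zero-padded text because SQLite
# has no TIME type; padding keeps text ordering chronological.
rows = [
    ("20200329", 100695, "authentication_complete", "20:07:09"),
    ("20200329", 100884, "authentication_complete", "12:00:51"),
    ("20200329", 100967, "logout", "10:53:17"),
    ("20200329", 100967, "authentication_complete", "10:55:24"),
    ("20200329", 100967, "logout", "11:03:28"),
    ("20200329", 100967, "authentication_complete", "11:03:47"),
    ("20200329", 101252, "authentication_complete", "07:55:21"),
    ("20200329", 101940, "authentication_complete", "08:58:44"),
    ("20200329", 101940, "authentication_complete", "17:19:57"),
    ("20200329", 102015, "authentication_complete", "14:20:27"),
    ("20200329", 102015, "logout", "22:47:50"),
    ("20200329", 102015, "authentication_complete", "22:48:34"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (event_date text, rider_id integer, event text, local_time text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# Same shape as the lead() query above, with an ORDER BY added so the
# output order is deterministic.
lead_query = """
select event_date, rider_id, local_time as logout_date, authentication_date
from (select t.*,
             lead(local_time) over (partition by event_date, rider_id
                                    order by local_time) as authentication_date
      from t
     ) t
where event = 'logout'
order by rider_id, logout_date
"""
lead_result = conn.execute(lead_query).fetchall()
for row in lead_result:
    print(row)
```

On this sample every logout is immediately followed by an authentication, so the three printed rows should match the desired result in the question.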

If you specifically want the next authentication date, then you can use min():

select event_date, rider_id, local_time as logout_date,
       authentication_date
from (select t.*,
             -- descending order makes the default frame cover this row and all later ones,
             -- so min() returns the earliest authentication at or after this row
             min(case when event = 'authentication_complete' then local_time end)
                 over (partition by event_date, rider_id
                       order by local_time desc) as authentication_date
      from t
     ) t
where event = 'logout';
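The two queries only differ when a logout is not directly followed by an authentication. Here is a rough sketch of that case, again using Python's sqlite3 as a local stand-in for BigQuery (an assumption on my part). Rider 999 and the helper logout_pairs are made up for illustration and are not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (event_date text, rider_id integer, event text, local_time text)"
)
# Hypothetical rider 999: two logouts in a row, then one authentication.
conn.executemany("insert into t values (?, ?, ?, ?)", [
    ("20200329", 999, "logout", "09:00:00"),
    ("20200329", 999, "logout", "09:05:00"),
    ("20200329", 999, "authentication_complete", "09:10:00"),
])

def logout_pairs(window_expr):
    """Run the answer's query shape with the given window expression."""
    query = f"""
    select local_time, authentication_date
    from (select t.*, {window_expr} as authentication_date from t) t
    where event = 'logout'
    order by local_time
    """
    return conn.execute(query).fetchall()

# lead(): takes whatever event comes next, even another logout.
with_lead = logout_pairs(
    "lead(local_time) over (partition by event_date, rider_id order by local_time)"
)
# min(): takes the earliest authentication at or after the logout.
with_min = logout_pairs(
    "min(case when event = 'authentication_complete' then local_time end) "
    "over (partition by event_date, rider_id order by local_time desc)"
)
print(with_lead)  # first logout is paired with the second logout's time
print(with_min)   # both logouts are paired with 09:10:00
```

With lead(), the first logout gets 09:05:00, which is the time of another logout rather than an authentication. With min(), both logouts get 09:10:00. Whether two logouts should share one authentication is a decision the question leaves open.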

Answer 1: score 0, accepted, 2020-03-31 21:53:49