Comparing timestamps in two consecutive rows which have different values for column A and the same value for column B in BigQuery
Guys, I have a BigQuery query result which shows me the time (in the column local_time) that riders (in the column rider_id) log out of an app (the column event), so there are two distinct values for the column event: "authentication_complete" and "logout".
event_date rider_id event local_time
20200329 100695 authentication_complete 20:07:09
20200329 100884 authentication_complete 12:00:51
20200329 100967 logout 10:53:17
20200329 100967 authentication_complete 10:55:24
20200329 100967 logout 11:03:28
20200329 100967 authentication_complete 11:03:47
20200329 101252 authentication_complete 7:55:21
20200329 101940 authentication_complete 8:58:44
20200329 101940 authentication_complete 17:19:57
20200329 102015 authentication_complete 14:20:27
20200329 102015 logout 22:47:50
20200329 102015 authentication_complete 22:48:34
What I want to achieve is, for each rider who ever logged out, to get in one column the time they logged out, and in another column the time of the "authentication_complete" event that comes right after that logout event for that rider. This way, I can see the period during which each rider was out of the app. The query result I want to get looks like this:
event_date rider_id time_of_logout authentication_complete_right_after_the_logout
20200329 100967 10:53:17 10:55:24
20200329 100967 11:03:28 11:03:47
20200329 102015 22:47:50 22:48:34
This was a very unclean data set, and so far I have been able to clean it this much, but at this step I am feeling very stuck. I was looking into functions like lag(), but since the data has 180,000 rows, there can be multiple "logout" events for a rider_id, and there can be multiple consecutive "authentication_complete" events for the same rider_id, it is extra confusing. I would really appreciate any help. Thanks!
I think you want lead():
-- t is the cleaned events table from the question
select event_date, rider_id, local_time as logout_date,
       authentication_date
from (select t.*,
             -- time of the rider's next event on the same day
             lead(local_time) over (partition by event_date, rider_id order by local_time) as authentication_date
      from t
     ) t
where event = 'logout';
This assumes that the next event is indeed an authentication, as in your sample data. You don't specify what to do if this is not the case.
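If you want to sanity-check the lead() idea outside BigQuery, here is a minimal sketch that runs it against a few of your sample rows in an in-memory SQLite database from Python. It is not BigQuery itself: it assumes SQLite 3.25 or later (for window functions) and that local_time is stored as zero-padded 'HH:MM:SS' text so that it sorts chronologically. The extra next_event column and the filter on it are my additions; they drop any logout whose next event is not an authentication.

```python
import sqlite3

# A few sample rows from the question (riders 100967 and 102015).
# Assumption: times are zero-padded 'HH:MM:SS' text, so text order = time order.
rows = [
    ("20200329", 100967, "logout", "10:53:17"),
    ("20200329", 100967, "authentication_complete", "10:55:24"),
    ("20200329", 100967, "logout", "11:03:28"),
    ("20200329", 100967, "authentication_complete", "11:03:47"),
    ("20200329", 102015, "authentication_complete", "14:20:27"),
    ("20200329", 102015, "logout", "22:47:50"),
    ("20200329", 102015, "authentication_complete", "22:48:34"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (event_date text, rider_id integer, event text, local_time text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# lead() looks one row ahead within each (event_date, rider_id) partition.
# next_event lets us keep only logouts directly followed by an authentication.
lead_query = """
select event_date, rider_id, local_time as logout_date, authentication_date
from (select t.*,
             lead(local_time) over (partition by event_date, rider_id
                                    order by local_time) as authentication_date,
             lead(event) over (partition by event_date, rider_id
                               order by local_time) as next_event
      from t
     ) t
where event = 'logout' and next_event = 'authentication_complete'
order by rider_id, logout_date
"""

lead_result = conn.execute(lead_query).fetchall()
for row in lead_result:
    print(row)
```

On these rows it returns the three logout/authentication pairs from your expected output.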
If you specifically want the next authentication time, then you can use min() over a window ordered by local_time descending, so that the default window frame covers the current row and every later row:
-- t is the cleaned events table from the question
select event_date, rider_id, local_time as logout_date,
       authentication_date
from (select t.*,
             -- earliest authentication time at or after the current row
             min(case when event = 'authentication_complete' then local_time end) over (partition by event_date, rider_id order by local_time desc) as authentication_date
      from t
     ) t
where event = 'logout';
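To see how the min() version behaves when events are not neatly alternating, here is a small sketch, again using in-memory SQLite from Python rather than BigQuery (assuming SQLite 3.25 or later). The rows are hypothetical, not from your data: rider 1 logs out twice in a row, authenticates twice, then logs out once more with no later authentication. Times are assumed to be zero-padded 'HH:MM:SS' text.

```python
import sqlite3

# Hypothetical rows (not from the question) with consecutive logouts,
# consecutive authentications, and a final logout with nothing after it.
rows = [
    ("20200329", 1, "logout", "09:00:00"),
    ("20200329", 1, "logout", "09:05:00"),
    ("20200329", 1, "authentication_complete", "09:10:00"),
    ("20200329", 1, "authentication_complete", "09:12:00"),
    ("20200329", 1, "logout", "18:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (event_date text, rider_id integer, event text, local_time text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# With "order by local_time desc" and no explicit frame, the window runs from
# the latest row down to the current row, i.e. the current row and all later
# events. min() ignores the NULLs that the CASE produces for logout rows.
min_query = """
select event_date, rider_id, local_time as logout_date, authentication_date
from (select t.*,
             min(case when event = 'authentication_complete' then local_time end)
                 over (partition by event_date, rider_id
                       order by local_time desc) as authentication_date
      from t
     ) t
where event = 'logout'
order by rider_id, logout_date
"""

min_result = conn.execute(min_query).fetchall()
for row in min_result:
    print(row)
```

Both early logouts are paired with the 09:10:00 authentication, and the last logout gets NULL (None in Python) because no authentication follows it that day. Whether consecutive logouts should share one authentication is a decision for your data; filter them out beforehand if not.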