How to find the total number of events before the second login time
In BigQuery, I'm trying to find the total number of events before the second login time.
For different user ids, I have multiple events such as "scroll", "user engagement", "login", "first_visit", "sign_up", etc. For simplicity, let's consider the above as all the events.
For instance, for user_id 2, I have the following information extracted from the raw data (this is a snapshot of the table).
User_id | Event_name | EventTime |
---|---|---|
2 | scroll | 2022-10-31 12:28:35 |
2 | sign_up | 2022-10-29 08:11:29 |
2 | login | 2022-11-01 16:46:34 |
2 | first_visit | 2022-10-30 10:45:22 |
2 | login | 2022-11-04 08:10:38 |
2 | scroll | 2022-11-05 11:18:35 |
2 | user engagement | 2022-11-06 08:45:17 |
2 | user engagement | 2022-11-07 05:27:32 |
First, I found the second login time for each user id.
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY User_id ORDER BY LoginTime) rnk
FROM MyData
)
SELECT User_id, LoginTime AS SecondLoginTime
FROM cte
WHERE rnk = 2
ORDER BY User_id;
User_id | SecondLoginTime |
---|---|
1 | 2022-11-07 09:52:27 |
2 | 2022-11-04 08:10:38 |
I want to write a query that compares this login time to each event time (for all user ids) and counts the events before the SecondLoginTime.
For instance, for user_id 2, I want to make the following comparison:
User_id | Event_name | EventTime | SecondLoginTime |
---|---|---|---|
2 | scroll | 2022-10-31 12:28:35 | 2022-11-04 08:10:38 |
2 | sign_up | 2022-10-29 08:11:29 | 2022-11-04 08:10:38 |
2 | login | 2022-11-01 16:46:34 | 2022-11-04 08:10:38 |
2 | first_visit | 2022-10-30 10:45:22 | 2022-11-04 08:10:38 |
2 | login | 2022-11-04 08:10:38 | 2022-11-04 08:10:38 |
2 | scroll | 2022-11-05 11:18:35 | 2022-11-04 08:10:38 |
2 | user engagement | 2022-11-06 08:45:17 | 2022-11-04 08:10:38 |
2 | user engagement | 2022-11-07 05:27:32 | 2022-11-04 08:10:38 |
And find the following result:
User_id | TotalEventsBeforeSecondVisit |
---|---|
2 | 4 |
I also want to apply this logic to all user ids.
Is there a way to do this? Please ask me for clarification if anything is missing or if the problem is unclear. I'd really appreciate your help.
You might consider the query below.
WITH cte AS (
SELECT *,
NTH_VALUE(IF(Event_name = 'login', EventTime, NULL), 2 IGNORE NULLS) OVER w AS SecondLoginTime,
LAST_VALUE(IF(Event_name = 'login', EventTime, NULL) IGNORE NULLS) OVER w AS LastLoginTime
FROM MyData
WINDOW w AS (PARTITION BY User_id ORDER BY EventTime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
SELECT User_id,
COUNTIF(EventTime < SecondLoginTime) AS TotalEventsBeforeSecondLogin,
COUNTIF(EventTime < LastLoginTime) AS TotalEventsBeforeLastLogin,
FROM cte
GROUP BY 1;
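As a sanity check on what the query above is meant to compute, here is a minimal plain-Python sketch (not BigQuery) that applies the same rule to the sample rows for user_id 2 from the question: take the second and the last login per user, then count events strictly earlier than each. The names `ROWS`, `parse`, and `events_before_logins` are illustrative and not part of either answer. A user without a second login gets a count of 0, which mirrors how COUNTIF treats a comparison against NULL.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows for user_id 2, copied from the question:
# (User_id, Event_name, EventTime)
ROWS = [
    (2, "scroll", "2022-10-31 12:28:35"),
    (2, "sign_up", "2022-10-29 08:11:29"),
    (2, "login", "2022-11-01 16:46:34"),
    (2, "first_visit", "2022-10-30 10:45:22"),
    (2, "login", "2022-11-04 08:10:38"),
    (2, "scroll", "2022-11-05 11:18:35"),
    (2, "user engagement", "2022-11-06 08:45:17"),
    (2, "user engagement", "2022-11-07 05:27:32"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def events_before_logins(rows):
    """Return {user_id: (events before 2nd login, events before last login)}."""
    by_user = defaultdict(list)
    for user_id, name, ts in rows:
        by_user[user_id].append((parse(ts), name))

    result = {}
    for user_id, events in by_user.items():
        logins = sorted(t for t, name in events if name == "login")
        second = logins[1] if len(logins) >= 2 else None
        last = logins[-1] if logins else None
        # Strictly earlier than the login, matching EventTime < SecondLoginTime.
        before_second = sum(1 for t, _ in events if second is not None and t < second)
        before_last = sum(1 for t, _ in events if last is not None and t < last)
        result[user_id] = (before_second, before_last)
    return result


print(events_before_logins(ROWS))  # {2: (4, 4)}
```

User 2 has exactly two logins, so the second login is also the last one and both counts are 4, which matches the result the question asks for.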
This should work:
with LI2 AS (
SELECT User_Id, EventTime as LoginTime, ROW_NUMBER() OVER (PARTITION BY User_id ORDER BY EventTime) Seq
FROM MyTbl
where Event_Name='login'
)
SELECT
LI2.User_id
, LI2.LoginTime AS SecondLoginTime
, COUNT(OE.User_ID) as EventsBefore2ndLogin
FROM LI2
left join
MyTbl OE -- other events before 2nd login
on OE.User_Id=LI2.USer_Id
and OE.EventTime<LI2.LoginTime
WHERE LI2.Seq = 2
GROUP BY
LI2.User_id
, LI2.LoginTime
ORDER BY LI2.User_id
MyTbl is your table.
Updated to accommodate "events before last login":
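Because this query uses only ROW_NUMBER and a LEFT JOIN, it can be tried outside BigQuery. The sketch below loads the question's sample rows for user_id 2 into an in-memory SQLite table and runs the same query. It assumes SQLite 3.25 or later (for window functions) and stores timestamps as 'YYYY-MM-DD HH:MM:SS' text, so string comparison follows chronological order. The helper name `run_second_login_query` is illustrative.

```python
import sqlite3

# Sample rows for user_id 2, copied from the question.
ROWS = [
    (2, "scroll", "2022-10-31 12:28:35"),
    (2, "sign_up", "2022-10-29 08:11:29"),
    (2, "login", "2022-11-01 16:46:34"),
    (2, "first_visit", "2022-10-30 10:45:22"),
    (2, "login", "2022-11-04 08:10:38"),
    (2, "scroll", "2022-11-05 11:18:35"),
    (2, "user engagement", "2022-11-06 08:45:17"),
    (2, "user engagement", "2022-11-07 05:27:32"),
]

# The query from the answer above, unchanged apart from layout.
QUERY = """
WITH LI2 AS (
    SELECT User_Id,
           EventTime AS LoginTime,
           ROW_NUMBER() OVER (PARTITION BY User_Id ORDER BY EventTime) AS Seq
    FROM MyTbl
    WHERE Event_Name = 'login'
)
SELECT LI2.User_Id,
       LI2.LoginTime AS SecondLoginTime,
       COUNT(OE.User_Id) AS EventsBefore2ndLogin
FROM LI2
LEFT JOIN MyTbl OE
       ON OE.User_Id = LI2.User_Id
      AND OE.EventTime < LI2.LoginTime
WHERE LI2.Seq = 2
GROUP BY LI2.User_Id, LI2.LoginTime
ORDER BY LI2.User_Id
"""


def run_second_login_query(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTbl (User_Id INTEGER, Event_Name TEXT, EventTime TEXT)"
    )
    conn.executemany("INSERT INTO MyTbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(run_second_login_query(ROWS))  # [(2, '2022-11-04 08:10:38', 4)]
```

Note that users with fewer than two logins do not appear in the output at all, because of the `Seq = 2` filter.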
with LogInSeq AS (
SELECT User_Id
, max(case when LoginSeq=2 then LoginTime else null end) as SecondLoginTime
, max(case when RevLoginSeq=1 then LoginTime else null end) as LastLoginTime
from (
SELECT User_Id, EventTime as LoginTime
, ROW_NUMBER() OVER (PARTITION BY User_id ORDER BY EventTime) LogInSeq
, ROW_NUMBER() OVER (PARTITION BY User_id ORDER BY EventTime desc) RevLoginSeq
FROM MyTbl
where Event_Name='login'
) LI1
where LoginSeq=2 or RevLoginSeq=1
group by User_Id
)
SELECT
LISeq.User_id
, LISeq.SecondLoginTime
, LISeq.LastLoginTime
, COUNT(case when OE.EventTime < LISeq.SecondLoginTime then OE.User_id else null end) as EventsBefore2ndLogin
, COUNT(case when OE.EventTime < LISeq.LastLoginTime then OE.User_id else null end) as EventsBeforeLastLogin
FROM LogInSeq as LISeq
left join
MyTbl OE -- all events for the user; the CASE expressions above pick those before each login
on OE.User_Id=LISeq.USer_Id
GROUP BY
LISeq.User_id
, LISeq.SecondLoginTime
, LISeq.LastLoginTime
ORDER BY LISeq.User_id;
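The difference between the two counts only shows up when a user has more than two logins. The sketch below runs the updated query in an in-memory SQLite database (again assuming SQLite 3.25 or later and text timestamps) against two made-up users that do not come from the question: a hypothetical user 3 with three logins and a hypothetical user 4 with a single login. The helper name `run_login_counts` is illustrative.

```python
import sqlite3

# Hypothetical rows, invented for illustration only (not from the question).
HYPOTHETICAL_ROWS = [
    (3, "login", "2022-11-01 09:00:00"),
    (3, "scroll", "2022-11-01 09:05:00"),
    (3, "login", "2022-11-02 09:00:00"),
    (3, "scroll", "2022-11-02 09:05:00"),
    (3, "login", "2022-11-03 09:00:00"),
    (4, "sign_up", "2022-11-01 08:00:00"),
    (4, "login", "2022-11-01 08:30:00"),
]

# The updated query from the answer above, unchanged apart from layout.
QUERY = """
WITH LogInSeq AS (
    SELECT User_Id,
           MAX(CASE WHEN LoginSeq = 2 THEN LoginTime END) AS SecondLoginTime,
           MAX(CASE WHEN RevLoginSeq = 1 THEN LoginTime END) AS LastLoginTime
    FROM (
        SELECT User_Id,
               EventTime AS LoginTime,
               ROW_NUMBER() OVER (PARTITION BY User_Id ORDER BY EventTime) AS LoginSeq,
               ROW_NUMBER() OVER (PARTITION BY User_Id ORDER BY EventTime DESC) AS RevLoginSeq
        FROM MyTbl
        WHERE Event_Name = 'login'
    ) LI1
    WHERE LoginSeq = 2 OR RevLoginSeq = 1
    GROUP BY User_Id
)
SELECT LISeq.User_Id,
       LISeq.SecondLoginTime,
       LISeq.LastLoginTime,
       COUNT(CASE WHEN OE.EventTime < LISeq.SecondLoginTime THEN OE.User_Id END) AS EventsBefore2ndLogin,
       COUNT(CASE WHEN OE.EventTime < LISeq.LastLoginTime THEN OE.User_Id END) AS EventsBeforeLastLogin
FROM LogInSeq AS LISeq
LEFT JOIN MyTbl OE
       ON OE.User_Id = LISeq.User_Id
GROUP BY LISeq.User_Id, LISeq.SecondLoginTime, LISeq.LastLoginTime
ORDER BY LISeq.User_Id
"""


def run_login_counts(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTbl (User_Id INTEGER, Event_Name TEXT, EventTime TEXT)"
    )
    conn.executemany("INSERT INTO MyTbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_login_counts(HYPOTHETICAL_ROWS):
    print(row)
# (3, '2022-11-02 09:00:00', '2022-11-03 09:00:00', 2, 4)
# (4, None, '2022-11-01 08:30:00', 0, 1)
```

Unlike the first version of the query, a user with only one login is kept here: SecondLoginTime is NULL and the corresponding count is 0.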
Admittedly, I wasn't aware of BigQuery's more advanced capabilities, such as those demonstrated in @JayTiger's answer. You can simplify this query by using the COUNTIF and NTH_VALUE functions (I don't have access to them).