Time difference between 2 distinct events in BigQuery
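I'm trying to calculate the time difference between 2 events in BigQuery. They are 2 custom events we set up in Firebase. The first is event_a, and the second is event_b, which fires at some point after event_a (no matter how much later).
I tried the following query: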
SELECT ROUND(AVG(time_diff), 2) AS avg_duration_minutes
FROM (
  SELECT
    user_pseudo_id,
    CASE
      WHEN event_name = 'event_a'
       AND LEAD(event_name, 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) = 'event_b'
      THEN TIMESTAMP_DIFF(
        TIMESTAMP_MICROS(LEAD(event_timestamp, 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC)),
        TIMESTAMP_MICROS(event_timestamp),
        MINUTE)
    END AS time_diff
  FROM `database`
  WHERE event_name IN ('event_a', 'event_b')
)
WHERE time_diff > 0.2
Sample data:
user_pseudo_id  event    timestamp
aaa             event_a  1587995938387000
bbb             event_a  1590948191239003
aaa             event_b  1587995943075005
ccc             event_a  1589130017650008
aaa             event_a  1593078261900005
aaa             event_b  1593078881226002
bbb             event_b  1590948208425007
ccc             event_b  1589130462706020
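The result I want is, for each user, the average time and the total time between event_a and event_b.
Do you have any suggestions? What matters is knowing how much time passed between these two specific events, regardless of when the second one happens.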
Below is for BigQuery Standard SQL:
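#standardSQL
SELECT
  user_pseudo_id,
  AVG(duration) AS avg_duration,    -- in microseconds, like the timestamp column
  SUM(duration) AS total_duration   -- in microseconds
FROM (
  -- Gap from each row to the same user's next event_a/event_b row
  SELECT *, LEAD(timestamp) OVER (win) - timestamp AS duration
  FROM `project.dataset.table`
  WHERE event IN ('event_a', 'event_b')
  WINDOW win AS (PARTITION BY user_pseudo_id ORDER BY timestamp)
)
-- Keep only the gaps that start at an event_a
WHERE event = 'event_a'
GROUP BY user_pseudo_id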
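One caveat with the query above: LEAD returns the next row of either type, so if a user fires event_a twice in a row, the first gap is measured to the second event_a rather than to an event_b. The sketch below is a possible variant, assuming only event_a followed directly by event_b should count: it also checks the name of the next event and converts microseconds to seconds. The sample_events CTE, with Firebase-style event_name / event_timestamp columns, is a hypothetical stand-in for the real table, filled with the question's sample rows.

```sql
#standardSQL
-- Hypothetical stand-in for the real table, built from the question's sample rows.
WITH sample_events AS (
  SELECT 'aaa' AS user_pseudo_id, 'event_a' AS event_name, 1587995938387000 AS event_timestamp UNION ALL
  SELECT 'bbb', 'event_a', 1590948191239003 UNION ALL
  SELECT 'aaa', 'event_b', 1587995943075005 UNION ALL
  SELECT 'ccc', 'event_a', 1589130017650008 UNION ALL
  SELECT 'aaa', 'event_a', 1593078261900005 UNION ALL
  SELECT 'aaa', 'event_b', 1593078881226002 UNION ALL
  SELECT 'bbb', 'event_b', 1590948208425007 UNION ALL
  SELECT 'ccc', 'event_b', 1589130462706020
),
paired AS (
  SELECT
    user_pseudo_id,
    event_name,
    -- Name of, and gap in microseconds to, the user's next event_a/event_b row
    LEAD(event_name) OVER (win) AS next_event_name,
    LEAD(event_timestamp) OVER (win) - event_timestamp AS duration_us
  FROM sample_events
  WHERE event_name IN ('event_a', 'event_b')
  WINDOW win AS (PARTITION BY user_pseudo_id ORDER BY event_timestamp)
)
SELECT
  user_pseudo_id,
  AVG(duration_us) / 1e6 AS avg_duration_seconds,
  SUM(duration_us) / 1e6 AS total_duration_seconds
FROM paired
-- Only count gaps that start at event_a and end at event_b
WHERE event_name = 'event_a' AND next_event_name = 'event_b'
GROUP BY user_pseudo_id
```

With the sample rows, user aaa has two pairs (about 4.7 s and 619.3 s), giving an average of about 312.0 s and a total of about 624.0 s; bbb and ccc each have a single pair of about 17.2 s and 445.1 s.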
Here's how I would approach it:
WITH data AS (
  SELECT user_pseudo_id, event_name, event_timestamp
  FROM `database`
  WHERE event_name IN ('event_a', 'event_b')
),
ea AS (
  -- Get the first event_a per user
  SELECT user_pseudo_id, MIN(event_timestamp) AS first_a_ts
  FROM data
  WHERE event_name = 'event_a'
  GROUP BY 1
),
eb AS (
  -- Get the first event_b per user
  SELECT user_pseudo_id, MIN(event_timestamp) AS first_b_ts
  FROM data
  WHERE event_name = 'event_b'
  GROUP BY 1
),
joined AS (
  -- Assume we only want a duration if the user has an event_b, hence the inner join
  SELECT *
  FROM ea
  INNER JOIN eb USING (user_pseudo_id)
  WHERE first_b_ts > first_a_ts
)
SELECT
  -- event_timestamp is INT64 microseconds, so convert it before TIMESTAMP_DIFF
  AVG(TIMESTAMP_DIFF(TIMESTAMP_MICROS(first_b_ts), TIMESTAMP_MICROS(first_a_ts), SECOND)) / 60.0 AS avg_duration_minutes
FROM joined
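I didn't include your > 0.2 filter, because I'm not sure why you would arbitrarily filter out differences of less than 12 seconds (0.2 minutes).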
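A related detail about that filter in the question's query: TIMESTAMP_DIFF returns a whole number (INT64) of the requested unit, so with MINUTE every gap shorter than 60 seconds becomes 0, and time_diff > 0.2 then drops it. The small check below runs over the question's sample pairs, written out by hand as a hypothetical pairs CTE, and compares whole minutes, whole seconds and a fractional-minute value computed from the raw microsecond difference.

```sql
#standardSQL
-- event_a/event_b timestamp pairs copied from the question's sample data
WITH pairs AS (
  SELECT 'aaa' AS user_pseudo_id, 1587995938387000 AS a_ts, 1587995943075005 AS b_ts UNION ALL
  SELECT 'aaa', 1593078261900005, 1593078881226002 UNION ALL
  SELECT 'bbb', 1590948191239003, 1590948208425007 UNION ALL
  SELECT 'ccc', 1589130017650008, 1589130462706020
)
SELECT
  user_pseudo_id,
  -- Whole minutes only: partial minutes are truncated away
  TIMESTAMP_DIFF(TIMESTAMP_MICROS(b_ts), TIMESTAMP_MICROS(a_ts), MINUTE) AS whole_minutes,
  TIMESTAMP_DIFF(TIMESTAMP_MICROS(b_ts), TIMESTAMP_MICROS(a_ts), SECOND) AS whole_seconds,
  -- Fractional minutes from the raw microsecond difference
  (b_ts - a_ts) / 60e6 AS fractional_minutes
FROM pairs
```

For these rows whole_minutes is 0 and 10 for aaa, 0 for bbb and 7 for ccc, so a > 0.2 filter on whole minutes keeps only two of the four pairs. Filtering on fractional_minutes instead behaves like the 12-second cut-off the filter seems to intend.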
If you want the timestamp of the first event_b that follows each event_a, you can use a conditional cumulative minimum:
SELECT ab.*
FROM (
  SELECT
    user_pseudo_id,
    event_name,  -- needed by the outer WHERE
    event_timestamp AS event_a_timestamp,
    -- Earliest event_b at or after the current row
    MIN(CASE WHEN event_name = 'event_b' THEN event_timestamp END) OVER (
      PARTITION BY user_pseudo_id  -- same user id as the one selected
      ORDER BY event_timestamp
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS event_b_timestamp
  FROM `database`
  WHERE event_name IN ('event_a', 'event_b')
) ab
WHERE event_name = 'event_a'
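To get the per-user average and total that the question asks for, the event_a/event_b pairs from this query can be aggregated. The sketch below is one way to do that, assuming the Firebase-style column names used in this answer; sample_events is a hypothetical inline table built from the question's sample rows, and durations are converted from microseconds to seconds.

```sql
#standardSQL
-- Hypothetical stand-in for the real table, built from the question's sample rows.
WITH sample_events AS (
  SELECT 'aaa' AS user_pseudo_id, 'event_a' AS event_name, 1587995938387000 AS event_timestamp UNION ALL
  SELECT 'bbb', 'event_a', 1590948191239003 UNION ALL
  SELECT 'aaa', 'event_b', 1587995943075005 UNION ALL
  SELECT 'ccc', 'event_a', 1589130017650008 UNION ALL
  SELECT 'aaa', 'event_a', 1593078261900005 UNION ALL
  SELECT 'aaa', 'event_b', 1593078881226002 UNION ALL
  SELECT 'bbb', 'event_b', 1590948208425007 UNION ALL
  SELECT 'ccc', 'event_b', 1589130462706020
),
ab AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp AS event_a_timestamp,
    -- Conditional cumulative minimum: earliest event_b at or after this row
    MIN(CASE WHEN event_name = 'event_b' THEN event_timestamp END) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS event_b_timestamp
  FROM sample_events
  WHERE event_name IN ('event_a', 'event_b')
)
SELECT
  user_pseudo_id,
  COUNT(*) AS pairs,
  AVG(event_b_timestamp - event_a_timestamp) / 1e6 AS avg_duration_seconds,
  SUM(event_b_timestamp - event_a_timestamp) / 1e6 AS total_duration_seconds
FROM ab
-- Keep event_a rows that have some later event_b
WHERE event_name = 'event_a' AND event_b_timestamp IS NOT NULL
GROUP BY user_pseudo_id
```

Unlike a LEAD-based pairing, this matches every event_a to the first event_b after it, even when another event_a comes in between, so two consecutive event_a rows both count toward the same event_b. If only the event_a immediately before each event_b should count, checking the next event's name with LEAD fits better.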
Your question doesn't give enough detail to say what else needs to be done.