How do I calculate the rolling average of the difference in days between events in BigQuery?
I have an events table like this:
date event_category event_planner
2019-09-22T00:00:00 soccer_night Marcus
2019-09-25T00:00:00 comedy_night John
2019-09-28T00:00:00 dance_party John
2019-10-02T00:00:00 soccer_night Marcus
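The idea is to get a rolling average of the difference in days between consecutive event dates for each planner. So far I have the gap in days for each planner, split by category, computed as follows:
DATE_DIFF(SAFE_CAST(date AS date), LAG(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date), day) AS result
What I expect is something like this: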
date event_category event_planner rolling_avg
2019-09-22T00:00:00 soccer_night Marcus 0
2019-09-25T00:00:00 comedy_night John 0
2019-09-28T00:00:00 comedy_night John 3
2019-10-02T00:00:00 soccer_night Marcus 10
2019-10-10T00:00:00 comedy_night John 7
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(day, diff), IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day), 0) rolling_avg
FROM (
SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
FROM (
SELECT *, SAFE_CAST(date AS DATE) AS day
FROM `project.dataset.table`
)
)
If applied to the sample data from your question (with the query above following this WITH clause)
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2019-09-22T00:00:00' date, 'soccer_night' event_category, 'Marcus' event_planner UNION ALL
SELECT '2019-09-25T00:00:00', 'comedy_night', 'John' UNION ALL
SELECT '2019-09-28T00:00:00', 'comedy_night', 'John' UNION ALL
SELECT '2019-10-02T00:00:00', 'soccer_night', 'Marcus' UNION ALL
SELECT '2019-10-10T00:00:00', 'comedy_night', 'John'
)
the result is
Row date event_category event_planner rolling_avg
1 2019-09-22 00:00:00 UTC soccer_night Marcus 0
2 2019-09-25 00:00:00 UTC comedy_night John 0
3 2019-09-28 00:00:00 UTC comedy_night John 3.0
4 2019-10-02 00:00:00 UTC soccer_night Marcus 10.0
5 2019-10-10 00:00:00 UTC comedy_night John 7.5
How should I modify this to use the average over the last three events of the same type by the same planner?
#standardSQL
SELECT * EXCEPT(day, diff),
IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0) rolling_avg
FROM (
SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
FROM (
SELECT *, SAFE_CAST(date AS DATE) AS day
FROM `project.dataset.table`
)
)
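One thing worth checking with the windowed version: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW averages the last three gaps, which spans four events. If "the last three events" should mean only the two gaps among them, a narrower frame may be what is wanted. The sketch below compares both frames for a single planner and category; the events CTE, its fourth date, and the output column names are hypothetical and only there to make the two columns differ.

```sql
#standardSQL
-- Hypothetical sample: one planner, one category, four event dates.
WITH events AS (
  SELECT DATE '2019-09-25' AS day UNION ALL
  SELECT DATE '2019-09-28' UNION ALL
  SELECT DATE '2019-10-10' UNION ALL
  SELECT DATE '2019-10-11'
), gaps AS (
  -- Gap in days to the previous event; NULL for the first event.
  SELECT day, DATE_DIFF(day, LAG(day) OVER (ORDER BY day), DAY) AS diff
  FROM events
)
SELECT
  day,
  diff,
  -- Last three gaps (covers four events).
  AVG(diff) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_last_3_gaps,
  -- Last two gaps (covers three events).
  AVG(diff) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS avg_last_2_gaps
FROM gaps
ORDER BY day
```

AVG skips NULL inputs, so the missing gap on the first row does not pull later averages down; wrap the result in IFNULL(..., 0) as in the query above if 0 is preferred for the first event.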
You can use lag() to compute the previous date in a subquery, then take the rolling average in the outer query:
select
t.*,
avg(date_diff(date, lag_date, day)) over(
partition by event_category, event_planner order by date
) rolling_avg
from (
select
t.*,
lag(date) over(
partition by event_category, event_planner order by date
) lag_date
from mytable t
) t
For the average, you can use:
(DATE_DIFF(SAFE_CAST(date AS date),
MIN(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner),
day
) /
NULLIF(COUNT(*) OVER (PARTITION BY event_category, event_planner) - 1, 0)
) AS result
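The expression above leans on a telescoping identity: consecutive gaps sum to the span between the first and the current date, so the average gap is that span divided by the number of events minus one. Note that DATE_DIFF(a, b, DAY) returns a minus b, so the later date goes first. A running version might look like the sketch below. It assumes at most one event per planner, category, and day (ties would be counted together by the default window frame), and the named window w is my own shorthand, not part of the original answer.

```sql
#standardSQL
-- Sketch only: table and column names follow the question.
SELECT * EXCEPT(day),
  -- (current date - first date) / (events so far - 1).
  -- SAFE_DIVIDE yields NULL for the first event instead of a division error.
  SAFE_DIVIDE(
    DATE_DIFF(day, MIN(day) OVER w, DAY),
    COUNT(*) OVER w - 1
  ) AS rolling_avg
FROM (
  SELECT *, SAFE_CAST(date AS DATE) AS day
  FROM `project.dataset.table`
)
WINDOW w AS (PARTITION BY event_category, event_planner ORDER BY day)
```

On the sample data from the question this should give 3 and 7.5 for John's second and third comedy nights and 10 for Marcus's second soccer night, with NULL rather than 0 on each first event.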