
How do I calculate the rolling average of the difference in days between events on BigQuery?

I have an events table like this:

date                  event_category     event_planner

2019-09-22T00:00:00   soccer_night       Marcus
2019-09-25T00:00:00   comedy_night       John
2019-09-28T00:00:00   dance_party        John
2019-10-02T00:00:00   soccer_night       Marcus

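The idea is to get a rolling average of the difference between event dates for each planner. So far I have the distance in days for each planner, split by category:

DATE_DIFF(SAFE_CAST(date AS date), LAG(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date), day) AS result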

What I expect is something like this:

date                  event_category     event_planner     rolling_avg

2019-09-22T00:00:00   soccer_night       Marcus            0
2019-09-25T00:00:00   comedy_night       John              0
2019-09-28T00:00:00   comedy_night       John              3
2019-10-02T00:00:00   soccer_night       Marcus            10
2019-10-10T00:00:00   comedy_night       John              7

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(day, diff), IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day 
    FROM `project.dataset.table`
  )
)

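When applied to the sample data from your question (defined here as a CTE that goes in front of the SELECT above):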

WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2019-09-22T00:00:00' date, 'soccer_night' event_category, 'Marcus' event_planner UNION ALL
  SELECT '2019-09-25T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-09-28T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-10-02T00:00:00', 'soccer_night', 'Marcus' UNION ALL
  SELECT '2019-10-10T00:00:00', 'comedy_night', 'John' 
)

the result is

Row date                    event_category  event_planner   rolling_avg  
1   2019-09-22 00:00:00 UTC soccer_night    Marcus          0    
2   2019-09-25 00:00:00 UTC comedy_night    John            0    
3   2019-09-28 00:00:00 UTC comedy_night    John            3.0  
4   2019-10-02 00:00:00 UTC soccer_night    Marcus          10.0     
5   2019-10-10 00:00:00 UTC comedy_night    John            7.5    
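
To sanity-check these numbers outside BigQuery, the same logic can be sketched in plain Python. This is only an illustration, not part of the answer: the helper name `rolling_avg_gaps` is made up, and it assumes the dates have already been cast to calendar dates.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (date, event_category, event_planner).
events = [
    (date(2019, 9, 22), "soccer_night", "Marcus"),
    (date(2019, 9, 25), "comedy_night", "John"),
    (date(2019, 9, 28), "comedy_night", "John"),
    (date(2019, 10, 2), "soccer_night", "Marcus"),
    (date(2019, 10, 10), "comedy_night", "John"),
]

def rolling_avg_gaps(rows):
    """Mimic LAG + cumulative AVG within each (category, planner) partition."""
    partition_key = lambda r: (r[1], r[2])
    ordered = sorted(rows, key=lambda r: (partition_key(r), r[0]))
    averages = {}
    for _, partition in groupby(ordered, key=partition_key):
        previous, gaps = None, []
        for row in partition:
            if previous is not None:
                # Equivalent of DATE_DIFF(day, LAG(day), DAY).
                gaps.append((row[0] - previous).days)
            # AVG ignores the NULL first gap; IFNULL(..., 0) covers "no gaps yet".
            averages[row] = sum(gaps) / len(gaps) if gaps else 0
            previous = row[0]
    # Return the averages in date order, like the result table.
    return [averages[r] for r in sorted(rows)]

print(rolling_avg_gaps(events))  # [0, 0, 3.0, 10.0, 7.5]
```

The last value is 7.5 because John's comedy_night gaps are 3 and 12 days, and the average runs over every gap seen so far in that partition.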

How should I modify this to use the average over the last three events of the same type by the same planner?

#standardSQL
SELECT * EXCEPT(day, diff), 
  IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day 
    FROM `project.dataset.table`
  )
)
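
The effect of the ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame can be illustrated with a small Python sketch. The gap values below are hypothetical (the sample data has too few events per partition to show a difference), and `windowed_avg` is an illustrative helper, not a BigQuery function.

```python
def windowed_avg(gaps, size=3):
    """Average over the current row and the size-1 preceding rows, skipping NULLs."""
    result = []
    for i in range(len(gaps)):
        frame = gaps[max(0, i - size + 1): i + 1]
        values = [g for g in frame if g is not None]  # AVG ignores NULLs
        result.append(sum(values) / len(values) if values else 0)  # IFNULL(..., 0)
    return result

# Hypothetical day gaps for one (category, planner) partition, in date order.
# None stands for the NULL that LAG yields on the first event.
gaps = [None, 3, 12, 6, 9]
print(windowed_avg(gaps))  # [0, 3.0, 7.5, 7.0, 9.0]
```

From the fourth event onward the oldest gap drops out of the frame, which is what distinguishes this from the cumulative average.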

You can use lag() to compute the previous date in a subquery, then take the rolling average in the outer query:

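select
    t.*,
    -- assumes `date` is a DATE column; otherwise cast it first,
    -- e.g. SAFE_CAST(date AS DATE) as in the question
    avg(date_diff(date, lag_date, day)) over(
        partition by event_category, event_planner order by date
    ) rolling_avg
from (
    select
        t.*,
        -- date of the previous event in the same category/planner partition
        lag(date) over(
            partition by event_category, event_planner order by date
        ) lag_date
    from mytable t
) t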

For the average, you can use:

-- average gap so far = (current date - first date) / (number of gaps so far)
(DATE_DIFF(SAFE_CAST(date AS date),
           MIN(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner),
           day
          ) / 
 NULLIF(COUNT(*) OVER (PARTITION BY event_category, event_planner ORDER BY date) - 1, 0)
) AS result
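
This expression relies on consecutive gaps telescoping: their sum equals the span from the first event to the current one, so the mean gap is that span divided by the number of gaps. A small Python check of that identity, as a sketch only (the helper names `mean_gap` and `closed_form` are illustrative, and the dates are John's comedy_night events from the sample data):

```python
from datetime import date

# Event dates of one partition, in order.
dates = [date(2019, 9, 25), date(2019, 9, 28), date(2019, 10, 10)]

def mean_gap(ds):
    """Average of the day gaps between consecutive dates (None if no gaps)."""
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    return sum(gaps) / len(gaps) if gaps else None

def closed_form(ds):
    """(last - first) / (n - 1); None when n - 1 is 0, like NULLIF(n - 1, 0)."""
    n = len(ds)
    return (ds[-1] - ds[0]).days / (n - 1) if n > 1 else None

# Compare both computations on every prefix, i.e. the "rolling" view.
for k in range(1, len(dates) + 1):
    print(k, mean_gap(dates[:k]), closed_form(dates[:k]))
```

Both columns agree for every prefix, which is why a single DATE_DIFF against the first date can stand in for averaging the individual gaps.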

