BigQuery missing rows with SUM OVER PARTITION BY

TL;DR:

Given this table:

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)

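How do I get a table that includes the missing date/product combination (2020-11-02 - premium), with a fallback value of 0 for diff?

Ideally this should work for any number of products. A list of all products can be obtained like this: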

SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions

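I want to be able to get the subscription count per day, either for all products or only for some of them.

I think the easiest way to achieve this is to prepare a table that looks like this: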

|---------------------|------------------|------------------|
|         date        |      product     |       total      |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        100       |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |        50        |
|---------------------|------------------|------------------|

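With this table, I can easily group by date and product, or by date alone, and sum the totals.

Before getting to that result table, I generated a table in which I calculated the subscription difference per day and product: how many new subscribers each product gained and how many stopped subscribing.

That table looks like this: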

|---------------------|------------------|------------------|
|         date        |      product     |       diff       |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        50        |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |       -20        |
|---------------------|------------------|------------------|

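That is, on November 1st the total number of premium subscribers increased by 50 and the total number of basic subscribers decreased by 20.

The problem is that this intermediate table has no row for a date on which a product had no changes; see the example below.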


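When I started out there was no product column; I only had the date and diff columns.

To get from the second table to the first, I used this query, which works perfectly: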

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT 
  *,
  SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date

But when I add the product column and try to compute the running sum per day and product, some data points are missing.

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT 
  *,
  SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date

——

|---------------------|------------------|------------------|
|         date        |      product     |      total       |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |       100        |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        50        |
|---------------------|------------------|------------------|
|      2020-11-02     |       basic      |        90        |
|---------------------|------------------|------------------|
|      2020-11-03     |       basic      |       130        |
|---------------------|------------------|------------------|
|      2020-11-03     |      premium     |        70        |
|---------------------|------------------|------------------|

If I now display the total number of subscriptions per day, I get:

150 -> 90 -> 200

But I would expect:

150 -> 140 -> 200

The same goes for the total number of premium subscriptions per day:

50 -> 0 -> 70

But I would expect:

50 -> 50 -> 70


I believe the best way to solve this is to add the missing date/product combinations.

How can I do that?

Use GENERATE_TIMESTAMP_ARRAY:

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
  SELECT * 
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2020-11-01 00:00:00', '2020-11-03 00:00:00', INTERVAL 1 DAY)) as date
),
products AS (
  SELECT DISTINCT product FROM subscriptions
)
SELECT dates.date, products.product, subscriptions.diff
FROM dates 
CROSS JOIN products
LEFT JOIN subscriptions 
ON subscriptions.date = dates.date AND subscriptions.product = products.product
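
The query above leaves diff as NULL for the filled-in rows. As a sketch of how the fallback value of 0 and the running total from the question could be layered on top, the version below wraps the join in an extra CTE. The CTE name filled is only an illustrative choice, and the date range is still hard-coded as in the answer.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
  -- One row per day in the (hard-coded) reporting period.
  SELECT date
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
    TIMESTAMP('2020-11-01'), TIMESTAMP('2020-11-03'), INTERVAL 1 DAY)) AS date
),
products AS (
  SELECT DISTINCT product FROM subscriptions
),
filled AS (
  -- Every date/product combination; missing combinations get diff = 0.
  SELECT dates.date, products.product, IFNULL(subscriptions.diff, 0) AS diff
  FROM dates
  CROSS JOIN products
  LEFT JOIN subscriptions
    ON subscriptions.date = dates.date AND subscriptions.product = products.product
)
SELECT
  date,
  product,
  diff,
  SUM(diff) OVER (PARTITION BY product ORDER BY date) AS total_subscriptions
FROM filled
ORDER BY date, product
```

With the sample data, the row for premium on 2020-11-02 now exists with diff 0 and a running total of 50.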

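If I follow you correctly, one approach is to generate a fixed list of dates for the period you want and cross join it with the list of products. This gives you all possible combinations. You can then bring in the subscriptions table with a left join, and finally perform the window sum: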

select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from unnest(generate_timestamp_array(
    timestamp('2020-11-01'), 
    timestamp('2020-11-03'), 
    interval 1 day)
) dt
cross join (
    select 'basic' product 
    union all select 'premium'
) p
-- the alias s is required, since the select list and join condition refer to s.*
left join subscriptions s on s.product = p.product and s.date = dt

We can make the query more generic by generating the date range and the product list dynamically:

select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from (select min(date) min_dt, max(date) max_dt from subscriptions) d0
cross join unnest(generate_timestamp_array(d0.min_dt, d0.max_dt, interval 1 day)) dt
cross join (select distinct product from subscriptions) p
-- the alias s is required, since the select list and join condition refer to s.*
left join subscriptions s on s.product = p.product and s.date = dt
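
To check this against the numbers in the question, the filled-in running totals can be aggregated per day. The following is a sketch under a few assumptions: the CTE name running and the output column names are illustrative, and IFNULL is added so that a product with no row on the first day starts at 0 rather than NULL.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
running AS (
  -- Running total per product over every day between the first and last date.
  SELECT
    dt AS date,
    p.product,
    SUM(IFNULL(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY dt) AS total
  FROM (SELECT MIN(date) AS min_dt, MAX(date) AS max_dt FROM subscriptions) AS d0
  CROSS JOIN UNNEST(GENERATE_TIMESTAMP_ARRAY(d0.min_dt, d0.max_dt, INTERVAL 1 DAY)) AS dt
  CROSS JOIN (SELECT DISTINCT product FROM subscriptions) AS p
  LEFT JOIN subscriptions AS s
    ON s.product = p.product AND s.date = dt
)
-- Sum the per-product running totals to get one figure per day.
SELECT date, SUM(total) AS total_subscriptions
FROM running
GROUP BY date
ORDER BY date
```

For the sample data this gives 150, 140 and 200 for the three days. Adding WHERE product = 'premium' to the final SELECT gives 50, 50 and 70.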

      -- Try this: I build a list of products, add a "Total" entry to that list, and join it with your table to get the data in the shape you need.
      WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
        UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
        UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
        UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
        UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
      ),

      product_name as (
      Select product from subscriptions group by 1
      union all
      Select "Total" as product
      )

      Select date
            ,product
            ,total_subscriptions
      from (      
      Select a.date
            ,a.product
            ,diff
            ,SUM(diff) OVER (PARTITION BY a.product ORDER BY a.date) as total_subscriptions
      from 
      (
      Select date,a.product
      from product_name A
       join subscriptions B
       on 1=1
       where a.product !='Total'
      group by 1,2
      ) A
      left join subscriptions B 
      on A.product = B.product
      and A.date = B.date
      group by 1,2,3
      ) group by 1,2,3
      union all
      Select date
            ,product
            ,total_subscriptions
      from 
      (
      Select date,a.product
            ,diff
            ,SUM(diff) OVER (PARTITION BY a.product ORDER BY date) as total_subscriptions
      from product_name A
       join subscriptions B
       on 1=1
       where a.product ='Total'
      group by 1,2,3
      ) group by 1,2,3
      order by 1,2
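
As a shorter sketch of the same idea (per-product rows plus a "Total" row for each day), the per-day totals can be appended with a single UNION ALL. This is not the answer author's query: the CTE name running is hypothetical, and the date range is generated with GENERATE_TIMESTAMP_ARRAY so that days with no rows at all are covered as well. It also avoids grouping by diff, which in the query above would appear to merge two products that happen to have the same diff on the same day.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
running AS (
  -- Running total per product for every date/product combination.
  SELECT
    dt AS date,
    p.product,
    SUM(IFNULL(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY dt) AS total_subscriptions
  FROM (SELECT MIN(date) AS min_dt, MAX(date) AS max_dt FROM subscriptions) AS d0
  CROSS JOIN UNNEST(GENERATE_TIMESTAMP_ARRAY(d0.min_dt, d0.max_dt, INTERVAL 1 DAY)) AS dt
  CROSS JOIN (SELECT DISTINCT product FROM subscriptions) AS p
  LEFT JOIN subscriptions AS s
    ON s.product = p.product AND s.date = dt
)
-- Per-product rows.
SELECT date, product, total_subscriptions
FROM running
UNION ALL
-- One extra "Total" row per day, summed across all products.
SELECT date, 'Total', SUM(total_subscriptions)
FROM running
GROUP BY date
ORDER BY date, product
```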
