
BigQuery: GROUP BY including a non-aggregated column

Suppose I have a table like the following:

| sku | date     | time  | inc_col| latest_col1 |latest_col2| 
+-----+----------+-------+--------+-------------+-----------+
|1    |2020-10-26| 08:00 | 100    | 10          |a          |
|1    |2020-10-26| 10:00 | -10    | 11          |b          |
|1    |2020-10-26| 06:00 | 5      | 7           |c          | 
|2    |2020-10-26| 08:00 | 300    | 4           |x          | 
|2    |2020-10-26| 10:00 |-100    | 4           |y          |
|2    |2020-10-26| 03:00 | 10     | 8           |z          |

This query produces the following result:

SELECT sku, date, SUM(inc_col) AS inc_col FROM tbl GROUP BY sku, date;

|sku |date       | inc_col|
+----+-----------+--------+
|1   |2020-10-26 |   95   |
|2   |2020-10-26 |  210   |

Is it possible to also include the last value of 'latest_col1' and 'latest_col2', ordered by the "time" column, as shown below:

|sku |date      |inc_col|latest_col1| latest_col2|
+----+----------+-------+-----------+------------+
|1   |2020-10-26|    95 | 11        |   b        |
|2   |2020-10-26|   210 |  4        |   y        |

Can this be done with a window function? The table has hundreds of columns of the "inc_col" and "latest_col" kinds.

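You can use the window function SUM to compute inc_col, and the window function FIRST_VALUE ordered by time descending (equivalent to taking the last value in time order) to find latest_col1 and latest_col2: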

SELECT DISTINCT
  sku,
  date,
  SUM(inc_col) OVER (PARTITION BY sku, date) AS inc_col,
  FIRST_VALUE(latest_col1) OVER (PARTITION BY sku, date ORDER BY time DESC) AS latest_col1,
  FIRST_VALUE(latest_col2) OVER (PARTITION BY sku, date ORDER BY time DESC) AS latest_col2
FROM tbl;
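
To check the window-function query against the sample rows from the question, the table can be mocked inline. This is only a sketch: the column types (INT64, DATE, TIME, STRING) are assumptions, since the question does not state the schema, and the CTE simply stands in for the real tbl.

```sql
-- Mock of the question's table; column types are assumed, not given in the question.
WITH tbl AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS sku, DATE '2020-10-26' AS date, TIME '08:00:00' AS time,
           100 AS inc_col, 10 AS latest_col1, 'a' AS latest_col2),
    (1, DATE '2020-10-26', TIME '10:00:00',  -10, 11, 'b'),
    (1, DATE '2020-10-26', TIME '06:00:00',    5,  7, 'c'),
    (2, DATE '2020-10-26', TIME '08:00:00',  300,  4, 'x'),
    (2, DATE '2020-10-26', TIME '10:00:00', -100,  4, 'y'),
    (2, DATE '2020-10-26', TIME '03:00:00',   10,  8, 'z')
  ])
)
SELECT DISTINCT
  sku,
  date,
  -- Sum over the whole (sku, date) partition.
  SUM(inc_col) OVER (PARTITION BY sku, date) AS inc_col,
  -- First value in descending time order = value from the latest row.
  FIRST_VALUE(latest_col1) OVER (PARTITION BY sku, date ORDER BY time DESC) AS latest_col1,
  FIRST_VALUE(latest_col2) OVER (PARTITION BY sku, date ORDER BY time DESC) AS latest_col2
FROM tbl
ORDER BY sku;
```

With these six rows, sku 1 sums to 100 - 10 + 5 = 95 and takes its latest values from the 10:00 row (11, b); sku 2 sums to 300 - 100 + 10 = 210 and takes (4, y).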

Array aggregation is a simple way to do this:

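-- ORDER BY time DESC puts the latest row first; LIMIT 1 keeps only that element.
SELECT sku, date, SUM(inc_col) AS inc_col,
       ARRAY_AGG(latest_col1 ORDER BY time DESC LIMIT 1)[ORDINAL(1)] AS latest_col1,
       ARRAY_AGG(latest_col2 ORDER BY time DESC LIMIT 1)[ORDINAL(1)] AS latest_col2
FROM tbl
GROUP BY sku, date;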

In fact, if you prefer, you can get the entire most recent record:

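-- latest_rec is a STRUCT holding every column of the latest row in each group.
SELECT sku, date, SUM(inc_col) AS inc_col,
       ARRAY_AGG(t ORDER BY time DESC LIMIT 1)[ORDINAL(1)] AS latest_rec
FROM tbl t
GROUP BY sku, date;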
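
Because the table has hundreds of "latest" columns, it may be convenient to flatten the most recent record back into ordinary columns instead of listing each one. The following is a sketch under assumptions: the CTE name agg is hypothetical, and the EXCEPT list assumes that sku, date, time and inc_col are the only columns that should not be taken from the latest row (with many inc_col-style columns, each would have to be listed there and summed separately).

```sql
-- agg is a hypothetical CTE name used only for this illustration.
WITH agg AS (
  SELECT
    sku,
    date,
    SUM(inc_col) AS inc_col,
    -- Whole latest row per (sku, date), kept as a STRUCT.
    ARRAY_AGG(t ORDER BY t.time DESC LIMIT 1)[ORDINAL(1)] AS latest_rec
  FROM tbl AS t
  GROUP BY sku, date
)
SELECT
  sku,
  date,
  inc_col,
  -- Expand the STRUCT, dropping fields already selected or aggregated above.
  latest_rec.* EXCEPT (sku, date, time, inc_col)
FROM agg;
```

For the sample table this leaves latest_col1 and latest_col2 as the expanded columns, alongside sku, date and the summed inc_col.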


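Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.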

 