
ROWS BETWEEN CURRENT ROW PRECEDING AND EXACT MATCH FOLLOWING

Consider the following dummy dataset:

ID EVENT VALUE SORT_KEY
1 submitted 10 1
1 action 20 2
1 closed 30 3
1 action 30 4
2 submitted 10 1 
2 action 10 2
2 action 10 3
2 closed 10 4
2 action 10 5
3 action 29 1
3 submitted 20 2
3 action 10 3
3 closed 10 4
3 action 10 5
4 action 10 1

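For each id, I want to sum the values of all events between 'submitted' (inclusive) and 'closed'. I don't care about events outside these boundaries. I am wondering whether it is possible to build a window function that partitions by id and then extends the frame forward until an expression matches.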

Desired result:

ID EVENT VALUE_SUM
1 submitted 60
2 submitted 40
3 submitted 40

I imagine the query to compute this would look something like the following:

SELECT 
    id
  , event
  , SUM(value) OVER (PARTITION BY id ROWS BETWEEN CURRENT ROW PRECEDING AND event='closed' FOLLOWING) as value_sum
FROM my_table
WHERE event = 'submitted'

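I know this can be done with multiple joins of the table to itself, but because of the size of the data and for optimization reasons, I would like to know whether it can be done with window functions. Thanks.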

Below is a join-free option for BigQuery Standard SQL. It is only a quick sketch, so there is probably still room to refactor and optimize it.

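#standardSQL
SELECT id, SUM(VALUE) AS val
FROM (
  SELECT id, EVENT, VALUE, 
    -- Running total of the boundary markers within each ID. With one 'submitted'
    -- followed by one 'closed', grp is 0 before 'submitted', 1 from 'submitted' on,
    -- and 2 from 'closed' on.
    SUM(boundary) OVER(PARTITION BY ID ORDER BY SORT_KEY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) grp
  FROM (
    SELECT *, 
      -- Running count per (ID, EVENT): 1 on the first 'submitted' and on the first
      -- 'closed' row of an ID, 0 on rows of any other event type.
      COUNTIF(EVENT IN ('submitted', 'closed')) OVER(PARTITION BY ID, EVENT ORDER BY SORT_KEY) boundary
    FROM `project.dataset.table` t
  )
)
-- Keep the rows from 'submitted' up to and including the 'closed' row.
WHERE grp = 1
OR (grp = 2 AND EVENT = 'closed')
GROUP BY ID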
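
To see why the filter on grp works, it can help to look at the two helper columns directly. The sketch below applies the same two window expressions to the rows for ID 3 only (copied from the question's dummy data) and returns boundary and grp per row instead of aggregating. It uses only functions already present in the query above.

```sql
#standardSQL
-- Inspection sketch: same window expressions as the query above,
-- but without the final filter and GROUP BY.
WITH `project.dataset.table` AS (
  SELECT 3 ID, 'action' EVENT, 29 VALUE, 1 SORT_KEY UNION ALL
  SELECT 3, 'submitted', 20, 2 UNION ALL
  SELECT 3, 'action', 10, 3 UNION ALL
  SELECT 3, 'closed', 10, 4 UNION ALL
  SELECT 3, 'action', 10, 5
)
SELECT ID, SORT_KEY, EVENT, VALUE, boundary,
  -- running total of boundary markers within the ID
  SUM(boundary) OVER(PARTITION BY ID ORDER BY SORT_KEY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM (
  SELECT *,
    -- running count per (ID, EVENT); stays 0 for 'action' rows
    COUNTIF(EVENT IN ('submitted', 'closed')) OVER(PARTITION BY ID, EVENT ORDER BY SORT_KEY) AS boundary
  FROM `project.dataset.table`
)
ORDER BY ID, SORT_KEY
```

For these five rows, boundary should come out as 0, 1, 0, 1, 0 and grp as 0, 1, 1, 2, 2 in SORT_KEY order, so the filter keeps the rows with SORT_KEY 2 through 4 (20 + 10 + 10 = 40).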

You can test and experiment with the summing query using the dummy data from your question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 ID, 'submitted' EVENT, 10 VALUE, 1 SORT_KEY UNION ALL
  SELECT 1, 'action', 20, 2 UNION ALL
  SELECT 1, 'closed', 30, 3 UNION ALL
  SELECT 1, 'action', 30, 4 UNION ALL
  SELECT 2, 'submitted', 10, 1 UNION ALL 
  SELECT 2, 'action', 10, 2 UNION ALL
  SELECT 2, 'action', 10, 3 UNION ALL
  SELECT 2, 'closed', 10, 4 UNION ALL
  SELECT 2, 'action', 10, 5 UNION ALL
  SELECT 3, 'action', 29, 1 UNION ALL
  SELECT 3, 'submitted', 20, 2 UNION ALL
  SELECT 3, 'action', 10, 3 UNION ALL
  SELECT 3, 'closed', 10, 4 UNION ALL
  SELECT 3, 'action', 10, 5 UNION ALL
  SELECT 4, 'action', 10, 1 
)
SELECT id, SUM(VALUE) AS val
FROM (
  SELECT id, EVENT, VALUE, 
    SUM(boundary) OVER(PARTITION BY ID ORDER BY SORT_KEY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) grp
  FROM (
    SELECT *, 
      COUNTIF(EVENT IN ('submitted', 'closed')) OVER(PARTITION BY ID, EVENT ORDER BY SORT_KEY) boundary
    FROM `project.dataset.table` t
  )
)
WHERE grp = 1
OR (grp = 2 AND EVENT = 'closed')
GROUP BY ID
ORDER BY ID

The result is:

Row id  val  
1   1   60   
2   2   40   
3   3   40   
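
One possible refactoring, offered only as a sketch: instead of numbering boundary rows, count the 'submitted' rows up to and including the current row and the 'closed' rows strictly before it, then keep the rows where the first count is at least 1 and the second is 0. The column names submitted_so_far and closed_before are made up for illustration. The sketch assumes that 'submitted' precedes 'closed' and that each ID has at most one such cycle; an ID that is submitted but never closed is summed through its last row.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 ID, 'submitted' EVENT, 10 VALUE, 1 SORT_KEY UNION ALL
  SELECT 1, 'action', 20, 2 UNION ALL
  SELECT 1, 'closed', 30, 3 UNION ALL
  SELECT 1, 'action', 30, 4 UNION ALL
  SELECT 2, 'submitted', 10, 1 UNION ALL
  SELECT 2, 'action', 10, 2 UNION ALL
  SELECT 2, 'action', 10, 3 UNION ALL
  SELECT 2, 'closed', 10, 4 UNION ALL
  SELECT 2, 'action', 10, 5 UNION ALL
  SELECT 3, 'action', 29, 1 UNION ALL
  SELECT 3, 'submitted', 20, 2 UNION ALL
  SELECT 3, 'action', 10, 3 UNION ALL
  SELECT 3, 'closed', 10, 4 UNION ALL
  SELECT 3, 'action', 10, 5 UNION ALL
  SELECT 4, 'action', 10, 1
)
SELECT ID, SUM(VALUE) AS val
FROM (
  SELECT ID, EVENT, VALUE,
    -- 'submitted' rows seen so far within the ID, current row included
    COUNTIF(EVENT = 'submitted') OVER(PARTITION BY ID ORDER BY SORT_KEY
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS submitted_so_far,
    -- 'closed' rows strictly before the current row (0 on the 'closed' row itself)
    COUNTIF(EVENT = 'closed') OVER(PARTITION BY ID ORDER BY SORT_KEY
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS closed_before
  FROM `project.dataset.table`
)
WHERE submitted_so_far >= 1  -- at or after 'submitted'
  AND closed_before = 0      -- up to and including 'closed'
GROUP BY ID
ORDER BY ID
```

On the dummy data this should give the same three rows as above (60, 40, 40), with ID 4 dropped because it never reaches 'submitted'.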
