SQL/BigQuery: How to avoid grouping together multiple non-consecutive members of a group?

Question

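I've run into a problem that I can't seem to solve on my own. I'm grouping rows by location and timestamp, and finding the first and last timestamp for each instance where an entity stays stationary. The problem is that with my current code, when the entity returns to a previous location, SQL groups those rows together.

In my example, the entity is at position -66.89 10.5002 at 2020-05-24 05:22:00 and then returns to that position at 2020-05-24 11:13:00. The result of the current query makes it look as if the entity was at that position the whole time, even though the rows in between clearly show that it moved. This is a conceptual problem that I really don't know how to solve in SQL. I'm doing this in BigQuery, but I remember running into a similar problem in SQL Server.

Code: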

with selection as (
select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom
union all select
1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)'
union all select
1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)'
union all select
1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)'
union all select
1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)'
union all select
1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
)

select id,timestamp_,geom,
first_value(timestamp_)
    OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
    OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
selection order by id,timestamp_

Result. Note interval_start and interval_end in the first and last rows:

id	timestamp_	geom	interval_start	interval_end
1	2020-05-24 05:22:00 UTC	POINT(-66.89 10.5002)	2020-05-24 05:22:00 UTC	2020-05-24 11:13:00 UTC
1	2020-05-24 05:25:00 UTC	POINT(-66.8881 10.4994)	2020-05-24 05:25:00 UTC	2020-05-24 05:25:00 UTC
1	2020-05-24 07:37:00 UTC	POINT(-66.8873 10.5)	2020-05-24 07:37:00 UTC	2020-05-24 07:52:00 UTC
1	2020-05-24 07:52:00 UTC	POINT(-66.8873 10.5)	2020-05-24 07:37:00 UTC	2020-05-24 07:52:00 UTC
1	2020-05-24 09:14:00 UTC	POINT(-66.8888 10.4958)	2020-05-24 09:14:00 UTC	2020-05-24 09:14:00 UTC
1	2020-05-24 11:13:00 UTC	POINT(-66.89 10.5002)	2020-05-24 05:22:00 UTC	2020-05-24 11:13:00 UTC

Desired result:

id	timestamp_	geom	interval_start	interval_end
1	2020-05-24 05:22:00 UTC	POINT(-66.89 10.5002)	2020-05-24 05:22:00 UTC	2020-05-24 05:22:00 UTC
1	2020-05-24 05:25:00 UTC	POINT(-66.8881 10.4994)	2020-05-24 05:25:00 UTC	2020-05-24 05:25:00 UTC
1	2020-05-24 07:37:00 UTC	POINT(-66.8873 10.5)	2020-05-24 07:37:00 UTC	2020-05-24 07:52:00 UTC
1	2020-05-24 07:52:00 UTC	POINT(-66.8873 10.5)	2020-05-24 07:37:00 UTC	2020-05-24 07:52:00 UTC
1	2020-05-24 09:14:00 UTC	POINT(-66.8888 10.4958)	2020-05-24 09:14:00 UTC	2020-05-24 09:14:00 UTC
1	2020-05-24 11:13:00 UTC	POINT(-66.89 10.5002)	2020-05-24 11:13:00 UTC	2020-05-24 11:13:00 UTC

Answer 1

Consider the following:

with selection as (
  select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom union all select
  1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)' union all select
  1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)' union all select
  1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)' union all select
  1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)' union all select
  1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
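), pregrouped_selection as (
  select id, timestamp_, geom, 
    -- running count of position changes = group number of the current stationary run
    countif(flag) over(partition by id order by timestamp_) grp
  from (
    select id, timestamp_, geom,
      -- true when geom differs from the previous row's geom; false on the first row per id
      geom != ifnull(lag(geom) over(partition by id order by timestamp_), geom) flag,
    from selection 
  )
  order by id, timestamp_
)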
select id,timestamp_,geom,
first_value(timestamp_)
    OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
    OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
pregrouped_selection order by id,timestamp_

The output matches the desired result shown in the question.

As you can see, I kept the original query almost 100% intact: I only replaced geom with grp in the over() clauses and computed the group number, grp, in pregrouped_selection.
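If one row per stationary interval is more useful than one row per reading, the same group number can be fed into a plain GROUP BY. The following is only a sketch under the question's assumptions (same sample data, geom stored as a string); the CTE names flagged and numbered are made up for illustration. If geom were a GEOGRAPHY column, the != comparison would likely have to be replaced with something like ST_EQUALS, or the value converted with ST_ASTEXT first.

```sql
-- Sketch only: collapse each run of consecutive identical positions into one row.
with selection as (
  select 1 as id, TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom union all
  select 1, TIMESTAMP '2020-05-24 05:22:00', 'POINT(-66.89 10.5002)' union all
  select 1, TIMESTAMP '2020-05-24 05:25:00', 'POINT(-66.8881 10.4994)' union all
  select 1, TIMESTAMP '2020-05-24 09:14:00', 'POINT(-66.8888 10.4958)' union all
  select 1, TIMESTAMP '2020-05-24 07:37:00 UTC', 'POINT(-66.8873 10.5)' union all
  select 1, TIMESTAMP '2020-05-24 07:52:00 UTC', 'POINT(-66.8873 10.5)'
), flagged as (   -- hypothetical name
  select id, timestamp_, geom,
    -- true when the position changed relative to the previous reading;
    -- the first reading per id has no predecessor, so ifnull makes it false
    geom != ifnull(lag(geom) over (partition by id order by timestamp_), geom) as flag
  from selection
), numbered as (  -- hypothetical name
  select id, timestamp_, geom,
    -- running count of changes: stays constant while the entity does not move
    countif(flag) over (partition by id order by timestamp_) as grp
  from flagged
)
select id, geom,
  min(timestamp_) as interval_start,
  max(timestamp_) as interval_end
from numbered
group by id, grp, geom   -- geom is constant within a grp, so it adds no extra groups
order by id, interval_start
```

For the sample data this should return five intervals: POINT(-66.89 10.5002) appears twice, once for 05:22:00 and once for 11:13:00, and POINT(-66.8873 10.5) spans 07:37:00 to 07:52:00.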

Answer 2

You can use window functions to check whether there are at least two distinct values:

min(geom) over (partition by id) <> max(geom) over (partition by id) as has_moved,
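To see what this expression returns, here is a sketch that drops it into a complete query over the question's sample data. It assumes geom is a string, as in the question, so that min() and max() are defined for it. Because the window is partitioned by id only, the expression yields a single value per entity: it reports whether the entity moved at all and does not by itself split the rows into separate intervals.

```sql
-- Sketch only: has_moved is the same for every row of a given id.
with selection as (
  select 1 as id, TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom union all
  select 1, TIMESTAMP '2020-05-24 05:22:00', 'POINT(-66.89 10.5002)' union all
  select 1, TIMESTAMP '2020-05-24 05:25:00', 'POINT(-66.8881 10.4994)' union all
  select 1, TIMESTAMP '2020-05-24 09:14:00', 'POINT(-66.8888 10.4958)' union all
  select 1, TIMESTAMP '2020-05-24 07:37:00 UTC', 'POINT(-66.8873 10.5)' union all
  select 1, TIMESTAMP '2020-05-24 07:52:00 UTC', 'POINT(-66.8873 10.5)'
)
select id, timestamp_, geom,
  -- true when the id has at least two distinct geom values
  min(geom) over (partition by id) <> max(geom) over (partition by id) as has_moved
from selection
order by id, timestamp_
```

With the sample data, has_moved should be true on all six rows, since id 1 has four distinct positions.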

2 answers

Answer 1
Score 1, accepted, 2021-04-01 20:31:46

Answer 2
Score 0, 2021-04-01 20:13:32
