
SQL/BigQuery: how to avoid grouping multiple, non-consecutive members of a group?

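I've run into a problem I can't seem to solve on my own. I'm grouping rows by location and timestamp, and looking up the first and last timestamp for each instance where the entity stays stationary. The problem is that with my current code, SQL groups the rows together when the entity returns to a previous location.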

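In my example, the entity is at location -66.89 10.5002 at 2020-05-24 05:22:00 and then returns to that location at 2020-05-24 11:13:00. The current query's result makes it look as if the entity was at that location the whole time, even though the rows in between clearly show that it moved. This is a conceptual problem that I really don't know how to solve in SQL. I'm doing this in BigQuery, but I remember running into a similar problem in SQL Server.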

Code:

with selection as (
select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom
union all select
1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)'
union all select
1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)'
union all select
1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)'
union all select
1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)'
union all select
1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
)

select id,timestamp_,geom,
first_value(timestamp_)
    OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
    OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
selection order by id,timestamp_

Result. Note the interval_start and interval_end of the first and last rows:

id  timestamp_               geom                     interval_start           interval_end
1   2020-05-24 05:22:00 UTC  POINT(-66.89 10.5002)    2020-05-24 05:22:00 UTC  2020-05-24 11:13:00 UTC
1   2020-05-24 05:25:00 UTC  POINT(-66.8881 10.4994)  2020-05-24 05:25:00 UTC  2020-05-24 05:25:00 UTC
1   2020-05-24 07:37:00 UTC  POINT(-66.8873 10.5)     2020-05-24 07:37:00 UTC  2020-05-24 07:52:00 UTC
1   2020-05-24 07:52:00 UTC  POINT(-66.8873 10.5)     2020-05-24 07:37:00 UTC  2020-05-24 07:52:00 UTC
1   2020-05-24 09:14:00 UTC  POINT(-66.8888 10.4958)  2020-05-24 09:14:00 UTC  2020-05-24 09:14:00 UTC
1   2020-05-24 11:13:00 UTC  POINT(-66.89 10.5002)    2020-05-24 05:22:00 UTC  2020-05-24 11:13:00 UTC
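
The merged first and last rows follow directly from partitioning by the location value: every row with the same (id, geom) lands in one partition, no matter what happened in between. A minimal Python sketch of that behaviour, using the sample rows from the question (the function name `intervals_by_value` is made up for illustration and is not part of any query):

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (id, timestamp_, geom).
rows = [
    (1, datetime(2020, 5, 24, 11, 13), "POINT(-66.89 10.5002)"),
    (1, datetime(2020, 5, 24, 5, 22), "POINT(-66.89 10.5002)"),
    (1, datetime(2020, 5, 24, 5, 25), "POINT(-66.8881 10.4994)"),
    (1, datetime(2020, 5, 24, 9, 14), "POINT(-66.8888 10.4958)"),
    (1, datetime(2020, 5, 24, 7, 37), "POINT(-66.8873 10.5)"),
    (1, datetime(2020, 5, 24, 7, 52), "POINT(-66.8873 10.5)"),
]


def intervals_by_value(rows):
    """Mimic PARTITION BY id, geom: one (first, last) pair per distinct location."""
    partitions = defaultdict(list)
    for id_, ts, geom in rows:
        partitions[(id_, geom)].append(ts)
    return {key: (min(stamps), max(stamps)) for key, stamps in partitions.items()}


spans = intervals_by_value(rows)
# Both visits to the same point share one partition, so the interval
# stretches from the first visit to the last one.
print(spans[(1, "POINT(-66.89 10.5002)")])
```

The partition key carries no notion of "consecutive", which is why the two separate visits collapse into a single interval.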

Desired result:

id  timestamp_               geom                     interval_start           interval_end
1   2020-05-24 05:22:00 UTC  POINT(-66.89 10.5002)    2020-05-24 05:22:00 UTC  2020-05-24 05:22:00 UTC
1   2020-05-24 05:25:00 UTC  POINT(-66.8881 10.4994)  2020-05-24 05:25:00 UTC  2020-05-24 05:25:00 UTC
1   2020-05-24 07:37:00 UTC  POINT(-66.8873 10.5)     2020-05-24 07:37:00 UTC  2020-05-24 07:52:00 UTC
1   2020-05-24 07:52:00 UTC  POINT(-66.8873 10.5)     2020-05-24 07:37:00 UTC  2020-05-24 07:52:00 UTC
1   2020-05-24 09:14:00 UTC  POINT(-66.8888 10.4958)  2020-05-24 09:14:00 UTC  2020-05-24 09:14:00 UTC
1   2020-05-24 11:13:00 UTC  POINT(-66.89 10.5002)    2020-05-24 11:13:00 UTC  2020-05-24 11:13:00 UTC

Consider the approach below:

with selection as (
  select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom union all select
  1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)' union all select
  1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)' union all select
  1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)' union all select
  1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)' union all select
  1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
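), pregrouped_selection as (
  -- grp: running count of location changes per id; it increases by 1 each time geom changes
  select id, timestamp_, geom, 
    countif(flag) over(partition by id order by timestamp_) grp
  from (
    -- flag: true when geom differs from the previous row's geom (false on the first row of each id)
    select id, timestamp_, geom,
      geom != ifnull(lag(geom) over(partition by id order by timestamp_), geom) flag,
    from selection 
  )
  order by id, timestamp_
)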
select id,timestamp_,geom,
first_value(timestamp_)
    OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
    OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
pregrouped_selection order by id,timestamp_    

with output (shown as an image in the original post, not reproduced here)

As you can see, I kept the original query almost 100% intact: I only replaced geom with grp in the over() clauses and computed the group number, grp, in pregrouped_selection.
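
To make the grouping step concrete, here is a rough Python sketch of the same logic, written only as an illustration and not as a replacement for the query: sort by id and timestamp_, bump a counter whenever geom differs from the previous row (the role of flag and countif), then take the first and last timestamp within each (id, grp). The helper names `assign_groups` and `intervals` are hypothetical.

```python
from datetime import datetime

# Sample rows from the question: (id, timestamp_, geom).
rows = [
    (1, datetime(2020, 5, 24, 11, 13), "POINT(-66.89 10.5002)"),
    (1, datetime(2020, 5, 24, 5, 22), "POINT(-66.89 10.5002)"),
    (1, datetime(2020, 5, 24, 5, 25), "POINT(-66.8881 10.4994)"),
    (1, datetime(2020, 5, 24, 9, 14), "POINT(-66.8888 10.4958)"),
    (1, datetime(2020, 5, 24, 7, 37), "POINT(-66.8873 10.5)"),
    (1, datetime(2020, 5, 24, 7, 52), "POINT(-66.8873 10.5)"),
]


def assign_groups(rows):
    """Return (id, timestamp_, geom, grp) tuples ordered by id, timestamp_."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    prev_id = prev_geom = None
    grp = 0
    for id_, ts, geom in ordered:
        if id_ != prev_id:
            grp = 0      # new id: the running count restarts
        elif geom != prev_geom:
            grp += 1     # location changed since the previous row
        out.append((id_, ts, geom, grp))
        prev_id, prev_geom = id_, geom
    return out


def intervals(rows):
    """Attach the first and last timestamp of each (id, grp) to every row."""
    grouped = assign_groups(rows)
    bounds = {}
    for id_, ts, _geom, grp in grouped:
        start, end = bounds.get((id_, grp), (ts, ts))
        bounds[(id_, grp)] = (min(start, ts), max(end, ts))
    return [(id_, ts, geom, *bounds[(id_, grp)]) for id_, ts, geom, grp in grouped]


result = intervals(rows)
for row in result:
    print(row)
```

On the sample data the group numbers come out as 0, 1, 2, 2, 3, 4 in timestamp order, so the two visits to POINT(-66.89 10.5002) end up in different groups and each keeps its own interval.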

You can use window functions to check whether there are at least two distinct values:

min(geom) over (partition by id) <> max(geom) over (partition by id) as has_moved,
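
For comparison, a small Python sketch of what this expression evaluates to: one boolean per id that is true when the id has more than one distinct geom. It says whether the entity ever moved, not where one stay ends and the next begins. The second id in the sample data and the function name are assumptions added for illustration.

```python
# (id, geom) pairs; id 1 comes from the question, id 2 is a made-up
# stationary entity added only to show the false case.
rows = [
    (1, "POINT(-66.89 10.5002)"),
    (1, "POINT(-66.8881 10.4994)"),
    (1, "POINT(-66.8873 10.5)"),
    (1, "POINT(-66.89 10.5002)"),
    (2, "POINT(-66.9 10.5)"),
    (2, "POINT(-66.9 10.5)"),
]


def has_moved(rows):
    """Per id: True when min(geom) != max(geom), i.e. more than one distinct geom."""
    seen = {}
    for id_, geom in rows:
        seen.setdefault(id_, set()).add(geom)
    return {id_: min(geoms) != max(geoms) for id_, geoms in seen.items()}


moved = has_moved(rows)
print(moved)
```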
