SELECT max value of each group by column

Question

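I have read several StackOverflow questions similar to mine, but I couldn't find one that matches my problem exactly. I have read "Select max value of each group including other columns" and two "Select max value of each group" questions.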

So here is my problem.

I have a table that looks like this:

+---------+---------------+-----------------------+
|column_1 |   column_2    |      column_3         | 
+---------+---------------+-----------------------+
|    A    |      200      | 1618558797853684118   |     
|    A    |      198.7    | 1618558797854783205   | 
|    A    |      201.3    | 1618558797855282263   |    
|    B    |      350.5    | 1618558775580928115   |  
|    B    |      349.9    | 1618558775581128138   |  
|    B    |      350.1    | 1618558775580856107   |
|    C    |      532      | 1618558797852667035   |
|    C    |      531      | 1618558775580345051   |
|    A    |      300      | 1618558797855492289   |
|    A    |      302      | 1618558797852512023   |   
|   ...   |  ........     |        ...            | 
+---------+---------------+-----------------------+

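As you can see, consecutive rows with the same letter in column_1 have almost the same column_2 value, right? I need to get one row from each of these runs, but only for consecutive rows, in sequence (A appears again later as a separate run). To make it clearer, let's look at the desired output: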

Desired output
+---------+---------------------------------------------------------------+-------------------------+
|column_1 |                         column_2                              |      column_3           | 
+---------+---------------------------------------------------------------+-------------------------+
|    A    | it can be (200 or 198.7 or 201.3) does not matter which one   | (depends on column_2)   |     
|    B    | it can be (350.5 or 349.9 or 350.1) does not matter which one | (depends on column_2)   | 
|    C    | it can be (532 or 531) does not matter which one              | (depends on column_2)   |    
|    A    | it can be (300 or 302) does not matter which one              | (depends on column_2)   |     
|   ...   |                        ........                               |          ...            | 
+---------+---------------------------------------------------------------+-------------------------+
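The desired output boils down to "one row per run of consecutive equal column_1 values". A minimal Python sketch of that rule, assuming the rows are already in the order listed above (a real SQL table has no inherent row order, so a query needs some column that defines it):

```python
from itertools import groupby

# Rows from the question as (column_1, column_2, column_3), in the order
# they are listed. A real SQL table has no inherent row order.
ROWS = [
    ("A", 200.0, 1618558797853684118),
    ("A", 198.7, 1618558797854783205),
    ("A", 201.3, 1618558797855282263),
    ("B", 350.5, 1618558775580928115),
    ("B", 349.9, 1618558775581128138),
    ("B", 350.1, 1618558775580856107),
    ("C", 532.0, 1618558797852667035),
    ("C", 531.0, 1618558775580345051),
    ("A", 300.0, 1618558797855492289),
    ("A", 302.0, 1618558797852512023),
]


def one_row_per_run(rows):
    """Keep the first row of each run of consecutive equal column_1 values."""
    # groupby() only merges *adjacent* items with equal keys, so the two
    # separate runs of "A" stay separate.
    return [next(run) for _, run in groupby(rows, key=lambda row: row[0])]


for row in one_row_per_run(ROWS):
    print(row)
```

Since any row of a run is acceptable, taking the first one is simply the easiest choice.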

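So my idea was to group by each of these runs and take the max or min of column_3 (it doesn't matter which), but I couldn't get it to work.

Sorry for the complicated question, but could you help me? Thanks.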
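A minimal sketch of why grouping by column_1 alone is not enough, using Python's built-in sqlite3 module; the `id` column is a hypothetical addition that records the order in which the rows are listed in the table above:

```python
import sqlite3

# Rows from the question; `id` is a hypothetical column that records the
# order in which the rows are listed.
ROWS = [
    (1, "A", 200.0, 1618558797853684118),
    (2, "A", 198.7, 1618558797854783205),
    (3, "A", 201.3, 1618558797855282263),
    (4, "B", 350.5, 1618558775580928115),
    (5, "B", 349.9, 1618558775581128138),
    (6, "B", 350.1, 1618558775580856107),
    (7, "C", 532.0, 1618558797852667035),
    (8, "C", 531.0, 1618558775580345051),
    (9, "A", 300.0, 1618558797855492289),
    (10, "A", 302.0, 1618558797852512023),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, column_1 TEXT, column_2 REAL, column_3 INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# GROUP BY column_1 puts every "A" row into one group, so the two separate
# runs of "A" collapse into a single result row.
naive = conn.execute(
    "SELECT column_1, MAX(column_3) FROM t GROUP BY column_1 ORDER BY column_1"
).fetchall()
print(naive)  # 3 rows (A, B, C) instead of the 4 runs in the desired output
```

The grouping key needs something that distinguishes one run from the next, not just the letter.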

Answer 1

Consider below:

#standardSQL
with `project.dataset.table` as (
  select 1 id, 'A' column_1, 200 column_2, 1618558797853684118 column_3 union all
  select 2, 'A', 198.7, 1618558797854783205 union all
  select 3, 'A', 201.3, 1618558797855282263 union all
  select 4, 'B', 350.5, 1618558775580928115 union all
  select 5, 'B', 349.9, 1618558775581128138 union all
  select 6, 'B', 350.1, 1618558775580856107 union all
  select 7, 'C', 532, 1618558797852667035 union all
  select 8, 'C', 531, 1618558775580345051 union all
  select 9, 'A', 300, 1618558797855492289 union all
  select 10, 'A', 302, 1618558797852512023 union all
  select 12, 'C', 709, 1618558797852562325 union all
  select 13, 'C', 803, 1618558797851315651
)
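-- For each run, keep the row with the smallest column_2 (any row would do)
select as value array_agg(struct(column_1, column_2, column_3) order by column_2 limit 1)[offset(0)]
from (
  -- Running count of flags: every row in the same run gets the same grp number
  select *, countif(flag) over(order by id) grp
  from (
    -- flag is true when column_1 differs from the previous row (ordered by id)
    select *, column_1 != lag(column_1) over(order by id) flag
    from `project.dataset.table`
  )
) 
group by column_1, grp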

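which returns one row per consecutive run of column_1 values (the row with the smallest column_2 in each run).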
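To try the same idea locally, here is a sketch that ports the query to SQLite through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which added window functions). It is not the answerer's code: SQLite has no COUNTIF or ARRAY_AGG(... LIMIT 1)[OFFSET(0)], so the sketch swaps in SUM over a 0/1 flag for the run number and ROW_NUMBER() to keep the row with the smallest column_2 in each run.

```python
import sqlite3

# Sample rows from the answer: (id, column_1, column_2, column_3).
ROWS = [
    (1, "A", 200.0, 1618558797853684118),
    (2, "A", 198.7, 1618558797854783205),
    (3, "A", 201.3, 1618558797855282263),
    (4, "B", 350.5, 1618558775580928115),
    (5, "B", 349.9, 1618558775581128138),
    (6, "B", 350.1, 1618558775580856107),
    (7, "C", 532.0, 1618558797852667035),
    (8, "C", 531.0, 1618558775580345051),
    (9, "A", 300.0, 1618558797855492289),
    (10, "A", 302.0, 1618558797852512023),
    (12, "C", 709.0, 1618558797852562325),
    (13, "C", 803.0, 1618558797851315651),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, column_1 TEXT, column_2 REAL, column_3 INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH prev AS (
    -- Previous row's column_1 (NULL on the first row).
    SELECT *, LAG(column_1) OVER (ORDER BY id) AS prev_column_1
    FROM t
),
grouped AS (
    -- Running total of "column_1 changed" flags = run number (grp).
    -- On the first row the comparison with NULL is NULL, so it counts as 0.
    SELECT *,
           SUM(CASE WHEN column_1 <> prev_column_1 THEN 1 ELSE 0 END)
               OVER (ORDER BY id) AS grp
    FROM prev
),
ranked AS (
    -- Number the rows inside each run, smallest column_2 first.
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column_1, grp ORDER BY column_2) AS rn
    FROM grouped
)
SELECT column_1, column_2, column_3
FROM ranked
WHERE rn = 1
ORDER BY grp
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The final ORDER BY grp is an addition for readable output; without it, neither engine guarantees the order of the returned runs.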

Answer 2

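It looks like you have a form of gaps-and-islands problem: you want one row for each run of adjacent rows that share the same column_1 value.

I would suggest lag() (to get the first row of each group) or lead() (to get the last row):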

select t.*
from (select t.*,
             lag(column_1) over (order by column_3) as prev_column_1
      from t
     ) t
where prev_column_1 is null or prev_column_1 <> column_1;
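A minimal sketch of both variants, run through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which added LAG and LEAD). The `id` column is a hypothetical addition recording the order in which the question lists the rows. The answer orders by column_3; with the question's sample data that order differs from the listed one (the C row with 531 has the smallest column_3), so the sketch also runs the lag() query ordered by column_3 to show the effect.

```python
import sqlite3

# Rows from the question; `id` is a hypothetical column recording list order.
ROWS = [
    (1, "A", 200.0, 1618558797853684118),
    (2, "A", 198.7, 1618558797854783205),
    (3, "A", 201.3, 1618558797855282263),
    (4, "B", 350.5, 1618558775580928115),
    (5, "B", 349.9, 1618558775581128138),
    (6, "B", 350.1, 1618558775580856107),
    (7, "C", 532.0, 1618558797852667035),
    (8, "C", 531.0, 1618558775580345051),
    (9, "A", 300.0, 1618558797855492289),
    (10, "A", 302.0, 1618558797852512023),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, column_1 TEXT, column_2 REAL, column_3 INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)


def run_starts(order_by):
    """First row of each run (the lag() variant), ordered by `order_by`."""
    # order_by is pasted into the SQL text, so pass only trusted column names.
    return conn.execute(f"""
        SELECT id, column_1
        FROM (
            SELECT *, LAG(column_1) OVER (ORDER BY {order_by}) AS prev_column_1
            FROM t
        )
        WHERE prev_column_1 IS NULL OR prev_column_1 <> column_1
        ORDER BY {order_by}
    """).fetchall()


def run_ends(order_by):
    """Last row of each run (the lead() variant), ordered by `order_by`."""
    return conn.execute(f"""
        SELECT id, column_1
        FROM (
            SELECT *, LEAD(column_1) OVER (ORDER BY {order_by}) AS next_column_1
            FROM t
        )
        WHERE next_column_1 IS NULL OR next_column_1 <> column_1
        ORDER BY {order_by}
    """).fetchall()


print(run_starts("id"))        # [(1, 'A'), (4, 'B'), (7, 'C'), (9, 'A')]
print(run_ends("id"))          # [(3, 'A'), (6, 'B'), (8, 'C'), (10, 'A')]
print(run_starts("column_3"))  # [(8, 'C'), (6, 'B'), (10, 'A'), (7, 'C'), (1, 'A')]
```

Whichever column defines "adjacent" decides what counts as a run, so it has to match the order the data is meant to be read in.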

2 answers

Answer 1
Score 1, accepted, 2021-04-16 20:34:26

Answer 2
Score 0, 2021-04-16 11:27:42
