SELECT max value of each group by column

I have read several Stack Overflow questions similar to mine, but none of them matches my problem exactly. The ones I read: "select max value of each group including other column", "Select max value of each group", and "Select max value of each group".

So here is my problem.

I have a table that looks like this:

+---------+---------------+-----------------------+
|column_1 |   column_2    |      column_3         | 
+---------+---------------+-----------------------+
|    A    |      200      | 1618558797853684118   |     
|    A    |      198.7    | 1618558797854783205   | 
|    A    |      201.3    | 1618558797855282263   |    
|    B    |      350.5    | 1618558775580928115   |  
|    B    |      349.9    | 1618558775581128138   |  
|    B    |      350.1    | 1618558775580856107   |
|    C    |      532      | 1618558797852667035   |
|    C    |      531      | 1618558775580345051   |
|    A    |      300      | 1618558797855492289   |
|    A    |      302      | 1618558797852512023   |   
|   ...   |  ........     |        ...            | 
+---------+---------------+-----------------------+

As you can see, consecutive rows that share the same letter in column_1 have almost the same value in column_2. I need exactly one row from each such run of consecutive rows, in sequence, so a letter that reappears later starts a new run. The desired output below should make this clearer:

Desired output
+---------+---------------------------------------------------------------+-------------------------+
|column_1 |                         column_2                              |      column_3           | 
+---------+---------------------------------------------------------------+-------------------------+
|    A    | it can be (200 or 198.7 or 201.3) does not matter which one   | (depends on column_2)   |     
|    B    | it can be (350.5 or 349.9 or 350.1) does not matter which one | (depends on column_2)   | 
|    C    | it can be (532 or 531) does not matter which one              | (depends on column_2)   |    
|    A    | it can be (300 or 302) does not matter which one              | (depends on column_2)   |     
|   ...   |                        ........                               |          ...            | 
+---------+---------------------------------------------------------------+-------------------------+

My idea was to group the rows and take the max or min value of column_3 for each group (it does not matter which one), but I could not get that to work.

I'm sorry for the complicated question, but can you help me? Thanks.

Consider the query below:

#standardSQL
-- Sample data; id is an added column that fixes the row order.
with `project.dataset.table` as (
  select 1 id, 'A' column_1, 200 column_2, 1618558797853684118 column_3 union all
  select 2, 'A', 198.7, 1618558797854783205 union all
  select 3, 'A', 201.3, 1618558797855282263 union all
  select 4, 'B', 350.5, 1618558775580928115 union all
  select 5, 'B', 349.9, 1618558775581128138 union all
  select 6, 'B', 350.1, 1618558775580856107 union all
  select 7, 'C', 532, 1618558797852667035 union all
  select 8, 'C', 531, 1618558775580345051 union all
  select 9, 'A', 300, 1618558797855492289 union all
  select 10, 'A', 302, 1618558797852512023 union all
  select 12, 'C', 709, 1618558797852562325 union all
  select 13, 'C', 803, 1618558797851315651
)
-- For each run, keep the single row with the smallest column_2.
select as value array_agg(struct(column_1, column_2, column_3) order by column_2 limit 1)[offset(0)]
from (
  -- grp: running count of changes, constant within a run of equal column_1 values.
  select *, countif(flag) over(order by id) grp
  from (
    -- flag: true where column_1 differs from the previous row (null on the first row).
    select *, column_1 != lag(column_1) over(order by id) flag
    from `project.dataset.table`
  )
)
group by column_1, grp

with output [image: query result]
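
To make the flag/grp steps of the query above easier to follow, here is a rough Python sketch that walks the same sample rows in id order. It is only an illustration of the logic, not a replacement for the SQL: `prev` stands in for lag(column_1), `grp` for the running countif(flag), and the function name `islands` is made up for this example.

```python
# Sample rows from the query above: (id, column_1, column_2, column_3).
rows = [
    (1, "A", 200.0, 1618558797853684118),
    (2, "A", 198.7, 1618558797854783205),
    (3, "A", 201.3, 1618558797855282263),
    (4, "B", 350.5, 1618558775580928115),
    (5, "B", 349.9, 1618558775581128138),
    (6, "B", 350.1, 1618558775580856107),
    (7, "C", 532.0, 1618558797852667035),
    (8, "C", 531.0, 1618558775580345051),
    (9, "A", 300.0, 1618558797855492289),
    (10, "A", 302.0, 1618558797852512023),
    (12, "C", 709.0, 1618558797852562325),
    (13, "C", 803.0, 1618558797851315651),
]


def islands(rows):
    """Return one row per run of consecutive equal column_1 values.

    Hypothetical helper: within each run it keeps the row with the
    smallest column_2, like `order by column_2 limit 1` in the query.
    """
    best = {}    # (column_1, grp) -> row with the smallest column_2 so far
    prev = None  # plays the role of lag(column_1)
    grp = 0      # plays the role of countif(flag) over (order by id)
    for row in sorted(rows, key=lambda r: r[0]):  # order by id
        _, col1, col2, _ = row
        flag = prev is not None and col1 != prev  # value changed vs. previous row
        if flag:
            grp += 1
        key = (col1, grp)
        if key not in best or col2 < best[key][2]:
            best[key] = row
        prev = col1
    # grp is unique per run, so sorting by it restores the original sequence.
    return [best[k][1:] for k in sorted(best, key=lambda k: k[1])]


result = islands(rows)
for r in result:
    print(r)
```

Because `grp` only increases when column_1 changes, the second run of A rows gets a different group number from the first, which is what keeps the two runs apart.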

You seem to have a form of gaps-and-islands problem. You want one row whenever adjacent rows have the same value for column_1.

I would suggest lag() (to keep the first row in each group) or lead() (to keep the last row):

select t.*
from (select t.*,
             lag(column_1) over (order by column_3) as prev_column_1
      from t
     ) t
where prev_column_1 is null or prev_column_1 <> column_1;
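
The query above shows the lag() form. As a sketch of the lead() form mentioned alongside it, the snippet below runs the mirrored query through Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (needed for window functions). The table contents are made-up illustration data in which column_3 increases in row order, and `next_column_1` is a name chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (column_1 text, column_2 real, column_3 integer)")

# Made-up rows: two A rows, two B rows, then A again; column_3 gives the order.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("A", 200.0, 1),
        ("A", 198.7, 2),
        ("B", 350.5, 3),
        ("B", 349.9, 4),
        ("A", 300.0, 5),
    ],
)

# Keep a row when the next row (by column_3) has a different column_1,
# or when there is no next row: that is the last row of each run.
last_rows = conn.execute(
    """
    select column_1, column_2, column_3
    from (select t.*,
                 lead(column_1) over (order by column_3) as next_column_1
          from t
         ) t
    where next_column_1 is null or next_column_1 <> column_1
    order by column_3
    """
).fetchall()

for row in last_rows:
    print(row)

conn.close()
```

Both forms depend on column_3 (or some other column) defining the row order; if the real ordering comes from elsewhere, the `order by` inside the window would need to use that column instead.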
