SELECT max value of each group by column
I have read several Stack Overflow questions similar to mine, but none of them matches my problem exactly. I have read: "select max value of each group including other column" and two questions titled "Select max value of each group".
So here is my problem.
I have a table that looks like this:
+----------+----------+---------------------+
| column_1 | column_2 | column_3            |
+----------+----------+---------------------+
| A        | 200      | 1618558797853684118 |
| A        | 198.7    | 1618558797854783205 |
| A        | 201.3    | 1618558797855282263 |
| B        | 350.5    | 1618558775580928115 |
| B        | 349.9    | 1618558775581128138 |
| B        | 350.1    | 1618558775580856107 |
| C        | 532      | 1618558797852667035 |
| C        | 531      | 1618558775580345051 |
| A        | 300      | 1618558797855492289 |
| A        | 302      | 1618558797852512023 |
| ...      | ...      | ...                 |
+----------+----------+---------------------+
As you can see, consecutive rows that share the same letter in column_1 have almost the same value in column_2. I need one row from each such run, and the runs must stay in sequence: the second run of A rows is a separate group from the first. The desired output makes this clearer:
Desired output
+----------+---------------------------------------------------------------+-----------------------+
| column_1 | column_2                                                      | column_3              |
+----------+---------------------------------------------------------------+-----------------------+
| A        | it can be (200 or 198.7 or 201.3) does not matter which one   | (depends on column_2) |
| B        | it can be (350.5 or 349.9 or 350.1) does not matter which one | (depends on column_2) |
| C        | it can be (532 or 531) does not matter which one              | (depends on column_2) |
| A        | it can be (300 or 302) does not matter which one              | (depends on column_2) |
| ...      | ...                                                           | ...                   |
+----------+---------------------------------------------------------------+-----------------------+
My idea was to group the rows by column_1 and take the max or min value of column_3 (it does not matter which one), but I could not get that to work.
I'm sorry for the complicated question, but can you help me? Thanks
Consider the query below:
#standardSQL
with `project.dataset.table` as (
  -- sample data; id gives the rows a deterministic order
  select 1 id, 'A' column_1, 200 column_2, 1618558797853684118 column_3 union all
  select 2, 'A', 198.7, 1618558797854783205 union all
  select 3, 'A', 201.3, 1618558797855282263 union all
  select 4, 'B', 350.5, 1618558775580928115 union all
  select 5, 'B', 349.9, 1618558775581128138 union all
  select 6, 'B', 350.1, 1618558775580856107 union all
  select 7, 'C', 532, 1618558797852667035 union all
  select 8, 'C', 531, 1618558775580345051 union all
  select 9, 'A', 300, 1618558797855492289 union all
  select 10, 'A', 302, 1618558797852512023 union all
  select 12, 'C', 709, 1618558797852562325 union all
  select 13, 'C', 803, 1618558797851315651
)
-- per run, keep the row with the smallest column_2
select as value array_agg(struct(column_1, column_2, column_3) order by column_2 limit 1)[offset(0)]
from (
  -- the running count of changes numbers each run of equal column_1 values
  select *, countif(flag) over(order by id) grp
  from (
    -- flag is true when column_1 differs from the previous row
    select *, column_1 != lag(column_1) over(order by id) flag
    from `project.dataset.table`
  )
)
group by column_1, grp
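This returns one row per run of consecutive rows that share the same column_1, namely the row with the smallest column_2 in that run.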
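To see what the flag and grp columns in the query above are doing, here is a rough Python sketch of the same run numbering. It is only an illustration, not part of the answer: the rows are the ten sample rows from the question with id values as in the answer's sample data, column_3 is left out, and the helper name island_ids is made up for this example.

```python
from itertools import groupby

# Sample rows in display order: (id, column_1, column_2).
rows = [
    (1, "A", 200.0), (2, "A", 198.7), (3, "A", 201.3),
    (4, "B", 350.5), (5, "B", 349.9), (6, "B", 350.1),
    (7, "C", 532.0), (8, "C", 531.0),
    (9, "A", 300.0), (10, "A", 302.0),
]


def island_ids(rows):
    """Number each run of equal column_1 values (hypothetical helper).

    Mirrors countif(flag) over(order by id): the counter goes up by one
    every time column_1 differs from the previous row.
    """
    ids, grp, prev = [], 0, None
    for _, col1, _ in rows:
        # The first row has no previous row, so it never raises the counter.
        if prev is not None and col1 != prev:
            grp += 1
        ids.append(grp)
        prev = col1
    return ids


grp = island_ids(rows)

# One output row per (column_1, grp): keep the row with the smallest column_2.
picked = []
for _, members in groupby(zip(rows, grp), key=lambda pair: (pair[0][1], pair[1])):
    best = min((row for row, _ in members), key=lambda row: row[2])
    picked.append((best[1], best[2]))

print(grp)
print(picked)
```

Because the second run of A rows gets its own number, grouping by column_1 together with that number keeps it apart from the first run, which a plain group by column_1 would not do.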
You seem to have a form of gaps-and-islands problem. You want one row whenever adjacent rows have the same value for column_1.
I would suggest lag() (for the first row in each group) or lead() (for the last row):
select t.*
from (select t.*,
lag(column_1) over (order by column_3) as prev_column_1
from t
) t
where prev_column_1 is null or prev_column_1 <> column_1;
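The lead() variant is mentioned but not written out, so here is a sketch of it, run through Python's sqlite3 module so it can be tried locally. The assumptions are mine, not the answer's: SQLite 3.25 or later (needed for window functions), a hypothetical seq column that records the row order shown in the question (the answer orders by column_3 instead), and only column_1 and column_2 are loaded.

```python
import sqlite3

# Sample rows in the order shown in the question.
# seq is a hypothetical ordering column added for this sketch.
rows = [
    (1, "A", 200.0), (2, "A", 198.7), (3, "A", 201.3),
    (4, "B", 350.5), (5, "B", 349.9), (6, "B", 350.1),
    (7, "C", 532.0), (8, "C", 531.0),
    (9, "A", 300.0), (10, "A", 302.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (seq integer, column_1 text, column_2 real)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# Keep a row when the next row (by seq) has a different column_1,
# or when there is no next row at all.
query = """
select column_1, column_2
from (
    select t.*,
           lead(column_1) over (order by seq) as next_column_1
    from t
) as s
where next_column_1 is null or next_column_1 <> column_1
order by seq
"""
last_rows = conn.execute(query).fetchall()
print(last_rows)
conn.close()
```

With an ordering column that follows the displayed row order, this keeps the last row of each run. Either variant depends on that ordering column: the result only matches the desired output if sorting by it reproduces the row sequence you have in mind.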