
SQL: aggregate over aggregate (max over sums)

I have a problem writing a valid query that aggregates over an aggregate subquery. MySQL allows some non-ANSI constructs, but they give incorrect results.

CREATE TABLE `log` (
  `id` int NOT NULL,
  `id_user` varchar(32) NOT NULL,
  `datastamp` datetime NOT NULL DEFAULT now(),
  `processed` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
);

I want a result table containing the "best" user for every year, where "best" means having the highest total of the processed field. For example:

Source table:

year(datastamp) | id_user | processed
2010 | u1 | 1
2010 | u1 | 3
2010 | u2 | 2
2011 | u1 | 1
2011 | u1 | 1
2011 | u2 | 5

Expected result:

year | id_user | sum(processed)
2010 | u1 | 4
2011 | u2 | 5

A simple query

select year(datastamp) as y, id_user, sum(processed) as ps from log group by id_user, y

gives the sum for every user and year:

y | id_user | ps
2010 | u1 | 4
2010 | u2 | 2
2011 | u1 | 2
2011 | u2 | 5
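To check the grouping step outside MySQL, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's built-in sqlite3 module and runs the same per-user, per-year sum. Assumptions: SQLite stands in for MySQL, so strftime('%Y', datastamp) replaces year(datastamp), and the month/day part of each timestamp is invented because the question only shows years.

```python
import sqlite3

# Sample rows from the question as (datastamp, id_user, processed).
# Hypothetical dates: only the year comes from the question.
ROWS = [
    ("2010-01-15", "u1", 1),
    ("2010-03-02", "u1", 3),
    ("2010-06-20", "u2", 2),
    ("2011-02-01", "u1", 1),
    ("2011-05-10", "u1", 1),
    ("2011-09-30", "u2", 5),
]


def make_db(rows):
    """Build an in-memory SQLite version of the question's log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, id_user TEXT NOT NULL, "
        "datastamp TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO log (datastamp, id_user, processed) VALUES (?, ?, ?)", rows
    )
    return conn


def sums_per_user_year(conn):
    """One row per (year, user) holding that user's total for the year."""
    # strftime('%Y', ...) plays the role of MySQL's year(datastamp).
    return conn.execute(
        "SELECT strftime('%Y', datastamp) AS y, id_user, SUM(processed) AS ps "
        "FROM log "
        "GROUP BY strftime('%Y', datastamp), id_user "
        "ORDER BY y, id_user"
    ).fetchall()


for row in sums_per_user_year(make_db(ROWS)):
    print(row)
```

It prints the same four (year, user, sum) rows as the MySQL output above.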

but I can't pick out the row with the highest sum for each year. Trying something like

select y, max(ps), id_user from(...) group by y

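is accepted by MySQL (when ONLY_FULL_GROUP_BY is disabled) but returns the wrong id_user: for a column that is neither grouped nor aggregated, MySQL may take the value from any row in the group, not necessarily the row holding max(ps). Other solutions I found on Stack Overflow suggest joining the base table with a subquery, but I cannot use aggregate results (sum(processed) as ps) inside the ON condition.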
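The join approach mentioned here can work if the aggregate is computed first: put the per-user sums in one derived table and the per-year maximum of those sums in another, then join the two. ps is then an ordinary column of a derived table, so it can appear in ON. This pattern needs neither window functions nor CTEs, so it should also suit MySQL versions before 8.0 (with year(datastamp) in place of strftime). The sketch below runs it against an in-memory SQLite database via Python's sqlite3 module; SQLite is only a stand-in for MySQL, and the timestamps are made up apart from their years.

```python
import sqlite3

# Question's sample rows; month/day values are hypothetical.
ROWS = [
    ("2010-01-15", "u1", 1),
    ("2010-03-02", "u1", 3),
    ("2010-06-20", "u2", 2),
    ("2011-02-01", "u1", 1),
    ("2011-05-10", "u1", 1),
    ("2011-09-30", "u2", 5),
]

# Per-user, per-year sums as a derived table (MySQL: year(datastamp)).
SUMS = (
    "SELECT strftime('%Y', datastamp) AS y, id_user, SUM(processed) AS ps "
    "FROM log GROUP BY strftime('%Y', datastamp), id_user"
)

# Join the sums to the per-year maximum of those sums. Both sides are
# derived tables, so ps and max_ps are plain columns and the ON clause
# compares them without any aggregate function appearing in it.
BEST_PER_YEAR = f"""
SELECT s.y, s.id_user, s.ps
FROM ({SUMS}) AS s
JOIN (SELECT y, MAX(ps) AS max_ps FROM ({SUMS}) AS t GROUP BY y) AS m
  ON s.y = m.y AND s.ps = m.max_ps
ORDER BY s.y, s.id_user
"""


def best_per_year(rows):
    """Load rows into an in-memory log table and return the best user per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, id_user TEXT NOT NULL, "
        "datastamp TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO log (datastamp, id_user, processed) VALUES (?, ?, ?)", rows
    )
    return conn.execute(BEST_PER_YEAR).fetchall()


print(best_per_year(ROWS))
```

If two users tie for a year's maximum, both rows are returned.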

Window functions can help here (MySQL supports them from version 8.0). Using the question's log table, you can get the result with the following query:

select y, id_user, ps
from
(
  select y, id_user, ps,
         rank() over (partition by y order by ps desc) as ranks_per_year
  from
  (
    select year(datastamp) as y, id_user, sum(processed) as ps
    from log
    group by y, id_user
  ) A
) B
where ranks_per_year = 1;
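A sketch of the same rank() query run on the sample data in SQLite from Python, assuming an SQLite build of 3.25 or later (the first release with window functions). SQLite is only a stand-in for MySQL here: strftime('%Y', datastamp) replaces year(datastamp), and the timestamps are made up apart from their years.

```python
import sqlite3

# Question's sample rows; month/day values are hypothetical.
ROWS = [
    ("2010-01-15", "u1", 1),
    ("2010-03-02", "u1", 3),
    ("2010-06-20", "u2", 2),
    ("2011-02-01", "u1", 1),
    ("2011-05-10", "u1", 1),
    ("2011-09-30", "u2", 5),
]

# Rank users within each year by their total, then keep rank 1.
RANKED = """
SELECT y, id_user, ps
FROM (
    SELECT y, id_user, ps,
           RANK() OVER (PARTITION BY y ORDER BY ps DESC) AS ranks_per_year
    FROM (
        SELECT strftime('%Y', datastamp) AS y, id_user, SUM(processed) AS ps
        FROM log
        GROUP BY strftime('%Y', datastamp), id_user
    ) AS a
) AS b
WHERE ranks_per_year = 1
ORDER BY y, id_user
"""


def best_per_year(rows):
    """Load rows into an in-memory log table and return the rank-1 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, id_user TEXT NOT NULL, "
        "datastamp TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO log (datastamp, id_user, processed) VALUES (?, ?, ?)", rows
    )
    return conn.execute(RANKED).fetchall()


print(best_per_year(ROWS))
```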

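Both rank() and dense_rank() give rank 1 to every user tied for the top sum, so a tie returns several rows for that year (the two functions differ only in the ranks that follow a tie). If you need exactly one row per year, use row_number() instead, with a tie-breaking column in the ORDER BY.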
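The tie behaviour shows up once a year has two users with the same total. This sketch (SQLite 3.25 or later via Python's sqlite3; the 2012 rows and all month/day values are hypothetical) runs the top-per-year query with each ranking function. Only ROW_NUMBER gets a tie-breaker (id_user), because adding a second sort key to RANK or DENSE_RANK would break the tie as well.

```python
import sqlite3

# Question's rows plus a hypothetical 2012 where u1 and u2 tie at 3.
ROWS = [
    ("2010-01-15", "u1", 1),
    ("2010-03-02", "u1", 3),
    ("2010-06-20", "u2", 2),
    ("2011-02-01", "u1", 1),
    ("2011-05-10", "u1", 1),
    ("2011-09-30", "u2", 5),
    ("2012-01-05", "u1", 3),
    ("2012-04-18", "u2", 3),
]


def top_per_year(rows, ranking):
    """Return the rank-1 rows per year using RANK, DENSE_RANK or ROW_NUMBER."""
    if ranking not in ("RANK", "DENSE_RANK", "ROW_NUMBER"):
        raise ValueError(f"unsupported ranking function: {ranking}")
    # ROW_NUMBER needs a tie-breaker to be deterministic; RANK/DENSE_RANK
    # sort by ps only, so equal sums share rank 1.
    order = "ps DESC, id_user" if ranking == "ROW_NUMBER" else "ps DESC"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, id_user TEXT NOT NULL, "
        "datastamp TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO log (datastamp, id_user, processed) VALUES (?, ?, ?)", rows
    )
    sql = f"""
    SELECT y, id_user, ps
    FROM (
        SELECT y, id_user, ps,
               {ranking}() OVER (PARTITION BY y ORDER BY {order}) AS rnk
        FROM (
            SELECT strftime('%Y', datastamp) AS y, id_user, SUM(processed) AS ps
            FROM log
            GROUP BY strftime('%Y', datastamp), id_user
        ) AS a
    ) AS b
    WHERE rnk = 1
    ORDER BY y, id_user
    """
    return conn.execute(sql).fetchall()


for name in ("RANK", "DENSE_RANK", "ROW_NUMBER"):
    print(name, top_per_year(ROWS, name))
```

RANK and DENSE_RANK both return two rows for 2012, while ROW_NUMBER returns only u1, the alphabetically first user.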


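If rank() does not work in your engine, you can instead compare each sum with the per-year maximum computed by max(). Note that max() ... over (...) is also a window function, so it needs the same window-function support (MySQL 8.0 or later). Here is the query, with the sample data supplied in a CTE: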

with tbl as 
(
select '2010' as year,'u1' as id_user,1 as processed union all
select '2010','u1',3 union all
select '2010','u2',2 union all
select '2011','u1',1 union all
select '2011','u1',1 union all
select '2011','u2',5 
)

select *
from
(

select year, id_user, ps, 
max(ps) over (partition by year) as max_ps_per_year 
from
(
select year, id_user, sum(processed) as ps
from tbl
group by 1,2
) A 

) B
where ps = max_ps_per_year
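A sketch that runs this CTE query in SQLite (3.25 or later) from Python's sqlite3 module, as an assumed stand-in for MySQL 8.0. The only changes are explicit column lists in place of select * and group by 1,2, plus an ORDER BY so the output order is fixed.

```python
import sqlite3

# The answer's max() over (...) query with the sample data in a CTE.
QUERY = """
WITH tbl AS (
    SELECT '2010' AS year, 'u1' AS id_user, 1 AS processed UNION ALL
    SELECT '2010', 'u1', 3 UNION ALL
    SELECT '2010', 'u2', 2 UNION ALL
    SELECT '2011', 'u1', 1 UNION ALL
    SELECT '2011', 'u1', 1 UNION ALL
    SELECT '2011', 'u2', 5
)
SELECT year, id_user, ps, max_ps_per_year
FROM (
    SELECT year, id_user, ps,
           MAX(ps) OVER (PARTITION BY year) AS max_ps_per_year
    FROM (
        SELECT year, id_user, SUM(processed) AS ps
        FROM tbl
        GROUP BY year, id_user
    ) AS a
) AS b
WHERE ps = max_ps_per_year
ORDER BY year, id_user
"""


def run():
    """Execute the query on a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    return conn.execute(QUERY).fetchall()


print(run())
```

Like rank(), this keeps every user whose sum equals the yearly maximum, so ties still return several rows.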


The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
