
Aggregate functions with multiple GROUP BY columns

I'm trying to get the city with the largest number of orders each day. I'm a little confused about how aggregate functions work when there are multiple GROUP BY columns.

Suppose there is a table Trips with columns:
* id (unique order ID)
* city
* date

The data resembles the sample data of a LeetCode problem.
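A table matching the described columns could be created as below, for trying out the queries in this thread. This is a minimal sketch. It assumes an engine that accepts standard `DATE '...'` literals and a column named `date`, which PostgreSQL and MySQL do. Oracle reserves DATE, so there the column would need another name or a quoted identifier. The column types and all sample rows are hypothetical, chosen so that the second day has a tie.

```sql
-- Hypothetical schema matching the columns described in the question.
CREATE TABLE trips (
    id   INTEGER PRIMARY KEY,   -- unique order id
    city VARCHAR(50) NOT NULL,
    date DATE        NOT NULL
);

-- Hypothetical sample rows:
--   2024-01-01: Paris 3 orders, Rome 1 order
--   2024-01-02: Paris 2, Rome 2 (a tie), Oslo 1
INSERT INTO trips (id, city, date) VALUES (1, 'Paris', DATE '2024-01-01');
INSERT INTO trips (id, city, date) VALUES (2, 'Paris', DATE '2024-01-01');
INSERT INTO trips (id, city, date) VALUES (3, 'Paris', DATE '2024-01-01');
INSERT INTO trips (id, city, date) VALUES (4, 'Rome',  DATE '2024-01-01');
INSERT INTO trips (id, city, date) VALUES (5, 'Paris', DATE '2024-01-02');
INSERT INTO trips (id, city, date) VALUES (6, 'Paris', DATE '2024-01-02');
INSERT INTO trips (id, city, date) VALUES (7, 'Rome',  DATE '2024-01-02');
INSERT INTO trips (id, city, date) VALUES (8, 'Rome',  DATE '2024-01-02');
INSERT INTO trips (id, city, date) VALUES (9, 'Oslo',  DATE '2024-01-02');
```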

I want to find the city with the largest number of orders each day.

select 
    date, 
    city, 
    count(*) as city_cnt
from trips a
group by date, city
having count(*) = (select max(count(*)) 
                   from trips b 
                   where b.date = a.date 
                   group by b.city)

This query returns the expected result, but I suspect there is a better solution.

Problem 1: Is there another way to get the result?
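The HAVING subquery above relies on a nested aggregate, max(count(*)). Oracle accepts this, but many other engines (PostgreSQL, MySQL and SQLite, for example) reject nested aggregate calls. The sketch below is one equivalent filter that avoids nesting. It computes the per-city counts once, derives each day's maximum from them, and joins the two. It assumes the same trips table and column names as the question. The CTE names daily and best are illustrative only.

```sql
WITH daily AS (
    -- one row per (date, city) with that city's order count
    SELECT date, city, COUNT(*) AS city_cnt
    FROM trips
    GROUP BY date, city
),
best AS (
    -- the highest city count for each date
    SELECT date, MAX(city_cnt) AS max_cnt
    FROM daily
    GROUP BY date
)
SELECT d.date, d.city, d.city_cnt
FROM daily d
JOIN best b
  ON b.date = d.date
 AND b.max_cnt = d.city_cnt;  -- keeps every city tied for the maximum
```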

Problem 2: At first, I tried to use max(count(*)) in the SELECT list of the main query, without a HAVING clause, and got the error "not a single-group group function":

select 
    date, 
    city, 
    max(count(*)) as max_city_cnt
from trips a
group by date, city

I thought count(*) would calculate the daily number of orders for each city, and that max() would then give me the largest order count for each day. But it doesn't work. Could anyone explain why?
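The following is a likely explanation, assuming Oracle, whose error ORA-00937 carries the quoted message. Oracle allows one level of nested aggregation, but the outer aggregate collapses all of the GROUP BY groups into a single row. Once MAX(COUNT(*)) is applied there is only one row for the whole table. At that point date and city no longer have a single value per row, which is what the error complains about. Getting the maximum per day needs a second GROUP BY on date at an outer query level, as in this sketch. The alias per_city is illustrative.

```sql
-- Oracle only: the nested aggregate yields ONE row, the largest
-- city count over all days (date and city cannot be selected here).
SELECT MAX(COUNT(*)) AS max_city_cnt
FROM trips
GROUP BY date, city;

-- Portable: aggregate in two steps to get the largest city count per day.
SELECT date, MAX(city_cnt) AS max_city_cnt
FROM (
    SELECT date, city, COUNT(*) AS city_cnt
    FROM trips
    GROUP BY date, city
) per_city
GROUP BY date;
```

The second query returns each day's largest count but not which city reached it. Recovering the city needs a join back to the per-city counts or a window function.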

Problem 3: I'm also unclear about the relationship between PARTITION BY in a window function and GROUP BY:

select
    date,
    city,
    count(id) city_cnt,
    rank() over (partition by date order by count(id) desc) d_rank
from trips
group by date, city
;

About the window function in this query:

rank() over (partition by date order by count(id) desc) d_rank
  1. Is count(id) calculated per GROUP BY group, i.e. per (date, city)?
  2. Does rank() depend only on the counts within each day? In other words, does it rank each city's order count among the cities on the same day?
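In SQL's logical processing order, GROUP BY (and HAVING) are evaluated before window functions. Inside the window specification, count(id) therefore refers to the count of each (date, city) group. PARTITION BY date then ranks those grouped rows only against other cities on the same date. Under that reading, the query in Problem 3 should be equivalent to this sketch, which makes the two steps explicit with a derived table. The alias per_city is illustrative.

```sql
-- Step 1 (inner query): GROUP BY collapses trips into one row per (date, city).
-- Step 2 (outer query): RANK orders those rows by count, restarting for each date.
SELECT date, city, city_cnt,
       RANK() OVER (PARTITION BY date ORDER BY city_cnt DESC) AS d_rank
FROM (
    SELECT date, city, COUNT(id) AS city_cnt
    FROM trips
    GROUP BY date, city
) per_city;
```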

Thanks in advance!

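Your version using RANK is probably the least verbose and also the most performant. However, you need a subquery (here a CTE) to keep, for each date, only the city group(s) with the highest count. A window function's result cannot be filtered in the WHERE or HAVING clause of the same query level: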

WITH cte AS (
    SELECT date, city, COUNT(id) city_cnt,
        RANK() OVER (PARTITION BY date ORDER BY COUNT(id) DESC) d_rank
    FROM trips
    GROUP BY date, city
)

SELECT date, city
FROM cte
WHERE d_rank = 1;

The CTE above ranks the city groups within each day by order count. The outer query then keeps only the cities with the highest count for each day. Note that RANK (and DENSE_RANK) allow ties. If more than one city is tied for first place on a given day, the query returns all of them.
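If exactly one city per day is wanted instead, ROW_NUMBER can replace RANK. It never assigns the same number twice within a partition, so ties are broken by whatever extra ORDER BY keys are supplied. The tie-breaker on city below is an arbitrary, illustrative choice. For filtering on first place, RANK and DENSE_RANK behave identically, since both give 1 to every tied leader.

```sql
WITH cte AS (
    SELECT date, city, COUNT(id) AS city_cnt,
           -- exactly one row per date gets rn = 1; ties on the count go
           -- to the alphabetically first city (arbitrary tie-breaker)
           ROW_NUMBER() OVER (PARTITION BY date
                              ORDER BY COUNT(id) DESC, city) AS rn
    FROM trips
    GROUP BY date, city
)
SELECT date, city, city_cnt
FROM cte
WHERE rn = 1;
```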


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM